How to extract data from messed-up CSV?

Question

How can I extract the data from the following CSV file?

Sample_File,C:\app\ok,,,,,,,,,,,,,,,,
Sample Time,20,,,,,,,,,,,,,,,,
Density,1,,,,,,,,,,,,,,,,
Stokes,off,,,,,,,,,,,,,,,,
Lower,0.486,,,,,,,,,,,,,,,,
Upper ,20.53,,,,,,,,,,,,,,,,
Sample #,75,,,,,,,,,,,,,,,,
Date,1/30/2019,,,,,,,,,,,,,,,,
Start Time,8:59:44,,,,,,,,,,,,,,,,
Correlate ,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,conts total
<0.523,0,3,1,0,0,4,9,2,0,0,0,0,0,0,0,0,19
0.542,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,3
1.037,0,0,4,8,2,1,1,2,0,0,0,0,0,0,0,0,18
1.114,0,1,5,16,3,0,0,0,0,0,0,0,0,0,0,0,25
1.197,0,0,2,11,7,2,1,2,0,0,0,0,0,0,0,0,25
2.129,0,15,49,21,150,401,4,13,8,0,0,0,0,0,0,0,661
2.288,0,15,68,53,201,795,18,13,3,0,0,0,0,0,0,0,1166
2.458,0,9,72,99,238,1533,15,32,6,0,0,0,0,0,0,0,2004
3.786,0,0,0,0,85,10054,1303,333,41,0,0,0,0,0,0,0,11816
4.068,0,0,0,1,33,8310,1504,422,38,0,1,0,0,0,0,0,10309
Diameter,Raw Counts,,,,,,,,,,,,,,,,
<0.523,19,,,,,,,,,,,,,,,,
0.542,3,,,,,,,,,,,,,,,,
0.583,4,,,,,,,,,,,,,,,,
0.626,4,,,,,,,,,,,,,,,,
0.673,9,,,,,,,,,,,,,,,,
Side,Raw Counts,,,,,,,,,,,,,,,,
1,0,,,,,,,,,,,,,,,,
2,129,,,,,,,,,,,,,,,,
3,361,,,,,,,,,,,,,,,,
Event 1,971,,,,,,,,,,,,,,,,
Event 3,7091,,,,,,,,,,,,,,,,
Event 4,1,,,,,,,,,,,,,,,,
Dead Time,448,,,,,,,,,,,,,,,,
pressure,1006,,,,,,,,,,,,,,,,

I tried:

Aps_data = pd.read_csv("test.csv")

but I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 7: invalid start byte

In what way is this file "messed up"? Have you tried using the built-in `csv` module? In what way did it fail? Some rows seem to be their own single values instead of tabular data. What is your expected result? Please read [ask]. — ChrisGPT was on strike, Feb 17 '19 at 12:57
You are not showing the relevant part of the file. The error says that the file contains the byte `b'\xb5'` at position 7. As 0xb5 is the Unicode code point of `'µ'`, could the file contain a `'µ'` character? If it does not, could you provide a hexadecimal dump of the beginning of the file? You can use `print([hex(i) for i in open('test.csv', 'rb').read(64)])` to get it. — Serge Ballesta, Feb 17 '19 at 13:51
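
To illustrate the point in the comment above: a lone 0xb5 byte is not valid UTF-8, but it maps to 'µ' in Latin-1. The sketch below reproduces the error on a made-up header line (the real first bytes of test.csv are not shown in the question, so `raw` is purely an assumption) and reports where decoding fails.

```python
# Hypothetical first bytes of the file; the real content is not shown in the question.
raw = b"Range (\xb5m),Counts\n"

position, bad_byte = None, b""
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte that could not be decoded.
    position = exc.start
    bad_byte = raw[position:position + 1]

print(position, bad_byte)          # where decoding failed, and on which byte
print(bad_byte.decode("latin-1"))  # the same byte interpreted as Latin-1
```

Note that Latin-1 can decode any byte sequence, so a successful decode does not prove the file was written in that encoding; checking the decoded text by eye is still needed.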

Thomas Gak-Deluen · Answer 1 · 2019-02-17T15:46:29.907

0

You can do this with the built-in csv module:

import csv

with open('mycsv.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)     # print the whole row as a list of strings
        print(row[0])  # print the first column

Edit 1: replaced 'rb' with 'r' when opening the file so that it works in both Python 2 and Python 3.
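
Since the error in the question points to a non-UTF-8 file, the csv approach probably needs an explicit encoding in Python 3. The following is a minimal sketch, not a confirmed fix: the encoding "cp1252" is a guess (a common default for files produced on Windows), and the temporary file with its two rows is a stand-in for test.csv.

```python
import csv
import os
import tempfile

# Stand-in for test.csv: the first row contains byte 0xb5 (assumed to be 'µ').
sample = b"Diameter (\xb5m),Raw Counts\r\n0.542,3\r\n"

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # newline="" is what the csv documentation recommends for Python 3;
    # encoding="cp1252" is an assumption about how the file was written.
    with open(path, "r", encoding="cp1252", newline="") as f:
        rows = list(csv.reader(f))
finally:
    os.remove(path)

for row in rows:
    print(row)
```

If cp1252 turns out to be wrong for the real file, "latin-1" is a fallback that never raises, at the cost of possibly showing the wrong characters.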

edited Feb 17 '19 at 15:46

answered Feb 17 '19 at 13:04


Error Traceback (most recent call last)
      3 with open('test.csv', 'rb') as f:
      4     reader = csv.reader(f)
----> 5     for row in reader:
      6         print(row)
      7         print(row[0])

Error: iterator should return strings, not bytes (did you open the file in text mode?)
– Shikha Mishra, Feb 17 '19 at 13:11

score 0 · Answer 2 · answered Feb 17 '19 at 13:34

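Reading your data (copied and pasted from the question) works fine. Note that you should specify the encoding when calling pandas.read_csv via its encoding parameter, since the default is UTF-8 and your file apparently is not; see e.g. this answer and consult the docs.

Reading your data into a DataFrame: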

import pandas as pd
from io import StringIO

# Raw string, so the backslashes in the Windows path are kept literally
s = r"""
Sample_File,C:\app\ok,,,,,,,,,,,,,,,,
Sample Time,20,,,,,,,,,,,,,,,,
Density,1,,,,,,,,,,,,,,,,
Stokes,off,,,,,,,,,,,,,,,,
Lower,0.486,,,,,,,,,,,,,,,,
Upper ,20.53,,,,,,,,,,,,,,,,
Sample #,75,,,,,,,,,,,,,,,,
Date,1/30/2019,,,,,,,,,,,,,,,,
Start Time,8:59:44,,,,,,,,,,,,,,,,
Correlate ,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,conts total
<0.523,0,3,1,0,0,4,9,2,0,0,0,0,0,0,0,0,19
0.542,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,3
1.037,0,0,4,8,2,1,1,2,0,0,0,0,0,0,0,0,18
1.114,0,1,5,16,3,0,0,0,0,0,0,0,0,0,0,0,25
1.197,0,0,2,11,7,2,1,2,0,0,0,0,0,0,0,0,25
2.129,0,15,49,21,150,401,4,13,8,0,0,0,0,0,0,0,661
2.288,0,15,68,53,201,795,18,13,3,0,0,0,0,0,0,0,1166
2.458,0,9,72,99,238,1533,15,32,6,0,0,0,0,0,0,0,2004
3.786,0,0,0,0,85,10054,1303,333,41,0,0,0,0,0,0,0,11816
4.068,0,0,0,1,33,8310,1504,422,38,0,1,0,0,0,0,0,10309
Diameter,Raw Counts,,,,,,,,,,,,,,,,
<0.523,19,,,,,,,,,,,,,,,,
0.542,3,,,,,,,,,,,,,,,,
0.583,4,,,,,,,,,,,,,,,,
0.626,4,,,,,,,,,,,,,,,,
0.673,9,,,,,,,,,,,,,,,,
Side,Raw Counts,,,,,,,,,,,,,,,,
1,0,,,,,,,,,,,,,,,,
2,129,,,,,,,,,,,,,,,,
3,361,,,,,,,,,,,,,,,,
Event 1,971,,,,,,,,,,,,,,,,
Event 3,7091,,,,,,,,,,,,,,,,
Event 4,1,,,,,,,,,,,,,,,,
Dead Time,448,,,,,,,,,,,,,,,,
pressure,1006,,,,,,,,,,,,,,,,
"""

df = pd.read_csv(StringIO(s))
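
Because the file mixes key/value lines with several small tables, a single DataFrame may not be the most convenient shape. One possible approach, sketched below with the standard library only, is to split the rows into sections first. The function `split_sections`, the set `SECTION_HEADERS`, and the rule "a first cell starting with a letter is a key/value line" are all assumptions inferred from the sample, not a documented format; the data is a shortened copy of the question's sample.

```python
import csv
from io import StringIO

# Shortened copy of the sample from the question (raw string keeps backslashes).
text = r"""Sample_File,C:\app\ok,,,,,,,,,,,,,,,,
Sample Time,20,,,,,,,,,,,,,,,,
Correlate ,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,conts total
<0.523,0,3,1,0,0,4,9,2,0,0,0,0,0,0,0,0,19
0.542,0,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,3
Diameter,Raw Counts,,,,,,,,,,,,,,,,
<0.523,19,,,,,,,,,,,,,,,,
0.542,3,,,,,,,,,,,,,,,,
Side,Raw Counts,,,,,,,,,,,,,,,,
1,0,,,,,,,,,,,,,,,,
Event 1,971,,,,,,,,,,,,,,,,
"""

# Assumed table headers, taken from the sample rather than from any specification.
SECTION_HEADERS = {"Correlate", "Diameter", "Side"}


def split_sections(text):
    """Split the mixed file into a metadata dict and one row list per table."""
    sections = {"metadata": {}}
    current = None
    for row in csv.reader(StringIO(text)):
        # Drop the empty padding cells at the end of each row.
        while row and row[-1] == "":
            row.pop()
        if not row:
            continue
        key = row[0].strip()
        if key in SECTION_HEADERS:
            # Start a new table; its header row is kept as the first entry.
            current = key
            sections[current] = [row]
        elif current is None or key[:1].isalpha():
            # Heuristic: label-like first cells are key/value lines.
            sections["metadata"][key] = row[1] if len(row) > 1 else ""
        else:
            sections[current].append(row)
    return sections


sections = split_sections(text)
print(sections["metadata"])
print(sections["Diameter"])
```

Each table in `sections` could then be handed to pandas separately, for example by building a DataFrame from the rows after the header. The heuristic would misclassify a table whose row labels start with a letter, so it needs checking against the full file.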
