I want to skip the first 20 rows, since the data I need starts at row 21. I already tried 'skiprows', but the number of lines before the header changes from file to file, so I need something flexible that works for any file. How do I do that?

My first idea was to increment a variable to count how many rows need to be skipped:

skip = 0
if 'X error' not in pd.read_csv(nF):
    skip += 1

But this raises 'Error tokenizing data. C error: Expected 1 fields in line 13, saw 10'.

The CSV:

<INFO>
{
InspectionResultFileType:1.01-FULL-ENG
InspectMode:2
Unit:0
ReviseBalance:1
JudgeItem:448
TeachingMethod:4
ReviseMode:0
ReviseScalingX:1.000013
ReviseScalingY:0.999969
}
Insp ON/OFF,T code,Design D,X error -,X error +,Y error -,Y error +,D error -,D error +,DD error
1,T1,0.151,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T2,0.151,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T3,0.152,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T4,0.152,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T5,0.251,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T6,0.251,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
1,T7,2.000,-0.06000,0.06000,-0.06000,0.06000,-0.06000,0.06000,0.06000
NO.,T code,H. NO.,Jud,Design X,Design Y,Design D,Measu. X,Measu. Y,Measu. D,X error,Y error,D error,DD,TimeStamp
Untargeted

2 Answers

skiprows = 0
# Count the lines that come before the header row, which starts with 'NO.'
with open(filename, 'r') as f:
    for line in f:
        if line.startswith('NO.'):
            break
        skiprows += 1

print(skiprows)

Found this solution in this question.
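
To show how the counted value can then be handed to pandas, here is a minimal sketch. The helper name `find_header_row` and the in-memory sample are assumptions made for illustration (the sample is a trimmed version of the file in the question, using the 'Insp ON/OFF' header because that table has data rows). For the real file you would open it from disk and pass 'NO.' as the prefix.

```python
import io
import pandas as pd

def find_header_row(lines, prefix):
    """Return the 0-based index of the first line that starts with prefix."""
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            return i
    raise ValueError(f"no line starts with {prefix!r}")

# Trimmed sample based on the file in the question (an assumption, not the full file)
sample = (
    "<INFO>\n"
    "{\n"
    "InspectMode:2\n"
    "Unit:0\n"
    "}\n"
    "Insp ON/OFF,T code,Design D\n"
    "1,T1,0.151\n"
    "1,T2,0.151\n"
)

# The number of lines before the header is exactly the number of rows to skip
skip = find_header_row(io.StringIO(sample), "Insp ON/OFF")
df = pd.read_csv(io.StringIO(sample), skiprows=skip)
print(df)
```

Because the header is located by its text rather than by a fixed line number, the same code keeps working when the block before the header grows or shrinks.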

Untargeted

Do something like this to get the list of items from the file.

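arr = []
with open('xyz.csv') as f:
    for line in f:
        # Split each line on commas; the metadata lines yield a single field
        x = line.strip('\n').split(',')
        if len(x) > 1:  # keep only the lines that contain a comma
            arr.append(x)
print(arr)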

The output of this was as follows:

[['Insp ON/OFF', 'T code', 'Design D', 'X error -', 'X error +', 'Y error -', 'Y error +', 'D error -', 'D error +', 'DD error'], ['1', 'T1', '0.151', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T2', '0.151', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T3', '0.152', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T4', '0.152', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T5', '0.251', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T6', '0.251', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['1', 'T7', '2.000', '-0.06000', '0.06000', '-0.06000', '0.06000', '-0.06000', '0.06000', '0.06000'], ['NO.', 'T code', 'H. NO.', 'Jud', 'Design X', 'Design Y', 'Design D', 'Measu. X', 'Measu. Y', 'Measu. D', 'X error', 'Y error', 'D error', 'DD', 'TimeStamp']]

Note that the last row has 15 columns, unlike the 10-column rows above it. It appears to be the header of a separate table rather than part of this one.

If you want to convert this data into a DataFrame, you can write a few additional lines of code:

import pandas as pd
df = pd.DataFrame(data=arr[1:-1], columns=arr[0])
print(df)

I am using rows 1 through the second-to-last as data, and the first row as the column headers.

The output of this will look like this:

  Insp ON/OFF T code Design D X error -  ... Y error + D error - D error + DD error
0           1     T1    0.151  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
1           1     T2    0.151  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
2           1     T3    0.152  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
3           1     T4    0.152  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
4           1     T5    0.251  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
5           1     T6    0.251  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000
6           1     T7    2.000  -0.06000  ...   0.06000  -0.06000   0.06000  0.06000

[7 rows x 10 columns]
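
The `arr[1:-1]` slice relies on this particular file having exactly one stray header row at the end. A slightly more general sketch, using only the standard library, groups consecutive rows that have the same number of fields so each table can be handled separately. The function name `split_tables` and the trimmed sample are assumptions for illustration. Two adjacent tables with the same column count would be merged by this approach.

```python
import csv
import io
from itertools import groupby

def split_tables(lines):
    """Group consecutive comma-separated rows that share the same field count."""
    # Drop metadata lines, which parse to a single field
    rows = [row for row in csv.reader(lines) if len(row) > 1]
    # Each run of rows with equal length is treated as one table
    return [list(group) for _, group in groupby(rows, key=len)]

# Trimmed sample based on the file in the question (an assumption, not the full file)
sample = io.StringIO(
    "<INFO>\n"
    "{\n"
    "Unit:0\n"
    "}\n"
    "Insp ON/OFF,T code,Design D\n"
    "1,T1,0.151\n"
    "1,T2,0.151\n"
    "NO.,T code,H. NO.,Jud\n"
)

tables = split_tables(sample)
for table in tables:
    header, *data = table
    print(header, len(data))
```

Each entry in `tables` can then be passed to `pd.DataFrame(data=table[1:], columns=table[0])`.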
Joe Ferndz