I have raw data that I want to split into CSV/Excel. After that, if the data in a row is not stored correctly (e.g. 0 was entered instead of 121324), I want Python to identify that row. In other words, while the raw data is being split into CSV by my Python code, some rows might be formed incorrectly. How can I identify those rows in Python?

Example: S.11* N. ENGLAND L -8' 21-23 u44'\n S.18 TAMPA BAY W -7 40-7 u49'\n S.25 Buffalo L -4' 18-33 o48

Result I want: S,11,*,N.,ENGLAND,L,-8',21-23,u44'\n S,18,,TAMPA,BAY,W,-7,40-7,u49'\n S,25,,Buffalo,L,-4',18-33,o48\n

Suppose the output is like this: S,11,N.,ENGLAND,L,-8',21-23u,44'\n S,18,,TAMPA,BAY,W,-7,40-7,u49'\n S,25,,Buffalo,L,-4',18-33,o48\n

You can see that in the first row the * is missing, and u44' is stored as only 44' while the u is appended to the previous column.

The Python code should identify this row and return it to me.

Likewise, I want all rows that contain errors.

This is what I have done so far:

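import csv

input_filename = 'rawsample.txt'
output_filename = 'spreads.csv'

# Both files are opened in one with statement (comma plus line continuation).
with open(input_filename, 'r', newline='') as infile, \
     open(output_filename, 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter=' ', skipinitialspace=True)
    writer = csv.writer(outfile, delimiter=',')
    for row in reader:
        if not row:
            continue  # skip blank lines, which would fail on row[0]
        # Split the first token, e.g. 'S.11*' -> ['S', '11*']
        new_cols = row[0].split('.')
        # Move a trailing '*' into its own column, or leave that column empty
        if not new_cols[1].endswith('*'):
            new_cols.extend([''])
        else:
            new_cols[1] = new_cols[1][:-1]
            new_cols.extend(['*'])
        row = new_cols + row[1:]
        #print(row)
        writer.writerow(row)

# df is a pandas DataFrame of the CSV written above (the loading code is not shown here)
er = []
for index, row in df.iterrows():
    for i in row:
        if str(i).lower() == 'nan' or i == '':
            er.append(row)
            break  # one hit is enough; avoids appending the same row twice
# I was able to check for null values, but nothing more.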

Please help me.

  • "*please understand*" -> unfortunately, we can't **guess** what you mean without a clear reproducible example of your data, the exact expected output, the exact error that you face and your current code. – mozway Nov 29 '22 at 09:48
  • Try to put yourself in our shoes and read the question from our perspective. We don't have a clue where to start. You should solve the problem step by step: first, what is your raw data like (format, contents)? Then, how do you copy it over to an Excel or CSV file, how do you identify bad data within the newly created file, and how do you clean it? – Sin Han Jinn Nov 29 '22 at 09:58

1 Answer

@mozway is right: you should give an example input and the expected result.
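
Going only by the three sample lines in the question, one way to flag badly split rows is to check every field against the pattern it is expected to match. This is a minimal sketch, not a definitive implementation: the patterns in HEAD, TAIL and TEAM_WORD and the helper names row_problems and find_bad_rows are my own assumptions inferred from that sample, so adjust them to the real data. Because the team name can be one or two words, the fixed fields are checked from both ends of the row.

```python
import csv
import io
import re

# Assumed layout, inferred from the three sample lines only:
#   day letter, day number, optional '*', team word(s), result, spread, score, total
HEAD = [re.compile(r"[A-Z]"), re.compile(r"\d{1,2}"), re.compile(r"\*?")]
TAIL = [
    re.compile(r"[WLT]"),        # result (the sample shows W and L; T is a guess)
    re.compile(r"[-+]?\d+'?"),   # spread, e.g. -8' or -7
    re.compile(r"\d+-\d+"),      # score, e.g. 21-23
    re.compile(r"[ou]\d+'?"),    # total, e.g. u44' or o48
]
TEAM_WORD = re.compile(r"[A-Za-z.]+")


def row_problems(row):
    """Return a list of human-readable problems found in one parsed row."""
    if len(row) < len(HEAD) + 1 + len(TAIL):
        return [f"too few fields ({len(row)})"]
    problems = []
    # Fixed fields at the start of the row.
    for i, pattern in enumerate(HEAD):
        if not pattern.fullmatch(row[i]):
            problems.append(f"field {i}: unexpected value {row[i]!r}")
    # Everything between the fixed head and tail is treated as the team name.
    for word in row[len(HEAD):-len(TAIL)]:
        if not TEAM_WORD.fullmatch(word):
            problems.append(f"team: unexpected value {word!r}")
    # Fixed fields at the end of the row, addressed with negative indexes.
    for offset, pattern in zip(range(-len(TAIL), 0), TAIL):
        if not pattern.fullmatch(row[offset]):
            problems.append(f"field {offset}: unexpected value {row[offset]!r}")
    return problems


def find_bad_rows(rows):
    """Return (line number, row, problems) for every row that fails a check."""
    bad = []
    for line_no, row in enumerate(rows, start=1):
        problems = row_problems(row)
        if problems:
            bad.append((line_no, row, problems))
    return bad


# The faulty output from the question: '*' is lost and 'u' moved onto the score.
sample = (
    "S,11,N.,ENGLAND,L,-8',21-23u,44'\n"
    "S,18,,TAMPA,BAY,W,-7,40-7,u49'\n"
    "S,25,,Buffalo,L,-4',18-33,o48\n"
)
for line_no, row, problems in find_bad_rows(csv.reader(io.StringIO(sample))):
    print(line_no, row, problems)
```

On this sample only line 1 is reported, with three problems: the missing star column, the score with the stray u, and the total without its o/u prefix.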

Anyway, if you're dealing with a variable number of columns in the input, please refer to Handling Variable Number of Columns with Pandas - Python.
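
To see whether the column count varies at all, here is a small standard-library sketch. It is not the pandas approach from the linked page; the helper names read_ragged and pad_rows are hypothetical, and the sample is the expected output from the question. It tallies how many fields each row has and pads shorter rows so that every row has the same width.

```python
import csv
import io
from collections import Counter


def read_ragged(text):
    """Parse CSV text into a list of rows, skipping completely empty lines."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def pad_rows(rows, fill=""):
    """Right-pad every row with `fill` so all rows match the widest one."""
    width = max((len(r) for r in rows), default=0)
    return [r + [fill] * (width - len(r)) for r in rows]


# Expected output from the question; the third row has a one-word team name.
sample = (
    "S,11,*,N.,ENGLAND,L,-8',21-23,u44'\n"
    "S,18,,TAMPA,BAY,W,-7,40-7,u49'\n"
    "S,25,,Buffalo,L,-4',18-33,o48\n"
)
rows = read_ragged(sample)
widths = Counter(len(r) for r in rows)  # how many rows have each field count
padded = pad_rows(rows)                 # every row now has the same length
print(widths)
print(padded)
```

Note that the field count alone does not identify bad rows here: a correct row with a one-word team name has the same number of fields as a two-word row that lost a column.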

Best