I have raw data that I want to split into CSV/Excel. While splitting the raw data into CSV with Python, some rows may be split incorrectly (for example, a cell ends up containing 0 instead of 121324). How can I identify those rows with Python?
Example raw input: S.11* N. ENGLAND L -8' 21-23 u44'\n S.18 TAMPA BAY W -7 40-7 u49'\n S.25 Buffalo L -4' 18-33 o48
Result I want: S,11,*,N.,ENGLAND,L,-8',21-23,u44'\n S,18,,TAMPA,BAY,W,-7,40-7,u49'\n S,25,,Buffalo,L,-4',18-33,o48\n
Suppose the output looks like this instead: S,11,N.,ENGLAND,L,-8',21-23u,44'\n S,18,,TAMPA,BAY,W,-7,40-7,u49'\n S,25,,Buffalo,L,-4',18-33,o48\n
You can see that in the first row the * is missing and u44' is stored as just 44', because the u has been appended to the previous column.
The Python code should identify this row and return it to me.
Likewise, I want every row that contains an error.
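One way to express the check described above is to validate each split row against per-column patterns. The following is only a minimal sketch: the patterns are inferred from the three sample rows and are assumptions, and the names `row_is_valid` and `find_bad_rows` are hypothetical. Because the team name can span one or more columns, the fixed columns are checked from both ends of the row and whatever lies in between is treated as the team name.

```python
import re

# Patterns inferred from the three sample rows only (assumptions);
# adjust them to whatever the real data allows.
HEAD_PATTERNS = [
    re.compile(r"[A-Z]"),        # leading letter, e.g. S
    re.compile(r"\d{1,2}"),      # number, e.g. 11
    re.compile(r"\*?"),          # optional asterisk (empty when absent)
]
TAIL_PATTERNS = [
    re.compile(r"[WL]"),         # result, W or L
    re.compile(r"[+-]?\d+'?"),   # spread, e.g. -8' or -7
    re.compile(r"\d+-\d+"),      # score, e.g. 21-23
    re.compile(r"[ou]\d+'?"),    # total, e.g. u44' or o48
]
TEAM_WORD = re.compile(r"[A-Za-z.]+")  # each word of the team name


def row_is_valid(row):
    """Return True if every column of an already split row looks well formed."""
    n_head, n_tail = len(HEAD_PATTERNS), len(TAIL_PATTERNS)
    if len(row) < n_head + n_tail + 1:  # need at least one team word
        return False
    head, team, tail = row[:n_head], row[n_head:-n_tail], row[-n_tail:]
    return (
        all(p.fullmatch(c) for p, c in zip(HEAD_PATTERNS, head))
        and all(TEAM_WORD.fullmatch(c) for c in team)
        and all(p.fullmatch(c) for p, c in zip(TAIL_PATTERNS, tail))
    )


def find_bad_rows(rows):
    """Return (row_number, row) pairs for rows that fail validation."""
    return [(i, row) for i, row in enumerate(rows, start=1)
            if not row_is_valid(row)]


# Usage with the faulty output from the example above.
rows = [
    "S,11,N.,ENGLAND,L,-8',21-23u,44'".split(","),
    "S,18,,TAMPA,BAY,W,-7,40-7,u49'".split(","),
    "S,25,,Buffalo,L,-4',18-33,o48".split(","),
]
print(find_bad_rows(rows))
```

With these assumed patterns only the first row is reported, since `N.` sits where the asterisk column should be and `21-23u` and `44'` do not look like a score and a total.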
This is what I have done so far:
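import csv

input_filename = 'rawsample.txt'
output_filename = 'spreads.csv'

# Open both files in a single with statement (note the comma and line continuation).
with open(input_filename, 'r', newline='') as infile, \
     open(output_filename, 'w', newline='') as outfile:
    reader = csv.reader(infile, delimiter=' ', skipinitialspace=True)
    writer = csv.writer(outfile, delimiter=',')
    for row in reader:
        if not row:  # skip blank lines, which would make row[0] fail
            continue
        # Split the first token, e.g. 'S.11*' -> ['S', '11*']
        new_cols = row[0].split('.')
        if not new_cols[1].endswith('*'):
            new_cols.extend([''])           # no asterisk: leave the column empty
        else:
            new_cols[1] = new_cols[1][:-1]  # strip the asterisk from the number
            new_cols.extend(['*'])          # and store it in its own column
        row = new_cols + row[1:]
        #print(row)
        writer.writerow(row)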
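# df is assumed to be the written CSV loaded into a pandas DataFrame (loading step not shown).
er = []
for index, row in df.iterrows():
    for i in row:
        if str(i).lower() == 'nan' or i == '':
            er.append(row)
            break  # one empty cell is enough; avoid appending the same row twice
# I was able to check for null values, but nothing more.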
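The null check above only catches empty cells. A complementary idea, sketched here under assumptions, is a round-trip comparison: rebuild the raw line from each split row and compare it with the original text. Any row whose rebuilt line differs was split incorrectly. The helper names `rebuild_line` and `find_mismatches` are hypothetical, and the sketch assumes the raw lines and the split rows are aligned one to one (no lines skipped) and that the first three columns are the letter, the number, and the optional asterisk, as in the desired output.

```python
def rebuild_line(row):
    """Reverse the split: ['S', '11', '*', 'N.', ...] -> "S.11* N. ..."."""
    if len(row) < 3:
        return None  # too short to contain letter, number and asterisk columns
    first_token = f"{row[0]}.{row[1]}{row[2]}"
    return " ".join([first_token] + list(row[3:]))


def find_mismatches(raw_lines, rows):
    """Return (line_number, raw_line, row) for rows that do not round-trip."""
    bad = []
    for line_no, (raw, row) in enumerate(zip(raw_lines, rows), start=1):
        expected = " ".join(raw.split())  # trim and collapse whitespace
        if rebuild_line(row) != expected:
            bad.append((line_no, raw, row))
    return bad


# Usage with the raw input and the faulty output from the example.
raw_lines = [
    "S.11* N. ENGLAND L -8' 21-23 u44'",
    " S.18 TAMPA BAY W -7 40-7 u49'",
    " S.25 Buffalo L -4' 18-33 o48",
]
rows = [
    "S,11,N.,ENGLAND,L,-8',21-23u,44'".split(","),
    "S,18,,TAMPA,BAY,W,-7,40-7,u49'".split(","),
    "S,25,,Buffalo,L,-4',18-33,o48".split(","),
]
for line_no, raw, row in find_mismatches(raw_lines, rows):
    print(line_no, row)
```

This check needs no knowledge of what each column may contain, so it catches lost or shifted characters, but it cannot tell whether the raw line itself was wrong to begin with.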
Please help me.