Is there a way to use list comprehension to create a list of tuples with two different conditions.
I am interacting through a Pandas DF and I want to return an entire row in tuple if it matches either condition. The first is if the DF has nan values in any column.
The other is if a column in the DF called ODFS_FILE_CREATE_DATETIME
doesn't match the regex pattern for the date column. The date column is supposed to have an output that looks like this: 2005242132
. 10 number digits. So if the df returns something like 2004dg, it should be picked up as an error and the row should be added to my list of tuples
My sad pathetic attempt:
[tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(1)].values or x in odfscdate_re.search(str(odfscsv_df['ODFS_FILE_CREATE_DATETIME'])) ]
Full Function that contains the two seperate list of tuples:
def process_csv_formatting(csv):
odfscsv_df = pd.read_csv(csv, header=None,names=['ODFS_LOG_FILENAME', 'ODFS_FILE_CREATE_DATETIME', 'LOT', 'TESTER', 'WAFER_SCRIBE'])
odfscsv_df['CSV_FILENAME'] = csv.name
odfscdate_re = re.compile(r"\d{10}")
#print(odfscsv_df)
#odfscsv_df = odfscsv_df.replace('', np.nan)
errortup = [(odfsname, "Bad_ODFS_FILE_CREATE_DATETIME= " + str(cdatetime), csv.name) for odfsname,cdatetime in zip(odfscsv_df['ODFS_LOG_FILENAME'], odfscsv_df['ODFS_FILE_CREATE_DATETIME']) if not odfscdate_re.search(str(cdatetime))]
emptypdf = pd.DataFrame(columns=['ODFS_LOG_FILENAME', 'ODFS_FILE_CREATE_DATETIME', 'LOT', 'TESTER', 'WAFER_SCRIBE'])
print([tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(1)].values])
[tuple(x) for x in odfscsv_df[odfscsv_df.isna().any(1)].values or x in odfscdate_re.search(str(odfscsv_df['ODFS_FILE_CREATE_DATETIME'])) ]
#print(odfscsv_df[(odfscsv_df[column_name].notnull()) & (odfscsv_df[column_name] != u'')].index)
for index, row in odfscsv_df.iterrows():
#print((row['WAFER_SCRIBE']))
print((row['ODFS_FILE_CREATE_DATETIME']))
#errortup = [x for x in odfscsv_df['ODFS_FILE_CREATE_DATETIME']]
if len(errortup) != 0:
#print(errortup) #put this in log file statement somehow
#print(errortup[0][2])
return emptypdf
else:
return odfscsv_df
Sample CSV Data. The commas delienate the cells:
2005091432_943SK1J.00J.SK1J-23.FPD.FMGN520.Jx6D36ny5EO53qAtX4.log,,W943SK10,MGN520,0Z0RK072TCD2
2005230137_014SF1J.00J.SF1J-23.WCPC.FMGN520.XlwHcgyP5eFCpZm5cf.log,,W014SF10,MGN520,DM4MU129SEC1
2005240909_001914J.E0J.914J-15.WRO3PC.FMGN520.nZKn7OvjGKw1i4pxiu.log,,K001914E,MGN520,DM3FZ226SEE3
2005242132_001914J.E0J.914J-15.WRO4PC.FMGN520.V8dcLhEgygRj2rP2Df.log,2005242132,K001914E,MGN520,DM3FZ226SEE3
2005251037_001914J.E0J.914J-15.WRO4PC.FMGN520.dyixmQ5r4SvbDFkivY.log,2005251037,K001914E,MGN520,DM3FZ226SEE3
2005251215_949949J.E0J.949J-21.WRO2PP.FMGN520.yp1i4e7a7D1ighkdB7.log,2005251215,K949949E,MGN520,DG2KV122SEF6
2005251231_949949J.E0J.949J-25.WRO2PP.FMGN520.oLQGhc2whAlhC3dSuR.log,2005251231,K949949E,MGN520,DG2KV333SEF3
2005260105_001914J.E0J.914J-15.WRO4PC.FMGN520.wOQMUOfZgkQK9iHJS5.log,2005260105,K001914E,MGN520,DM3FZ226SEE3
2006111130_950909J.00J.909J-22.FPC.FMGN520.UuqeGtw9xP6lLDUW9N.log,2006111130,K9509090,MGN520,DG7LW031SEE7
2006111612_950909J.00J.909J-22.FPC.FMGN520.hoDl3QSNPKhcs4oA2N.log,2006111612,K9509090,MGN520,DG7LW031SEE7
2006120638_006914J.E0J.914J-15.CZPC.FMGN520.qCgFUH2H21ieT641i9.log,2006120638,K006914E,MGN520,DM8KJ568SEC3
2006122226_006914J.E0J.914J-15.CZPC.FMGN520.nSHSp7klxjrQlVTcCu.log,2006122226,K006914E,MGN520,DM8KJ568SEC3
2006130919_006914J.E0J.914J-15.CZPC.FMGN520.Zd6DrMUsCjuEVBFwvn.log,2006130919,K006914E,MGN520,DM8KJ568SEC3
2006140457_007911J.E0J.911J-25.RDR2PC.FMGN520.QPX9r59TnXObXyfibv.log,2006140457,K007911E,MGN520,DN4AU351SED1
2006141722_007911J.E0J.911J-25.WCPC.FMGN520.dNQLkvQlPTplEjJspB.log,2006141722,K007911E,MGN520,DN4AU351SED1
2006160332_007911J.E0J.911J-25.WCPC.FMGN520.DQiH82Ze9fCoaLVbDE.log,2006160332,K007911E,MGN520,DN4AU351SED1
2006170539_007911J.E0J.911J-25.WCPC.FMGN520.TjakhXkmhmlGhfLheo.log,2006170539,K007911E,MGN520,DN4AU351SED1