I import data from a CSV file and replace the empty fields with an 'EMPTYFIELD' value.
df = pd.read_csv('myFile.csv', usecols=['AAA', 'BBB', 'CCC'])
df = df.fillna('EMPTYFIELD')
I am trying to create a dataframe containing all the rows that have an 'EMPTYFIELD' value, i.e. rows where at least one column contains this value. I used the following and it works, of course:
error = df[df.AAA.str.contains('EMPTYFIELD')]
error = error[error.BBB.str.contains('EMPTYFIELD')]
error = error[error.CCC.str.contains('EMPTYFIELD')]
Now I am trying to reduce the number of lines in my code, so I was thinking of using a lambda instead, ideally without referencing the columns:
error2 = df.apply(lambda x: 'EMPTYFIELD' if 'EMPTYFIELD' in x else x)
#error2 = df.apply(lambda x : any([ isinstance(e, 'EMPTYFIELD') for e in x ]), axis=1)
and then I tried referencing the columns too:
error2 = df[usecols].apply(lambda x: 'EMPTYFIELD' if 'EMPTYFIELD' in x else x)
and
error2 = df[df[usecols].isin(['EMPTYFIELD'])]
None of the above works. When I write the results to a new CSV file, I still see all the rows, not only the ones that contain the 'EMPTYFIELD' value.
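For reference, here is a minimal sketch of the row-wise check described above ("at least one column contains 'EMPTYFIELD'"). The sample frame is hypothetical and only stands in for myFile.csv. Two things worth noting about the attempts: `'EMPTYFIELD' in x` on a Series tests the index labels rather than the values, and indexing a dataframe with a boolean dataframe (as in the `isin` attempt) masks individual cells to NaN instead of dropping rows.

```python
import pandas as pd

# Hypothetical sample data standing in for myFile.csv
df = pd.DataFrame({
    'AAA': ['a1', None, 'a3'],
    'BBB': ['b1', 'b2', None],
    'CCC': ['c1', 'c2', 'c3'],
})
df = df.fillna('EMPTYFIELD')

# Compare every cell with the marker, then reduce across columns:
# the mask is True for each row where at least one column equals 'EMPTYFIELD'
mask = df.eq('EMPTYFIELD').any(axis=1)

# Indexing with a boolean Series (not a boolean DataFrame) filters rows
error2 = df[mask]
print(error2)
```

This assumes the cells are exactly equal to 'EMPTYFIELD' (which is what fillna produces), so an exact comparison is enough and no substring search is needed.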
UPD: This is my extended code. Some of the answers return an error, possibly because of the lines below:
varA = 'AAA'
dfGrouped = df.groupby(varA, as_index=False).agg({'Start Date': 'min', 'End Date': 'max'}).copy()
varsToKeep = ['AAA', 'BBB', 'CCC', 'Start Date_grp', 'End Date_grp' ]
dfTemp = pd.merge(df, dfGrouped, how='inner', on='AAA', suffixes=(' ', '_grp'), copy=True)[varsToKeep]
errors = dfTemp[~np.logical_or.reduce([dfTemp[varsToKeep].str.contains('EMPTYFIELD') for varsToKeep in dfTemp])]
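A possible way to apply the same check to the merged frame, sketched under assumptions: the sample data below is hypothetical, and it is not clear from the last line whether `errors` should hold the rows with or without the marker, so both are shown. The `.str` accessor is only available on string columns, so an element-wise `eq` comparison avoids that restriction when some columns (e.g. the dates) are not strings.

```python
import pandas as pd

# Hypothetical sample data; the real df comes from myFile.csv
df = pd.DataFrame({
    'AAA': ['x', 'x', 'y'],
    'BBB': ['b1', 'EMPTYFIELD', 'b3'],
    'CCC': ['c1', 'c2', 'c3'],
    'Start Date': ['2020-01-01', '2020-02-01', '2020-03-01'],
    'End Date': ['2020-01-31', '2020-02-28', '2020-03-31'],
})

varA = 'AAA'
dfGrouped = df.groupby(varA, as_index=False).agg({'Start Date': 'min', 'End Date': 'max'})
varsToKeep = ['AAA', 'BBB', 'CCC', 'Start Date_grp', 'End Date_grp']
dfTemp = pd.merge(df, dfGrouped, how='inner', on='AAA', suffixes=(' ', '_grp'))[varsToKeep]

# True for each row where any kept column equals the marker
has_empty = dfTemp.eq('EMPTYFIELD').any(axis=1)

errors = dfTemp[has_empty]    # rows that contain 'EMPTYFIELD'
clean = dfTemp[~has_empty]    # rows that do not
```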