
I have a CSV file that gets mailed to me every day, and I want to write a script to clean up the data before I push it into a database. At the bottom of the CSV file there are two empty rows (rows 73 & 74 in the image) and two rows with some junk data in them (rows 75 & 76 in the image), and I need to delete these rows.

[Screenshot of the CSV: rows 73–74 are empty, rows 75–76 contain junk data]

To identify the first empty row, it might be helpful to know that Column A will always have data in it until the first empty row (row 73 in the image).

Can you help me figure out how to identify these rows and delete the data in them?

Abhay
  • df.dropna() drops rows containing NaNs; if every row you want to drop has NaNs in it, you can use that (see the short illustration after these comments). This also might help: https://stackoverflow.com/questions/13851535/delete-rows-from-a-pandas-dataframe-based-on-a-conditional-expression-involving – Petronella Dec 19 '19 at 15:12
  • [Please don't post images of code/data (or links to them)](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question) – anky Dec 19 '19 at 15:12
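A quick illustration of the dropna suggestion, on toy data rather than the real file: df.dropna() drops any row that contains a NaN, while df.dropna(how='all') drops only rows that are entirely empty; neither removes the junk rows, which is why the answer below takes a different approach.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['as', 'bf', np.nan, np.nan],
                   'B': [1, 2, 3, np.nan]})

print(df.dropna())           # rows with any NaN removed -> keeps rows 0 and 1
print(df.dropna(how='all'))  # only fully empty rows removed -> keeps rows 0, 1 and 2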

1 Answer


You can check for missing values with Series.isna, create a cumulative sum with Series.cumsum, and keep only the rows where it equals 0 by boolean indexing. This solution also works if there are no missing values in the first column.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':['as','bf', np.nan, 'vd', 'ss'],
                   'B':[1,2,3,4,5]})

print (df)
     A  B
0   as  1
1   bf  2
2  NaN  3
3   vd  4
4   ss  5

# keep only the rows before the first missing value in column A
df = df[df['A'].isna().cumsum() == 0]
print (df)
    A  B
0  as  1
1  bf  2
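
Applied to the emailed file, the same mask also removes the junk rows, because the cumulative sum stays above zero once the first empty cell in column A appears. A minimal sketch of the full clean-up, assuming hypothetical names for the file, database and table (report.csv, reports.db, daily_report):

import pandas as pd
from sqlalchemy import create_engine

# read the emailed file; column A is the first column
df = pd.read_csv('report.csv')

# keep only the rows before the first empty cell in the first column;
# the blank rows and the junk rows after them are dropped together
df = df[df.iloc[:, 0].isna().cumsum() == 0]

# push the cleaned data into a database table
engine = create_engine('sqlite:///reports.db')
df.to_sql('daily_report', engine, if_exists='append', index=False)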
jezrael