1

I am currently working with data set of more than 100 columns, out of those 100 columns, the first four columns provide me with essential information like label, description, target, department. Apart from those four the other columns, provide me data values. There are some rows for those essential information who data values are null. I want to delete all the rows who's all data value is null.

So, basically what i did. I did a very long way. First, i split the whole table up into two tables. Where df1 stored my essential information (Label, description, target, department) and df2 stored my data values. Now for df2, i did the isnull() approach and find out which index gave me the null value. I noted down the index, and concated the two table. After concating, i basically dropped the rows according to the index i noted down.

df1 = pd.read_excel('***.xlsx',skiprows = 5)

df2 = df1.iloc[:,4:]

df2[df2.isnull().all(axis=1)] (*Used this to note down the index of null value rows*)

df1.drop(df1.iloc[:,4:],axis=1,inplace = True) (*Used this to get rid of the data value columns and only leave behind the essential information columns*)

new_df = pd.concat([df1,df2],axis = 1)

new_df.drop(new_df.index[[430,431,432]],inplace = True)

The following approach did do the justice. However, i getting a feeling its very long way,so i was wondering if there is any shorter approach to it?. Really appreciate your help

gmds
  • 19,325
  • 4
  • 32
  • 58
krijan
  • 117
  • 8

1 Answers1

2

If I understand correctly, you are looking for dropna:

df1.dropna(how='all', subset=df1.columns[4:])

This specifies that you should drop only rows that have all null values from the 4th column onward.

EDIT: Since you actually want to drop rows where all the values are 0, you should do this instead:

df1 = df1[~(df1.iloc[:, 4:] == 0).all(axis=1)]
gmds
  • 19,325
  • 4
  • 32
  • 58
  • Hello, i did try it out, the code did compile. However, it didnt really manage to drop it. – krijan May 24 '19 at 02:14
  • @krijan Did you assign it to another variable? This does not modify `df1` inplace. – gmds May 24 '19 at 02:15
  • See the reason why it didn't work, is due to the value being 0. My mistake i said null not 0. Its supposed to be 0. So is there any way to delete all the rows that have all value of 0. ? – krijan May 24 '19 at 02:19
  • Because, i forgot to say that i changed all the NaN value with 0. im really sorry for confusion – krijan May 24 '19 at 02:21
  • @krijan See my edit. – gmds May 24 '19 at 02:22
  • Yess !!! its working :). You are the best. Thank you very much. By the way what is the importance of adding '~', i can't even find this key on my keyboard – krijan May 24 '19 at 02:25
  • 1
    `~` means "not". In this case, `(df1.iloc[:, 4:] == 0).all(axis=1)` gives all the rows which have all zeroes, but we want to keep the rest, so we invert that condition with `~`. – gmds May 24 '19 at 02:34