
I am trying to drop every row of a DataFrame in which any column has the value zero. A minimal working example is below.

import pandas as pd
df = pd.read_excel('trial.xlsx',sheet_name=None)
df

I get the following output:

OrderedDict([('Sheet1',   type  query  answers
          0  abc    100       90
          1  def      0        0
          2  ghi      0        0
          3  jkl      5        1
          4  mno      1        1)])

I am trying to remove the rows with dropna(), using the following code.

df = df.dropna()
df

I get an error saying "'collections.OrderedDict' object has no attribute 'dropna'". I tried going through the various answers provided here and here, but the error remains. Any help would be greatly appreciated!

Krishn Nand
  • Please [create a reproducible copy of the DataFrame with `df.head(10).to_clipboard(sep=',')`](https://stackoverflow.com/questions/52413246/how-to-provide-a-copy-of-your-dataframe-with-to-clipboard), [edit] the question, and paste the clipboard into a code block or include synthetic data: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Trenton McKinney Jun 30 '20 at 07:20
  • `import numpy as np`, `df.replace(0, np.nan, inplace=True)`, and then `df.dropna(inplace=True)` – Trenton McKinney Jun 30 '20 at 07:22
  • Thank you for replying @Trenton McKinney. I have edited my question. But even when I use `df.head(10)`, I get the same type of error: 'collections.OrderedDict' object has no attribute 'head' – Krishn Nand Jun 30 '20 at 07:39
  • An `OrderedDict` isn't a dataframe. If you do `df = pd.read_excel('trial.xlsx',sheet_name=None)`, then do `df.head()` what happens? – Trenton McKinney Jun 30 '20 at 07:49
  • Unfortunately your variable `df` is not referring to a dataframe. You can check what datatype your variable is by passing the object to the `type` function like so: `type(df)`. You can create a dataframe from your OrderedDict like this: `df = pandas.DataFrame(my_ordered_dict)` – el_oso Jun 30 '20 at 07:56
  • Thank you @Trenton McKinney. I get the same error as before, _'collections.OrderedDict' object has no attribute 'head'_, upon running `df.head()`. You are right, `df` is an OrderedDict and not a DataFrame – Krishn Nand Jun 30 '20 at 09:46
  • Thank you @el_oso. I tried out `df = pandas.DataFrame(my_ordered_dict)`, but got the following error: _ValueError: If using all scalar values, you must pass an index_ – Krishn Nand Jun 30 '20 at 09:49
  • I worked around it and got a DataFrame by saving my file as .csv instead of .xlsx. However, `df1` in the code below still contains rows with zeros: `import pandas as pd import numpy as np df = pd.read_csv('trial.csv') print(df) df1 = df.loc[(df!=0).any(1)] print(df1)` – Krishn Nand Jun 30 '20 at 10:17
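
The mask in the last comment keeps too much: `(df != 0).any(axis=1)` is true for a row as soon as one cell is nonzero, and the text column `type` never equals 0, so every row passes. A minimal sketch of the stricter test with `.all(axis=1)` follows. The data is retyped by hand from the question rather than read from `trial.csv`, and restricting the check to numeric columns is an assumption, not something stated in the thread.

```python
import pandas as pd

# Sample data retyped from the question (not read from trial.csv).
df = pd.DataFrame({
    "type": ["abc", "def", "ghi", "jkl", "mno"],
    "query": [100, 0, 0, 5, 1],
    "answers": [90, 0, 0, 1, 1],
})

# Assumption: only numeric columns need to be checked for zeros.
numeric = df.select_dtypes(include="number")

# True only for rows where every numeric cell differs from 0.
no_zero_rows = (numeric != 0).all(axis=1)

df1 = df.loc[no_zero_rows]
print(df1)
```

With `.any` the question is "does the row have at least one nonzero cell?", while `.all` asks "is every cell nonzero?", which is the condition described in the question.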

1 Answer


You are getting an OrderedDict because you pass sheet_name=None to the read_excel function. That setting loads all the sheets into a dictionary of DataFrames keyed by sheet name, and a dictionary has no dropna method.

If you only need one sheet, specify it in the sheet_name parameter; otherwise omit the parameter to read the first sheet.

import pandas as pd
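df = pd.read_excel('trial.xlsx')  # without sheet_name, reads the first sheet

print(type(df))  # <class 'pandas.core.frame.DataFrame'>
df = df.dropna()  # works now, but drops only rows containing NaN, not zeros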

or

import pandas as pd
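df = pd.read_excel('trial.xlsx', sheet_name='Sheet1')  # reads a specific sheet

print(type(df))  # <class 'pandas.core.frame.DataFrame'>
df = df.dropna()  # works now, but drops only rows containing NaN, not zeros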
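
If all sheets are needed, the dictionary returned with `sheet_name=None` can also be used directly, since each value in it is a DataFrame. The sketch below builds such a dictionary by hand so it runs without `trial.xlsx` (the hand-built `sheets` variable is a stand-in, not the actual file contents) and shows that DataFrame methods belong on the values, not on the dictionary.

```python
import pandas as pd

# Stand-in for pd.read_excel('trial.xlsx', sheet_name=None); built by hand here.
sheets = {
    "Sheet1": pd.DataFrame({
        "type": ["abc", "def", "ghi", "jkl", "mno"],
        "query": [100, 0, 0, 5, 1],
        "answers": [90, 0, 0, 1, 1],
    })
}

# The container is a dict, so it has no dropna() or head().
print(hasattr(sheets, "dropna"))

# Pick one sheet by name to get an actual DataFrame.
df = sheets["Sheet1"]
print(type(df))

# Or apply a DataFrame method to every sheet in turn.
cleaned = {name: frame.dropna() for name, frame in sheets.items()}
```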
el_oso
  • Wow!! Yes, it worked. Now I am able to get the DataFrame from .xlsx as well. I tried `df = df.replace(0, np.nan)` `df = df.dropna(how='all', axis=0)` `df1=df.dropna()` and got the rows with zeroes removed!! Thank you!! – Krishn Nand Jun 30 '20 at 11:18
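
The replace-then-dropna approach from the comment above can be sketched as follows, again with the question's data retyped by hand. Two side effects are worth knowing: rows that already contained NaN are dropped too, and replacing integers with NaN turns those columns into floats. The cast back to `int` at the end is an optional extra step, assuming the remaining values are whole numbers.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question (not read from trial.xlsx).
df = pd.DataFrame({
    "type": ["abc", "def", "ghi", "jkl", "mno"],
    "query": [100, 0, 0, 5, 1],
    "answers": [90, 0, 0, 1, 1],
})

# Turn zeros into NaN, then drop every row that contains a NaN.
cleaned = df.replace(0, np.nan).dropna()

# The NaN step leaves float columns behind; restore integers (optional).
cleaned = cleaned.astype({"query": int, "answers": int})
print(cleaned)
```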