I am reading this dataset into a pandas DataFrame. I only need to work with the paperAbstract column, which has some missing data.

import pandas as pd

filename = "sample-S2-records"
df = pd.read_json(filename, lines=True)
abstract = df['paperAbstract']

Because some entries in abstract are missing, I want to remove the rows that are empty. Following the documentation, I do the following:

abstract.dropna(how='all')

But this doesn't remove those empty rows. They are still there in the abstract dataframe. What am I missing?

nad
  • Well, dropna will only recognize values pandas considers null. If by empty you mean the empty string, that doesn’t count. Can you show some of your data, preferably from `df.head().to_dict()`? – ALollz Oct 04 '18 at 21:54
  • @ALollz yes, you are right. It is actually an empty string. So how do I solve it without manually parsing the dictionary? – nad Oct 05 '18 at 04:19
  • You first need to replace the empty strings with `NaN`: `abstract.replace('', np.nan).dropna(how='all')`. Alternatively, you could check where everything is equal to `''`, but I'm unsure whether you have a `DataFrame` or a `Series`, and over which axis you would want that done. – ALollz Oct 05 '18 at 13:20
  • @ALollz thanks, this solves the issue. If you submit it as an answer, I can accept it. – nad Oct 05 '18 at 21:58
  • Does this answer your question? [Drop rows containing empty cells from a pandas DataFrame](https://stackoverflow.com/questions/29314033/drop-rows-containing-empty-cells-from-a-pandas-dataframe) – Gonçalo Peres Jan 08 '21 at 09:37
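
The fix suggested in the comments can be sketched as follows. This is a minimal illustration, not the asker's actual data: the sample Series is a hypothetical stand-in for the paperAbstract column, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['paperAbstract']: one empty string, one real null
abstract = pd.Series(
    ["First abstract.", "", None, "Second abstract."],
    name="paperAbstract",
)

# dropna alone removes only values pandas considers null (None/NaN);
# the empty string is a valid value and is kept
only_nulls_dropped = abstract.dropna()

# Convert empty strings to NaN first, then drop; the result is a new Series
cleaned = abstract.replace("", np.nan).dropna()

print(only_nulls_dropped.tolist())
print(cleaned.tolist())
```

If the column may also contain whitespace-only strings, those would need their own handling (for example stripping first); the sketch above covers only exact empty strings.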

1 Answer

By default, dropna returns a new object and leaves the original untouched, so you need to either pass inplace=True or assign the function's result back to your variable.

# Solution 1: pass inplace=True

abstract.dropna(how='all', inplace=True)
# Modifies abstract in place and returns None.

# Solution 2: assign the result back to your variable

abstract = abstract.dropna(how='all')
# Leaves the original untouched and returns a new object,
# so the result must be assigned back.

Note: the default value of inplace is False.
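
The difference between the two solutions can be checked with a small sketch. The sample values and variable names here are hypothetical and chosen only for illustration:

```python
import pandas as pd

original = pd.Series(["a", None, "b"])

# Without inplace=True, dropna returns a new Series and leaves `original` alone
returned = original.dropna()
length_after_plain_call = len(original)   # still includes the null row

# With inplace=True, the Series itself is modified and the call returns None
modified = original.copy()
return_value = modified.dropna(inplace=True)

print(length_after_plain_call, len(returned), len(modified), return_value)
```

Assigning the result (Solution 2) is usually the safer choice when the Series was taken from a DataFrame column, since it does not depend on whether the Series is a view of the parent frame.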

David Beauchemin
Rodolfo Bugarin