
I have a DataFrame with two columns, 'A' and 'B'. My goal is to delete the rows where 'B' is empty. Others have recommended using df[pd.notnull(df['B'])]. For example here: Python: How to drop a row whose particular column is empty/NaN?

However, this does not work in my case. Why not, and how can I solve it?

    A          B
0   Lorema     Ipsuma
1   Corpusa    Dominusa
2   Loremb     
3   Corpusc    Dominusc
4   Loremd     
5   Corpuse    Dominuse

This is the desired result:

    A          B
0   Lorema     Ipsuma
1   Corpusa    Dominusa
2   Corpusc    Dominusc
3   Corpuse    Dominuse
twhale
  • 1
    df.loc[df.ne('').all(1),:] – BENY Jul 04 '18 at 21:11
  • 1
    You have to understand what values are in your data frame. For example, if it is an empty string, `df[df.B != ""]` would do. If it is `None` or `NA`, then `notnull()` should work, etc. – rafaelc Jul 04 '18 at 21:12
  • Hint: try `print(type(df['B'].iloc[2]))` with your dataframe above to see what type you have. – jpp Jul 04 '18 at 21:14
  • @jpp: Thanks. I have done that and got: `` – twhale Jul 04 '18 at 21:15
  • So you should try @Wen's solution.. looks like you have empty strings. – jpp Jul 04 '18 at 21:16
  • @jpp: Strangely enough, Wen's solution does not work either. It leaves the original DataFrame unaltered. – twhale Jul 04 '18 at 21:18
  • @user3483203, I don't think that's true. You can use `df.loc` to drop rows via a Boolean series, no need for `drop`. I suggest OP do some more debugging themselves, e.g. try `df['B'].iloc[2] == ''`. This is basic debugging which should be learnt when using Python (or any language). – jpp Jul 04 '18 at 21:21
  • @user3483203: `df.drop(df.loc[df.eq('').any(1)].index)` leaves the original DataFrame unaltered. – twhale Jul 04 '18 at 21:23
  • @twhale You have to check *what* is it in those cells. It could be whitespace(s), tabs, etc.. – rafaelc Jul 04 '18 at 21:25

1 Answer


Basically, you could have spaces, tabs or even a `\n` in these blank cells.

For all those cases, you can strip the values first and then remove the rows, e.g.

df[df.B.str.strip().ne("") & df.B.notnull()]

I believe this should cover empty strings, whitespace-only strings and NaN/None.
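As a minimal sketch of the strip-then-filter idea, here is a small frame modelled on the one in the question. The contents of the two "blank" cells (a space plus a tab, and a NaN) are assumptions made for illustration, since the thread never establishes what the real cells hold.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's frame: row 2 holds
# whitespace and row 4 holds NaN (both are assumptions).
df = pd.DataFrame({
    "A": ["Lorema", "Corpusa", "Loremb", "Corpusc", "Loremd", "Corpuse"],
    "B": ["Ipsuma", "Dominusa", " \t", "Dominusc", np.nan, "Dominuse"],
})

b = df["B"]
# Keep a row only if B is not missing and is non-empty after stripping.
mask = b.notna() & b.str.strip().ne("")

# Boolean indexing returns a new frame, so assign the result;
# reset_index renumbers the remaining rows from 0.
result = df[mask].reset_index(drop=True)
print(result)
```

Note that the filter returns a new DataFrame and leaves `df` itself unchanged, so the result has to be assigned to a name to be kept.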

rafaelc
  • The empty cells are cells that have never been filled after the column was created. This solution does not yet do the trick. Would it be possible to fill those cells with something (e.g. '0') and then drop? – twhale Jul 04 '18 at 21:35
  • Can you post how you are creating this data frame? If it comes from a csv file, could you post a sample csv? – rafaelc Jul 04 '18 at 21:36
  • My function is complicated. But it either returns a string to the cell or it returns a white space ''. (So I expected your solution to work.) And then writes the result to a csv file. – twhale Jul 04 '18 at 21:43
  • Following jpp's suggestion, what does `df['B'].iloc[2] == ''` print? – rafaelc Jul 04 '18 at 21:45
  • It prints `False`. – twhale Jul 04 '18 at 21:46
  • Awesome. So keep doing that until you find out what the hell is in that cell :) (i.e. until something returns `True`) – rafaelc Jul 04 '18 at 21:48
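
One way to follow the debugging advice in these comments is to print the `repr` of a suspicious cell, which makes invisible characters visible. This is only a sketch: the frame below and the contents of its "blank" cell are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical frame whose second B cell looks empty but is not.
df = pd.DataFrame({"A": ["Lorema", "Loremb"], "B": ["Ipsuma", " \n"]})

cell = df["B"].iloc[1]

# repr() shows escapes such as \n or \t that a plain print() hides.
print(type(cell), repr(cell))

# For strings, list the code point of every character.
code_points = [hex(ord(ch)) for ch in cell] if isinstance(cell, str) else []
print(code_points)

# Survey the distinct raw values in the whole column.
print(df["B"].map(repr).unique())
```

Once the actual contents are known, the filter can be written against exactly that value.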