0
  1. I have a pandas DataFrame that was created from some raw data. It has hundreds of rows, so I will show only the first 10.

  2.        text
    
     0       0
     1       0
     2       0
     3       0
     4  26.529
     5       0
     6  25.558
     7       0
     8       0
     9       0
    
  3. I want to remove all the rows containing zeros from my DataFrame and rename the column from 'text' to 'Results', so the final data should look like this:

  4.       Results
    
     0    26.529
     1    25.558
    
  5. My method was to use the df.drop() method to drop all rows containing zeros. My code looks like this:

     df = df.drop(df[df['text'] == 0].index,inplace=True)
    
     # I haven't written the code to rename the column yet
    
  6. Somehow, when I run this, the resulting df is empty (NoneType). I have no idea why the drop method dropped everything in my DataFrame. Please help! Thanks in advance!

  7. When I run the code in the VS Code debugger, the values in my df look like the screenshot linked at [1].

     I noticed that every element in my df has dtype object. I want to get rid of all the arrays that contain only an empty string, e.g. "000:array([''],dtype=object)".
    
    
     [1]: https://i.stack.imgur.com/yk63P.png
    
CYU1
  • 1
    Btw, the index values shown on the left are not actual values in the data frame. – CYU1 Feb 03 '22 at 20:41
  • 1
    You don't need to `df.drop()`, just use boolean masking itself: `df = df[df["text"] != 0]` – ddejohn Feb 03 '22 at 20:47
  • Hi, I added a screenshot of the variable in my df above. You can copy and paste the image link into your browser to see what I got. Thanks. – CYU1 Feb 17 '22 at 16:28
  • The reason you end up with `None` instead of a dataframe is that you're assigning the result of an in-place operation. You can *either* use `inplace=True` *or* reassign the result via `df = ...`, but not both: with `inplace=True`, the operation modifies the original data and returns `None` (think of `my_list = my_list.append(3)`), and that `None` is what you are then assigning to `df`. – ddejohn Feb 17 '22 at 17:01
  • https://stackoverflow.com/questions/43893457/understanding-inplace-true-in-pandas – ddejohn Feb 17 '22 at 17:04
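
The `inplace=True` behaviour described in the comments above can be reproduced with a small sketch. The sample values are made up to mirror the question, and the names `result` and `df2` are only for illustration:

```python
import pandas as pd

# Hypothetical sample data mirroring the question (numeric zeros)
df = pd.DataFrame({"text": [0, 0, 26.529, 0, 25.558]})

# With inplace=True, drop() modifies df itself and returns None,
# so assigning the return value back to df would replace df with None.
result = df.drop(df[df["text"] == 0].index, inplace=True)
print(result)  # None
print(df)      # df itself has already been filtered in place

# Alternative without inplace: reassign the filtered frame instead
df2 = pd.DataFrame({"text": [0, 0, 26.529, 0, 25.558]})
df2 = df2[df2["text"] != 0]
```

Either form works on its own; the problem only appears when the two are combined as `df = df.drop(..., inplace=True)`.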

2 Answers

1

You can do that with the following:

    df[df["text"].str.strip() != "0"].rename(columns={"text": "Results"}).reset_index(drop=True)
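
As a rough illustration of what this one-liner does, here is a sketch on made-up data. It assumes the column really holds strings such as `"0"`, possibly padded with spaces; the sample values and the names `mask` and `results` are hypothetical:

```python
import pandas as pd

# Hypothetical sample data: zeros stored as strings, one padded with spaces
df = pd.DataFrame({"text": ["0", " 0 ", "26.529", "0", "25.558"]})

# Keep only rows whose stripped text is not the string "0"
mask = df["text"].str.strip() != "0"

results = (
    df[mask]
    .rename(columns={"text": "Results"})  # rename the column
    .reset_index(drop=True)               # renumber rows from 0
)
print(results)
```

Note that the values stay strings here, and that the `.str` accessor raises an `AttributeError` on a numeric column, so this only applies when the dtype is object with string values.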

BoomBoxBoy
  • Hi, this doesn't work for me. I tried and the results still contain zeros. – CYU1 Feb 17 '22 at 16:09
  • What is the output when you run `df["text"].dtype`? I suspect there could be spaces around the 0's. – BoomBoxBoy Feb 17 '22 at 16:13
  • The dtype is object. For example, when I set a breakpoint after the df was created and ran in debug mode, the df variable contained entries like "000:array([''],dtype=object)". – CYU1 Feb 17 '22 at 16:25
  • 1
    Hi, I added a screenshot of the variable in my df above. You can copy and paste the image link into your browser to see what I got. Thanks. – CYU1 Feb 17 '22 at 16:28
  • I edited my answer to strip spaces, let me know how it goes – BoomBoxBoy Feb 17 '22 at 16:30
  • Hi, I just tried. Unfortunately, the results are still the same. Zeros are still in the df. – CYU1 Feb 17 '22 at 16:34
  • I'm not sure what else it could be. Is this a public dataset I can access? If not, I don't have any more ideas :( – BoomBoxBoy Feb 17 '22 at 16:40
  • This is not a public dataset. But I added a link to the snapshot of the variables in my df above. Did you get a chance to see it? It might help. – CYU1 Feb 17 '22 at 16:49
  • Yes I did. It is tough to get a sense of what is wrong from that image. Are the rows with empty strings those that have 0 in them? – BoomBoxBoy Feb 17 '22 at 17:26
  • Yes, those rows turn out to be zeros. Could it be that they are object datatypes so they can't be dropped or stripped? Or maybe my df wasn't properly created. – CYU1 Feb 17 '22 at 17:40
  • Could be. Maybe try filtering on empty strings as well, something like `df[df["text"]!=""]` – BoomBoxBoy Feb 17 '22 at 17:45
  • Hi, here's a link to the context of this question: https://stackoverflow.com/questions/71162678/how-to-properly-extract-information-from-bounded-regions-from-images-using-openc – CYU1 Feb 17 '22 at 17:45
  • 1
    Hi, df[df["text"]!=""] didn't work either. What a stubborn df that I have created lol! – CYU1 Feb 17 '22 at 17:49
  • It's never easy lol. Good luck, my friend! – BoomBoxBoy Feb 17 '22 at 17:50
  • 1
    Thanks, I will update my post if I find a solution. – CYU1 Feb 17 '22 at 17:51
0

I found a solution for this problem:

First I converted the data type from object to float64 in my df:

    df['text'] = pd.to_numeric(df['text'])

Then I dropped the NaN values from the df using:

    df = df.dropna()

This works for me!
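
A minimal end-to-end sketch of these two steps, plus the column rename asked for in the question, might look like the following. It assumes the rows displayed as zeros are actually empty strings, as the comments on the other answer suggest. The sample data, the `errors="coerce"` argument, the extra `!= 0` filter, and the name `cleaned` are illustrative additions rather than part of the original solution:

```python
import pandas as pd

# Hypothetical sample data: the "zeros" are really empty strings,
# plus one literal "0" to show the numeric filter as well
df = pd.DataFrame({"text": ["", "", "26.529", "", "25.558", "0"]})

# Convert object -> float64; errors="coerce" (an addition here)
# turns anything that cannot be parsed into NaN
df["text"] = pd.to_numeric(df["text"], errors="coerce")

cleaned = (
    df.dropna(subset=["text"])            # drop the NaN rows
    .loc[lambda d: d["text"] != 0]        # drop genuine numeric zeros, if any
    .rename(columns={"text": "Results"})  # rename the column
    .reset_index(drop=True)               # renumber rows from 0
)
print(cleaned)
```

Reassigning the result, rather than passing `inplace=True`, avoids the `None` problem from the question.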

CYU1