I have a pandas DataFrame that was created from some raw data. There are hundreds of rows, so I will show only the first 10:
    text
0   0
1   0
2   0
3   0
4   26.529
5   0
6   25.558
7   0
8   0
9   0
I want to get rid of all the zeros in my DataFrame and rename the column from 'text' to 'Results', so the final data should look like this:
    Results
0   26.529
1   25.558
My method was to use the df.drop() method to drop all rows containing zeros. My code looks like this:
df = df.drop(df[df['text'] == 0].index,inplace=True) # I haven't written the code to rename the column yet
Somehow, when I run this, the resulting df is empty (NoneType). I have no idea why the drop method dropped everything in my DataFrame. Please help! Thanks in advance!
When I debug the code in VS Code, the values in my df look like the ones in this screenshot: https://i.stack.imgur.com/yk63P.png
I noticed that every element in my df is an object type. I want to get rid of all the entries that hold an array containing an empty string, e.g. "000:array([''],dtype=object)".
Viewed 719 times

CYU1
-
Btw, the index values shown on the left are not actual values in the data frame. – CYU1 Feb 03 '22 at 20:41
-
You don't need `df.drop()`, just use boolean masking itself: `df = df[df["text"] != 0]` – ddejohn Feb 03 '22 at 20:47
-
Hi, I added a screenshot of the variable in my df above. You can copy and paste the image link into your browser to see what I got. Thanks. – CYU1 Feb 17 '22 at 16:28
-
The reason you're ending up with nothing is that you're assigning the result of an in-place operation. You can *either* use `inplace=True` *or* reassign the result via `df = ...`, but not both: with `inplace=True`, the operation modifies the original data and returns `None` (think of `my_list = my_list.append(3)`), and that `None` is what you are assigning to `df`. – ddejohn Feb 17 '22 at 17:01
-
https://stackoverflow.com/questions/43893457/understanding-inplace-true-in-pandas – ddejohn Feb 17 '22 at 17:04
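ddejohn's point about `inplace=True` can be sketched as follows. This is a minimal illustration, assuming the 'text' column holds real numbers (which, judging by the rest of the thread, may not be true for the asker's actual data); the sample values are taken from the question and the variable names are hypothetical.

```python
import pandas as pd

# Sample data assumed to be numeric; the real column may hold other objects.
df = pd.DataFrame({"text": [0, 0, 26.529, 0, 25.558]})

zero_rows = df[df["text"] == 0].index

# With inplace=True, drop() mutates the frame it is called on and returns None,
# so assigning the return value back to df would replace df with None.
returned = df.copy().drop(zero_rows, inplace=True)
print(returned)

# Option 1: reassign the result and leave inplace at its default (False).
kept = df.drop(zero_rows)

# Option 2: skip drop() and keep the non-zero rows with a boolean mask.
masked = df[df["text"] != 0]
print(masked)
```

Either option leaves the original row labels in place; `reset_index(drop=True)` renumbers them from 0 if that is wanted.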
2 Answers
1
You can do that with the following:
df[df["text"].str.strip()!="0"].rename(columns={'text':'Results'}).reset_index(drop=True)
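As a rough sketch of when the one-liner above does what is intended: it assumes every cell in 'text' is a string, possibly padded with spaces. The sample data below is made up for illustration, and the result has to be assigned to a name, since none of these calls modify `df` in place.

```python
import pandas as pd

# Hypothetical string data, including zeros padded with whitespace.
df = pd.DataFrame({"text": ["0", " 0", "26.529", "0 ", "25.558"]})

result = (
    df[df["text"].str.strip() != "0"]    # keep rows whose stripped text is not "0"
    .rename(columns={"text": "Results"})  # rename the column
    .reset_index(drop=True)               # renumber rows from 0
)
print(result)
```

If the cells are not plain strings, the `.str` comparison will not match them this way, which may be why it had no effect here.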

BoomBoxBoy
-
Hi, this doesn't work for me. I tried and the results still contain zeros. – CYU1 Feb 17 '22 at 16:09
-
What is the output when you run `df["text"].dtype`? I suspect there could be spaces around the 0's. – BoomBoxBoy Feb 17 '22 at 16:13
-
They are objects. For example, when I ran the code in the debugger with a breakpoint set after the df was created, I could see that the df variables contain something like "000:array([''],dtype=object)". – CYU1 Feb 17 '22 at 16:25
-
Hi, I added a screenshot of the variable in my df above. You can copy and paste the image link into your browser to see what I got. Thanks. – CYU1 Feb 17 '22 at 16:28
-
Hi, I just tried. Unfortunately, the results are still the same. Zeros are still in the df. – CYU1 Feb 17 '22 at 16:34
-
I'm not sure what else it could be. Is this a public dataset I can access? If not, I don't have any more ideas :( – BoomBoxBoy Feb 17 '22 at 16:40
-
This is not a public dataset. But I added a link to the snapshot of the variables in my df above. Did you get a chance to see it? It might help. – CYU1 Feb 17 '22 at 16:49
-
Yes I did. It is tough to get a sense of what is wrong from that image. Are the rows with empty strings those that have 0 in them? – BoomBoxBoy Feb 17 '22 at 17:26
-
Yes, those rows turn out to be zeros. Could it be that they are object datatypes so they can't be dropped or stripped? Or maybe my df wasn't properly created. – CYU1 Feb 17 '22 at 17:40
-
Could be. Maybe try filtering on empty strings as well, something like `df[df["text"]!=""]` – BoomBoxBoy Feb 17 '22 at 17:45
-
Hi, here's a link to the context of this question: https://stackoverflow.com/questions/71162678/how-to-properly-extract-information-from-bounded-regions-from-images-using-openc – CYU1 Feb 17 '22 at 17:45
-
Hi, `df[df["text"]!=""]` didn't work either. What a stubborn df I have created lol! – CYU1 Feb 17 '22 at 17:49
-
0
I found a solution for this problem:
First I converted the data type from object to float64 in my df:
df['text'] = pd.to_numeric(df['text'])
Then I dropped the resulting NaN values from the df using:
df = df.dropna()
This works for me!
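Put together, the two steps above might look like the sketch below. It assumes the cells that displayed as 0 are really empty strings and the rest are numeric strings, which is a guess based on the debugger output quoted in the question. `errors="coerce"` is an addition not in the original answer, so that anything non-numeric becomes NaN rather than raising, and the rename to 'Results' from the question is included as well.

```python
import pandas as pd

# Hypothetical stand-in for the real data: empty strings where zeros appeared.
df = pd.DataFrame({"text": ["", "", "26.529", "", "25.558"]})

# Convert object values to floats; unparseable entries become NaN.
df["text"] = pd.to_numeric(df["text"], errors="coerce")

# Drop the NaN rows, rename the column, and renumber the index from 0.
df = df.dropna().rename(columns={"text": "Results"}).reset_index(drop=True)
print(df)
```

Note that a genuine numeric 0 would survive `dropna()`, so this only removes the rows if they were never real zeros to begin with.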

CYU1