
This is my first post, so bear with me. Suppose I have this pandas DataFrame (a sample DataFrame I found in "How do I use within / in operator in a Pandas DataFrame?").

Now suppose I want to update this DataFrame so that it has no rows where the Month column corresponds to September. Here is the code that I have been using:

 df[df.Month != '0'] 

It seems like it's working, but I get this warning:

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  return op(a, b)

I looked at other FutureWarnings posted on this website, but most of them happen when people are using NumPy. I have imported NumPy, but I am not using it (or at least I think I'm not). On top of that, not only do I get this warning, but the filter also doesn't seem to work: I still have the September rows in the DataFrame afterwards.

So, to summarize, my question is: how would I delete those rows depending on the Month value, and why did I get this warning? Note that I also tried

 df = df[df.Month != '0'] 

because I thought maybe that was the issue, but that also doesn't work. Any ideas on how to do it?

NOTE: I tried taking off the quotes as in:

 df[df.Month != 0]

and that stopped the warning, but it's still not working and the rows were not deleted.

  • You *are* using NumPy, because Pandas is built on top of it. – user2357112 Aug 25 '20 at 20:56
  • Why are you comparing `df.Month` to a string `'0'`? Does your `Month` column actually contain strings? And does the string `'0'` actually represent September? – user2357112 Aug 25 '20 at 20:57
  • Does this answer your question? [FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison](https://stackoverflow.com/questions/40659212/futurewarning-elementwise-comparison-failed-returning-scalar-but-in-the-futur) – Trenton McKinney Aug 25 '20 at 21:00
  • Please provide a [mcve], as well as the entire error output. – AMC Aug 25 '20 at 21:21
  • @user2357112supportsminoca I'm so sorry, you are right, that is a typo and I meant to write 9. Nonetheless, it is still not working and the rows are still there. – segfaultshurtme Aug 25 '20 at 21:42

1 Answer


Pandas is built on NumPy. Many operations on pandas DataFrames are efficient because pandas delegates the work to NumPy, which runs compiled C code. So even if you never import NumPy or call it explicitly, pandas is still using it behind the scenes. This is important to know: once you understand NumPy and how pandas uses it, you will understand why some ways of doing the same thing are a lot faster than others.

To select only those rows for which a particular condition holds, you might want to use the `.loc[ ]` syntax. For you that would be: `df.loc[df.Month != '0']`.
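As a minimal sketch of that selection: the sample data below is hypothetical (the original DataFrame is not shown in the post), and it assumes `Month` holds integers with 9 meaning September, as clarified in the comments. The point it illustrates is that boolean indexing returns a new DataFrame, so the result has to be assigned to something.

```python
import pandas as pd

# Hypothetical sample data; the real column names and values may differ.
df = pd.DataFrame({"Month": [8, 9, 9, 10], "Value": [1.0, 2.0, 3.0, 4.0]})

# The boolean mask is True for every row that should be kept.
mask = df["Month"] != 9

# .loc with a mask returns a new DataFrame; df itself is not modified.
filtered = df.loc[mask]
rows_before, rows_after = len(df), len(filtered)

# Reassign (and optionally renumber the index) to "update" df.
df = filtered.reset_index(drop=True)
```

Without the reassignment on the last line, `df` would still contain the September rows.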

As for what things are currently supported or not: that depends on what version of Python, Numpy and Pandas you are using. I ran your code and got no warning (Python 3.7.6, Numpy 1.18.1, Pandas 1.0.1; and Python 3.6.9, Numpy 1.13.3, Pandas 1.0.1). Warnings are not errors. Your code will still run, but when you get a warning, you should check whether the function you used does indeed do what you want it to do / expect it to do. Carefully check the documentation.
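One way to check that a comparison does what you expect is to look at the column's dtype before filtering. The sketch below assumes, purely for illustration, that `Month` was read in as strings; the data is hypothetical. It converts the column with `astype` so that it can be compared against the integer 9.

```python
import pandas as pd

# Hypothetical data: months stored as strings, as can happen after reading a file.
df = pd.DataFrame({"Month": ["8", "9", "10"], "Value": [1, 2, 3]})

# Inspect the dtype first; a string column should be compared to strings,
# an integer column to integers.
month_is_integer_before = pd.api.types.is_integer_dtype(df["Month"])

# Convert to integers so the comparison below is between like types.
df["Month"] = df["Month"].astype("int")
month_is_integer_after = pd.api.types.is_integer_dtype(df["Month"])

# Keep every row whose month is not September.
df = df[df["Month"] != 9]
```

If the column is already numeric, the `astype` step is unnecessary and only the final comparison is needed.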

  • I tried using `.loc`, but it doesn't work. I have Python 3.7.9. If I take the quotes off, the warning disappears, but I still have the rows that were supposed to be deleted. I should add that I am viewing the DataFrame afterwards by calling `df.to_csv(data.csv, mode = 'a')` and looking at the result in Excel – segfaultshurtme Aug 25 '20 at 21:44
  • Very strange. In Python 3.7.6, I created a DataFrame with a column of numbers in string format (as you seem to have, which is why the quotes matter). Both `df[df.Month != '0']` and `df.loc[df.Month != '0']` work without any problem. Which version of NumPy do you have? When reading a dataset from a file, you should always make sure that the data type of each column is correct. Here you should set it to integer, using `astype('int')`. – Robby the Belgian Aug 25 '20 at 21:52
  • My NumPy is 1.19.1 and pandas is 1.1.1. This is really strange! For some reason, using `dfNEW = df[df['Month'] != 0].copy()` works and it doesn't give any errors. Admittedly it is not the most straightforward way, but for some reason the other solutions did not work. Thank you for the help! :) – segfaultshurtme Aug 25 '20 at 22:03
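The thread does not confirm what caused the rows to reappear, but one thing worth ruling out is the `mode='a'` export mentioned in the comments. Append mode adds the new rows below whatever the file already contains, so rows from an earlier export stay in the file even if the DataFrame was filtered correctly. A small sketch with hypothetical data and a temporary file:

```python
import os
import tempfile

import pandas as pd

# Hypothetical data and a throwaway file path, for illustration only.
df = pd.DataFrame({"Month": [8, 9, 10]})
path = os.path.join(tempfile.mkdtemp(), "data.csv")

# First export: the file now contains the September row.
df.to_csv(path, index=False)

filtered = df[df["Month"] != 9]

# mode="a" appends to the existing file, so the old rows are still there.
filtered.to_csv(path, mode="a", index=False)
with open(path) as f:
    appended_lines = f.read().splitlines()

# The default mode="w" overwrites the file with only the filtered rows.
filtered.to_csv(path, index=False)
with open(path) as f:
    overwritten_lines = f.read().splitlines()
```

Here `appended_lines` still includes a `9` from the first export, while `overwritten_lines` does not.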