
I am new to Python and practicing with pandas. My DataFrame has a column called NET_REVENUE whose type is string. I'm trying to convert it to float for further analysis.

However, when I run the following code, I get output I don't really understand. I am positive that there were no missing values in the original column. Some values were clearly converted to float successfully, but 2918 out of 4732 were not.

Can someone help please?

sep_IM_2019['NET_REVENUE_numeric'] = pd.to_numeric(sep_IM_2019['NET_REVENUE'], errors='coerce')

/Users/Leo/opt/miniconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.

sep_IM_2019.NET_REVENUE_numeric.isnull().sum()
#2918


    sep_IM_2019.NET_REVENUE_numeric
8       NaN
46      NaN
56      NaN
62      NaN
71      NaN
         ..
76472   NaN
76476   NaN
76503   NaN
76505   NaN
76510   NaN
Name: NET_REVENUE_numeric, Length: 4732, dtype: float64
Scott Boston

2 Answers

0

This answer might help you understand the warning - https://stackoverflow.com/a/20627316/8231447
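A minimal sketch of one common way to avoid the warning, assuming sep_IM_2019 was created by filtering a larger DataFrame (the question does not show that step, so the df and MONTH names below are hypothetical). Calling .copy() on the filtered result makes it an independent DataFrame, so adding a column to it is unambiguous:

```python
import pandas as pd

# Hypothetical stand-in for the larger DataFrame the subset came from.
df = pd.DataFrame({
    "MONTH": ["Sep", "Sep", "Oct"],
    "NET_REVENUE": ["10.5", "20", "30"],
})

# .copy() makes the filtered subset an independent DataFrame, so assigning
# a new column to it does not trigger SettingWithCopyWarning.
sep_IM_2019 = df[df["MONTH"] == "Sep"].copy()
sep_IM_2019["NET_REVENUE_numeric"] = pd.to_numeric(
    sep_IM_2019["NET_REVENUE"], errors="coerce"
)

print(sep_IM_2019)
```

The original df is left untouched; only the copy gains the new column.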

As for converting to a float, I'm not sure whether you want to run to_numeric or call .astype(float) on the Series:

sep_IM_2019["NET_REVENUE_numeric"] = sep_IM_2019['NET_REVENUE'].astype(float)
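To illustrate how the two approaches behave on strings that are not plain numbers, here is a small sketch with made-up sample values (not the asker's data). With errors='coerce', to_numeric turns unparseable strings into NaN, while .astype(float) raises a ValueError on the first one it meets:

```python
import pandas as pd

# Made-up sample: the middle value has a thousands separator.
s = pd.Series(["12.5", "1,234.00", "7"])

# to_numeric with errors="coerce" replaces unparseable strings with NaN.
coerced = pd.to_numeric(s, errors="coerce")

# astype(float) refuses the whole conversion if any string is unparseable.
try:
    s.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# On strings that are all plain numbers, both give the same float result.
clean = pd.Series(["12.5", "7"])
same_on_clean = clean.astype(float).equals(pd.to_numeric(clean))

print(coerced)
print(astype_failed, same_on_clean)
```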

Hope this helps!

Adoni5
  • Thank you @Adoni5. I actually wonder what the difference is between to_numeric and astype in terms of the result we get. – supermanleeg Jun 16 '20 at 04:02
0

You used errors='coerce', so when a value can't be converted to float, you get NaN instead of an error.

Try running to_numeric without that parameter to see exactly what the problem is. It's likely that some of your strings can't be converted. Take a closer look at them:

sep_IM_2019.loc[8,'NET_REVENUE']

should be the first string with that problem, judging by your output.
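Rather than checking rows one at a time, you can list every string that failed to convert. This is a rough sketch on made-up data (the index labels and values are illustrative only): it selects rows where the coerced result is NaN but the original string was not missing.

```python
import pandas as pd

# Made-up sample standing in for the NET_REVENUE column.
df = pd.DataFrame(
    {"NET_REVENUE": ["100", "2,500.00", "42.7", "$1,000"]},
    index=[3, 8, 46, 56],
)

converted = pd.to_numeric(df["NET_REVENUE"], errors="coerce")

# Rows where conversion produced NaN although the original was not missing.
failed = converted.isna() & df["NET_REVENUE"].notna()
bad_values = df.loc[failed, "NET_REVENUE"]

print(bad_values)
```

Looking at bad_values.unique() is usually enough to spot a common pattern, such as commas or currency symbols.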

  • It turns out the values that couldn't be converted were formatted as 'currency' in the csv file, with auto-generated thousands-separator commas. After changing them all to 'general' in the csv, I am able to use either to_numeric or astype to convert all the values into floats with no NaN. However, I still get the same warning as above: /Users/Leo/opt/miniconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead... – supermanleeg Jun 16 '20 at 03:58
  • The warning has nothing to do with the conversion. It's because at some point you've made a copy of your dataframe without saying you wanted it to be a whole new dataframe, so to speak. Without your code I can't tell you where. – el123456789 Jun 16 '20 at 07:45
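On the currency-formatted values mentioned in the comments: instead of editing the csv by hand, the separators could also be stripped in pandas before converting. A minimal sketch, assuming the only extra characters are commas and a dollar sign (the sample values are made up):

```python
import pandas as pd

# Made-up sample of currency-style strings.
raw = pd.Series(["1,234.50", "$2,000", "87.25"])

# Strip thousands separators and the currency symbol as literal text
# (regex=False), then convert what remains.
cleaned = (
    raw.str.replace(",", "", regex=False)
       .str.replace("$", "", regex=False)
)
numeric = pd.to_numeric(cleaned, errors="coerce")

print(numeric)
```

If the values only contain thousands separators, passing thousands=',' to pd.read_csv when loading the file may be a simpler alternative.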