I'm working on a large CSV file that contains almost only strings. I want to run some statistical analyses, such as clustering, but for that I need to convert my strings to integers. (I'm totally new to Python, pandas, and scikit-learn.)

Here is my code:

#replace str as int
df.WORK_TYPE[df.WORK_TYPE == 'aaa']=1
df.WORK_TYPE[df.WORK_TYPE == 'bbb']=2
df.WORK_TYPE[df.WORK_TYPE == 'ccc']=3
df.WORK_TYPE[df.WORK_TYPE == 'ddd']=4
print(df)

And here is my error message:

C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame 

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  This is separate from the ipykernel package so we can avoid doing imports until
C:\Users\ishemf64\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.

I don't understand why I get this error. Also, is there another way to convert the text, and is converting it mandatory for my analysis?

DarkSuniuM
FK IE
  • As described in the duplicate: `df.loc[df['WORK_TYPE'] == 'aaa', 'WORK_TYPE']=1` – jpp Nov 10 '18 at 00:41

1 Answer

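That is a warning, not an error. pandas raises it because with chained indexing such as `df.WORK_TYPE[df.WORK_TYPE == 'aaa'] = 1` it cannot tell whether the assignment writes to the original DataFrame or to a temporary copy. Better folks than I have explained it here: https://www.dataquest.io/blog/settingwithcopywarning/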
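As a rough sketch of how to avoid the warning, the snippet below shows two options on a toy DataFrame (the five rows are made up, and the `WORK_TYPE_CODE` column name is just an assumption for illustration): mapping all labels at once with `Series.map`, or using a single `.loc` call as suggested in the comment above.

```python
import pandas as pd

# Toy data standing in for the real CSV (labels taken from the question).
df = pd.DataFrame(
    {"WORK_TYPE": pd.Series(["aaa", "bbb", "ccc", "ddd", "aaa"], dtype=object)}
)

# Option 1: map every label in one pass into a new integer column.
# WORK_TYPE_CODE is a hypothetical column name chosen for this example.
codes = {"aaa": 1, "bbb": 2, "ccc": 3, "ddd": 4}
df["WORK_TYPE_CODE"] = df["WORK_TYPE"].map(codes)

# Option 2: a single .loc call with a row mask and a column label,
# so the assignment is applied to df itself, not to a temporary copy.
df.loc[df["WORK_TYPE"] == "aaa", "WORK_TYPE"] = 1

print(df)
```

Option 1 is probably the easier one to maintain, since the whole label-to-integer mapping lives in one dictionary and the original column is left untouched.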

Since you seem to have only a few categories, would you consider using `get_dummies`? It takes your `pd.Series` of categorical data and converts it into dummy variables: one column per category, holding 1 if the category is present in that row and 0 if not. Check it out here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html
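A minimal sketch of what that could look like, assuming a toy DataFrame with the same `WORK_TYPE` labels as in the question (the rows and the `WORK_TYPE` prefix for the new columns are illustrative choices, not taken from the real data):

```python
import pandas as pd

# Toy data standing in for the real CSV (labels taken from the question).
df = pd.DataFrame({"WORK_TYPE": ["aaa", "bbb", "ccc", "ddd", "aaa"]})

# One 0/1 column per category; dtype=int gives integers instead of booleans.
dummies = pd.get_dummies(df["WORK_TYPE"], prefix="WORK_TYPE", dtype=int)

# Attach the dummy columns next to the original data.
encoded = df.join(dummies)

print(encoded)
```

Unlike replacing the labels with 1 to 4, this does not impose an artificial order or distance between the categories, which tends to matter for distance-based clustering.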

Charles Landau