2

All, I have an analytical csv file with 190 columns and 902 rows. I need to recode values in several columns (18 to be exact) from it's current 1-5 Likert scaling to 0-4 Likert scaling.

I've tried using replace:

df.replace({'Job_Performance1': {1:0, 2:1, 3:2, 4:3, 5:4}}, inplace=True)

But that throws a Value Error: "Replacement not allowed with overlapping keys and values"

I can use map:

df['job_perf1'] = df.Job_Performance1.map({1:0, 2:1, 3:2, 4:3, 5:4})

But, I know there has to be a more efficient way to accomplish this since this use case is standard in statistical analysis and statistical software e.g. SPSS

I've reviewed multiple questions on StackOverFlow but none of them quite fit my use case. e.g. Pandas - replacing column values, pandas replace multiple values one column, Python pandas: replace values multiple columns matching multiple columns from another dataframe

Suggestions?

Community
  • 1
  • 1
R_Queery
  • 497
  • 1
  • 9
  • 19
  • What's wrong with just subtracting `1` from the column? – EdChum Dec 22 '15 at 23:52
  • I feel this question is not properly answered. What if you cannot just do `-1`, for example because you want a replacement such as `{1:0, 2:1, 4:3}`? – ssice Oct 05 '16 at 15:01

2 Answers2

3

You can simply subtract a scalar value from your column which is in effect what you're doing here:

df['job_perf1'] = df['job_perf1'] - 1

Also as you need to do this on 18 cols, then I'd construct a list of the 18 column names and just subtract 1 from all of them at once:

df[col_list] = df[col_list] - 1
EdChum
  • 376,765
  • 198
  • 813
  • 562
1

No need for a mapping. This can be done as a vector addition, since effectively, what you're doing, is subtracting 1 from each value. This works elegantly:

df['job_perf1'] = df['Job_Performance1'] - numpy.ones(len(df['Job_Performance1']))

Or, without numpy:

df['job_perf1'] = df['Job_Performance1'] - [1] * len(df['Job_Performance1'])
Nelewout
  • 6,281
  • 3
  • 29
  • 39
  • you don't need to do this just do `df['job_perf1'] = df['job_perf1'] -1` – EdChum Dec 22 '15 at 23:57
  • @EdChum aaaah, that makes sense, thanks for pointing that out. :) Now I just need to figure out how to loop through my 18 columns. thanks! – R_Queery Dec 23 '15 at 00:04
  • Is your entire df 18 cols? if so then `df = df -1` will just do it all otherwise, compose a list of the cols of interest and then do `for col in col_list: df[col] = df[col] - 1` – EdChum Dec 23 '15 at 00:05
  • @EdChum I really need to look through the docs more often. Anyways, for what it's worth, pandas converts a scalar to an appropriately sized array under the hood, cf. [pandas source](https://github.com/pydata/pandas/blob/b60b55934dcfd789ef38dbecee62a6ecf5cddb21/pandas/lib.pyx#L1840-1873). – Nelewout Dec 23 '15 at 00:10
  • @EdChum entire dataframe is 190+ columns so can't do the operation on the entire data frame. Also, that code is much less complex than I imagined, is that because we are doing a column operation instead of a row or cell based operation? (obvi still pretty new to Python/Pandas) – R_Queery Dec 23 '15 at 00:13
  • that's correct we're performing column/Series ops here, actually I think this would just work: `df[cols_list] = df[cols_list] - 1` – EdChum Dec 23 '15 at 00:14
  • @EdChum awesome, thanks for the insight and thanks for the help!! – R_Queery Dec 23 '15 at 00:18