I have some code that needs to replace two columns of a pandas DataFrame with the index of each value in the sorted list of that column's unique values. For example:

col1, col2, col3, col4
A, 1, 2, 3
A, 1, 2, 3
B, 1, 2, 3

Should end up in the data frame as:

col1, col2, col3, col4
0, 1, 2, 3
0, 1, 2, 3
1, 1, 2, 3

since A is element 0 in the list of unique col1 values, and B is element number 1.

What I did is:

unique_vals = df['col1'].unique()

# create a map to speed up index lookups when we map the dataframe column
unique_vals.sort()
unique_vals_map = {}
for i in range(len(unique_vals)):
    unique_vals_map[unique_vals[i]] = i

df['col1'] = df['col1'].apply(lambda r: unique_vals_map[r])

However that last line gives me:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I saw other SO answers about this, but I am not sure how to fix it in my particular case. I'm experienced with numpy but new to pandas, so any help is greatly appreciated!

Is there a better way to perform this mapping?

Edy Bourne
  • I don't think that solves anything but why `df['col']` and not `df['col1']`? That's not consistent with the required output. – Joooeey May 29 '23 at 17:05
  • And importantly, what does `df` look like in the end? Is there an actual problem or just a spurious warning? – Joooeey May 29 '23 at 17:10
  • Another tangential suggestion: `df['col1'].map(unique_vals_map)` works too and is shorter than `apply`. – Joooeey May 29 '23 at 17:14
  • I can't reproduce the warning in Pandas 1.4.3. What version are you using? – Joooeey May 29 '23 at 17:19
  • @Joooeey oops, that was a typo - I was creating a minimal version of my actual code. The pandas version I am on is 2.0.1. The result is correct, but this is a very annoying warning, as it shows up in many sections of my actual code and pollutes the output. I'd like to address whatever is causing it, but it's not clear to me how to fix it. – Edy Bourne May 29 '23 at 17:29
  • Pandas' issue tracker looks like it's full of false positives and many have been there for years. So perhaps it's fine to use the warnings library to suppress the warning. I absolutely can't see anything that would create a copy there. – Joooeey May 29 '23 at 17:48
  • Awesome, thank you! I will look into the warnings library. Should I close this question? Your info was useful to me, if you want to post an answer I can accept too. – Edy Bourne May 29 '23 at 17:53
  • It's best practice to post an answer once you found it. Unless we have a duplicate somewhere. This is a common issue I've experienced many times. – Joooeey May 29 '23 at 17:54
  • Also, there's a newer version of Pandas. Maybe that fixes the warning. – Joooeey May 29 '23 at 17:55
  • Interestingly, I can't reproduce with Pandas 2.0.1 either. That's on Python 3.10.11. Are you sure you can reproduce the error with the code above in a new interpreter? – Joooeey May 29 '23 at 18:01
  • Here's the generic duplicate target: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas It would be interesting what went wrong in this particular case, though! – Joooeey May 29 '23 at 18:30
  • @Joooeey I agree, it is a bit of a mystery.. for now I just silenced the warnings and everything seems to be fine.... – Edy Bourne Jun 06 '23 at 13:55
  • Can you post the code you used for creating `df` in the first place? – Joooeey Jun 06 '23 at 14:01
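
Since the snippet in the question does not reproduce the warning on its own, one plausible cause is how `df` was created. The sketch below assumes (this is not confirmed anywhere in the thread) that `df` was obtained by filtering a larger frame, here given the hypothetical name `full`. In that situation, taking an explicit `.copy()` before assigning is the usual way to avoid the warning without silencing it globally. It also uses `map`, as suggested in the comments, instead of `apply`.

```python
import pandas as pd

# Hypothetical parent frame; the real origin of `df` is not shown in the question.
full = pd.DataFrame({
    "col1": ["A", "A", "B", "C"],
    "col2": [1, 1, 1, 9],
})

# Assumption: `df` is a filtered subset of `full`. Assigning a column to such
# a subset is a typical trigger for SettingWithCopyWarning in pandas 1.x/2.x.
# The explicit .copy() makes `df` an independent frame.
df = full[full["col2"] == 1].copy()

# Build the value -> position lookup from the sorted unique values.
codes = {value: i for i, value in enumerate(sorted(df["col1"].unique()))}

# Series.map accepts a dict directly, so no lambda is needed.
df["col1"] = df["col1"].map(codes)
```

If `df` really is meant to be a standalone frame, the copy costs some memory but makes the intent explicit, and `full` is left untouched by the assignment.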

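
On the question of a better way to perform the mapping: a minimal sketch using `pandas.factorize`, which returns integer codes plus the unique values in one call. With `sort=True` the codes follow the sorted order of the unique values, matching the `unique()` plus `sort()` approach in the question. The sample data and the choice of columns here are made up for illustration.

```python
import pandas as pd

# Illustrative data only; column names follow the question's example.
df = pd.DataFrame({
    "col1": ["B", "A", "A"],
    "col2": ["x", "y", "x"],
    "col3": [1, 2, 3],
})

# Replace each listed column with the position of its values
# in that column's sorted unique values.
for col in ["col1", "col2"]:
    codes, uniques = pd.factorize(df[col], sort=True)
    df[col] = codes
```

This avoids building the lookup dictionary by hand. Note that `factorize` encodes missing values as -1 by default, which may or may not be what you want.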