
I'm trying to shuffle the values from an input dataframe, store these new values into a dictionary, and then get an output dataframe by replacing the input dataframe values with their dictionary mapping.

However, I get the "Replacement not allowed with overlapping keys and values" error.

Here's my code sample:

import pandas as pd

# Build the input dataframe
in_df = pd.DataFrame(['A', 'B', 'C'], columns=['Alphabets'])
# Shuffle the rows and rename the column
df_temp = in_df.sample(frac=1).reset_index(drop=True)
df_temp = df_temp.rename(columns={'Alphabets': 'sample'})
# Map each original value to its shuffled counterpart
mask_dict = dict(zip(in_df['Alphabets'], df_temp['sample']))
out_df = in_df.replace({'Alphabets': mask_dict})

in_df looks as follows:

Alphabets
A
B
C

mask_dict looks something like this:

{'A': 'C', 'B': 'A', 'C': 'C'}

I want the out_df to look like this:

Alphabets
C
A
C

I found a way to do this!

df_temp = in_df.stack().unique()
df_temp = pd.DataFrame(df_temp, columns=['Alphabets'])
df_temp1 = df_temp.sample(n=df_temp.size, random_state=123)
mask_dict = dict(zip(df_temp['Alphabets'], df_temp1['Alphabets']))
out_df = in_df.applymap(mask_dict.get)
TanviP
  • What `replace` does is apply each replacement, in order, over the entire dataframe (or, when you give it a column, the entire series). And a dict has arbitrary order. It might replace all the `A`s with `C`s, and then replace all `B`s with `A`s, and then replace all the `C`s with `C`s, giving you the output you want, or it might replace all the `B`s first, then the `C`s, then the `A`s, giving you all `C`s. – abarnert Jun 16 '18 at 20:19
  • More importantly, the fact that you're trying to replace `C` with `C` implies that you aren't actually looking for `replace`, but for something that maps values through a dict, like [`map`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html). – abarnert Jun 16 '18 at 20:22
  • @abarnert, While this seems to be the intended usage (and requirements) for `replace`, is it documented anywhere? The method [docs](http://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.Series.replace.html) are rather sparse. – jpp Jun 16 '18 at 20:24
  • AFAIK, the docs just say that it's really complicated so you should experiment with it, which… probably could be better. And they clearly expect you to recognize the parallels with `str.replace` and `re.sub` (which is mentioned in multiple places) without actually saying so. And to know about all of the other methods in Pandas, so you can figure out when it's necessary in part by a process of elimination… – abarnert Jun 16 '18 at 20:27
  • @abarnert the reason I'm using a dictionary here is because I want the output to stay, in this case ['C','A','C'], no matter how many times I run the code. With map, I'll just keep getting random output for each run. I want the dictionary to be generated randomly, but want to use the dictionary to get the same output dataframe for each run. – TanviP Jun 16 '18 at 20:35
  • I don't know why you think you'll get random output for each run if you use `map`, so I don't know what to explain here… – abarnert Jun 17 '18 at 02:22
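
The ordering issue abarnert describes can be sketched in plain Python. This is only an illustration of why applying replacements one key at a time is order-sensitive while a single-pass lookup is not; it is not a claim about how pandas implements `replace` internally. The helper names `replace_sequentially` and `map_in_one_pass` are hypothetical.

```python
def replace_sequentially(values, mapping):
    """Apply each old -> new pair in turn over the whole list (illustrative only)."""
    out = list(values)
    for old, new in mapping.items():
        out = [new if v == old else v for v in out]
    return out


def map_in_one_pass(values, mapping):
    """Look each value up exactly once; unmapped values are kept as-is."""
    return [mapping.get(v, v) for v in values]


values = ['A', 'B', 'C']
mask_dict = {'A': 'C', 'B': 'A', 'C': 'C'}   # order: A, B, C
reordered = {'B': 'A', 'C': 'C', 'A': 'C'}   # same pairs, order: B, C, A

sequential_a = replace_sequentially(values, mask_dict)  # ['C', 'A', 'C']
sequential_b = replace_sequentially(values, reordered)  # ['C', 'C', 'C']
one_pass = map_in_one_pass(values, reordered)           # ['C', 'A', 'C']
```

With a one-pass lookup, the key order of the dictionary no longer affects the result, which is the behaviour the question is after.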

1 Answer


While I cannot explain your error, you can use pd.DataFrame.applymap instead:

out_df = in_df.applymap(mask_dict.get)

This method should also be more efficient than pd.DataFrame.replace, which has a significant overhead when used with a dictionary.

If you only need to replace values in a single series, you can use pd.Series.map:

out_df = in_df.copy()
out_df['Alphabets'] = out_df['Alphabets'].map(mask_dict)
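
As a self-contained sketch of the `pd.Series.map` approach, the snippet below uses the `mask_dict` from the question. One thing to be aware of: `map` returns `NaN` for values that have no key in the dictionary, so the second part shows one possible fallback with `fillna`. The `partial` dictionary is a hypothetical example, not part of the original question.

```python
import pandas as pd

in_df = pd.DataFrame({'Alphabets': ['A', 'B', 'C']})
mask_dict = {'A': 'C', 'B': 'A', 'C': 'C'}

# Each value is looked up once, so overlapping keys and values are fine
out_df = in_df.copy()
out_df['Alphabets'] = out_df['Alphabets'].map(mask_dict)

# Hypothetical dictionary that covers only some of the values:
# unmapped entries come back as NaN, so fall back to the original value
partial = {'A': 'C'}
kept = in_df['Alphabets'].map(partial).fillna(in_df['Alphabets'])
```

Here `out_df['Alphabets']` holds `C, A, C`, and `kept` holds `C, B, C`.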

Related: Replace values in a pandas series via dictionary efficiently

jpp
  • I want the output to stay, in this case ['C','A','C'], no matter how many times I run the code. With map, I'll just keep getting random output for each run. I want the dictionary to be generated randomly, but want to use the dictionary to get the same output dataframe for each run. – TanviP Jun 16 '18 at 20:36
  • `I want the dictionary to be generated randomly, but want to use the dictionary to get the same output dataframe for each run.` There's a contradiction in your remark. If your dictionary changes, your output will change. – jpp Jun 16 '18 at 20:42
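
One way to reconcile the two requirements in this exchange is to fix the random seed, so the dictionary is shuffled but comes out the same on every run. The sketch below uses the standard library's `random.Random` rather than `DataFrame.sample(random_state=...)`; the helper name `build_mask_dict` and the seed value are assumptions for illustration.

```python
import random


def build_mask_dict(values, seed):
    """Map each unique value to a shuffled counterpart, reproducibly for a given seed."""
    unique_values = sorted(set(values))
    shuffled = unique_values[:]
    # A dedicated, seeded generator gives the same shuffle on every run
    random.Random(seed).shuffle(shuffled)
    return dict(zip(unique_values, shuffled))


values = ['A', 'B', 'C']
first = build_mask_dict(values, seed=123)
second = build_mask_dict(values, seed=123)

# Applying the dictionary is deterministic: same dictionary, same output
masked = [first[v] for v in values]
```

Because the shuffle is a permutation, every original value maps to a distinct masked value. Changing the seed changes the dictionary, and therefore the output.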