
I am trying to find a quicker way than a nested for loop to replace the values in column a of one table with the values in column b of another table.

for x in range(len(a["a"])):
    for y in range(len(b["a"])):
        if a["a"][x] == b["a"][y]:
            a["a"] = a["a"].replace([a["a"][x]], b["b"][y])

This currently works but is very slow. Is there any way to do the same thing faster?

Sample Data:

a = pd.DataFrame({'a': ['a','b','c','d','e','f','g', 'h', 'i']}) 

b = pd.DataFrame({'a': ['a','b','c','d','e','f','g'], 'b': ['alpha', 'alpha', 'alpha', 'beta', 'beta', 'charlie', 'charlie']})

Basically, I am trying to replace each value in a["a"] with the corresponding value in b["b"] wherever a["a"] == b["a"].

  • [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/4046632). Also [ask]. – buran Nov 21 '22 at 22:12
  • I apologize; the data I work with is confidential, and I struggle with making a table on here. It is always iffy when I try, and it ruins my entire question. – Ahmed Tawakol Nov 21 '22 at 22:35
  • @AhmedTawakol The easiest way to do that is to do `df.to_dict()`, which does not produce a table, but does produce something you can put inside a call to `pd.DataFrame()`. That means that people can just copy/paste the code into their own editor. – Nick ODell Nov 21 '22 at 22:40
  • added sample data – Ahmed Tawakol Nov 21 '22 at 22:50
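
The `df.to_dict()` suggestion in the comments can be sketched as a round trip. This is only an illustration: the small frame and the names `df`, `as_dict`, and `rebuilt` are made up for the example and are not from the question.

```python
import pandas as pd

# Hypothetical stand-in for a frame whose real contents cannot be shared.
df = pd.DataFrame({"a": ["a", "b", "c"]})

# to_dict() gives a plain nested dict: {column: {index: value}}.
as_dict = df.to_dict()

# Pasting that dict into pd.DataFrame() rebuilds an equivalent frame,
# so readers can copy the printed dict straight into their editor.
rebuilt = pd.DataFrame(as_dict)
print(as_dict)
```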

1 Answer


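You cannot use the pandas where function because your two DataFrames have different numbers of rows. The code below works for your sample data (I renamed your DataFrames df1 and df2 for clarity). Note that the assignment aligns on the row index, so it relies on matching rows having the same index in both DataFrames: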

df1.loc[df1['a'].isin(df2['a']), 'a'] = df2['b']

which for your sample data results in

         a
0    alpha
1    alpha
2    alpha
3     beta
4     beta
5  charlie
6  charlie
7        h
8        i
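
If the rows of the two tables are not guaranteed to line up by index, a lookup-based variant may be safer. The following is a minimal sketch using `Series.map`, not the method from the answer above. It assumes the values in `b["a"]` are unique (a duplicated key would make the lookup ambiguous and `map` would raise), and the name `lookup` is introduced here only for illustration.

```python
import pandas as pd

a = pd.DataFrame({"a": ["a", "b", "c", "d", "e", "f", "g", "h", "i"]})
b = pd.DataFrame({
    "a": ["a", "b", "c", "d", "e", "f", "g"],
    "b": ["alpha", "alpha", "alpha", "beta", "beta", "charlie", "charlie"],
})

# Lookup Series: index holds the old values (b["a"]),
# values hold the replacements (b["b"]). Assumes b["a"] has no duplicates.
lookup = b.set_index("a")["b"]

# map() looks each value up by key rather than by row position;
# values with no match become NaN, and fillna() restores the originals.
a["a"] = a["a"].map(lookup).fillna(a["a"])
print(a)
```

Because the match is done on the values themselves, this does not depend on row order in either table, and it replaces the nested loop with a single vectorized lookup.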
user19077881