0

I have two dataframes that I import from Excel sheets. I need to pull some information from the auxiliary dataframe into the main dataframe wherever a cell refers to it. My code:

auxdf =pd.DataFrame({'prod':['look','duck','chik']},index=['prod_team1','prod_team2','prod_team3'])

auxdf =     

            prod
prod_team1  look
prod_team2  duck
prod_team3  chik
 
# Main dataframe after importing from an excel sheet

maindf = 
            col1                     col2
mar_team1   aoo                      auxdf['prod_team1']   
mar_team2   auxdf.loc['prod_team2']  bla
mar_team3   foo                      auxdf['prod_team3']

# I want to import information from auxdf into maindf
for i in range(len(maindf)):
  for j in range(len(maindf.columns)):
    # Check whether the cell value contains the string 'auxdf'; if so, replace its value
    try:
      if 'auxdf' in maindf.iloc[i, j]:
        maindf.iloc[i, j] = eval(maindf.iloc[i, j])
    except:
      pass

Expected output:

maindf = 
            col1      col2
mar_team1   aoo       look   
mar_team2   duck      bla
mar_team3   foo       chik

I need help finding the most Pythonic way of doing this. Thanks.

Mainland

3 Answers

2

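You can reshape the DataFrame into a Series with DataFrame.stack, pull out the referenced keys with Series.str.extract, map those keys to values with Series.map and auxdf['prod'], fill the unmatched cells with the original values, and finally reshape back with Series.unstack: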

s = maindf.stack()
s1 = s.str.extract(r"auxdf.*\['(.*?)'\]", expand=False)
print (s1)
mar_team1  col1           NaN
           col2    prod_team1
mar_team2  col1    prod_team2
           col2           NaN
mar_team3  col1           NaN
           col2    prod_team3
dtype: object

df = s1.map(auxdf['prod']).fillna(s).unstack()
print (df)
           col1  col2
mar_team1   aoo  look
mar_team2  duck   bla
mar_team3   foo  chik
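
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It assumes maindf holds the references as plain strings (the construction below is an assumption about how the Excel import looks), and the names `keys` and `result` are only illustrative.

```python
import pandas as pd

# Sample data; maindf is assumed to store the references as plain strings
auxdf = pd.DataFrame(
    {"prod": ["look", "duck", "chik"]},
    index=["prod_team1", "prod_team2", "prod_team3"],
)
maindf = pd.DataFrame(
    {
        "col1": ["aoo", "auxdf.loc['prod_team2']", "foo"],
        "col2": ["auxdf['prod_team1']", "bla", "auxdf['prod_team3']"],
    },
    index=["mar_team1", "mar_team2", "mar_team3"],
)

# 1. Flatten to a Series indexed by (row label, column label)
s = maindf.stack()

# 2. Extract the quoted key from cells that reference auxdf; other cells become NaN
keys = s.str.extract(r"auxdf.*\['(.*?)'\]", expand=False)

# 3. Map keys to auxdf['prod'] values, restore the untouched cells, reshape back
result = keys.map(auxdf["prod"]).fillna(s).unstack()
print(result)
```

Because the lookup goes through Series.map, no string from the sheet is ever executed as code.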
jezrael
  • Got inspired by my regex approach? – mozway Dec 22 '22 at 07:31
  • @mozway - No, solution was created earlier. – jezrael Dec 22 '22 at 07:33
  • Thanks for your answer. I would not have imagined this solution. What about the one in my question (use of two `for` loops, `try`,`eval`)? is it some lame approach? – Mainland Dec 23 '22 at 00:13
  • @Mainland - In pandas it is best to avoid loops where possible, so the built-in pandas methods are preferred. Also, `eval` is not recommended - https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice – jezrael Dec 23 '22 at 06:16
2

You can use a regex with str.replace to match and replace the auxdf strings with the matching value:

out = maindf.apply(lambda s: s.str.replace(
        r'auxdf.*\[\'(\w+)\'\]',                 # extract key
        lambda m: auxdf['prod'].get(m.group(1)), # replace by matching value
        regex=True))

Output:

           col1  col2
mar_team1   aoo  look
mar_team2  duck   bla
mar_team3   foo  chik
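
One thing to watch with the callable replacement: `Series.get` returns `None` for a key that is not in auxdf, and a replacement callable has to return a string. A possible variant, sketched below under the assumption that unknown references should simply be left as they are, passes the matched text as the default. The names `lookup`, `pattern` and `substitute` are illustrative only.

```python
import pandas as pd

# Sample data; maindf is assumed to store the references as plain strings
auxdf = pd.DataFrame(
    {"prod": ["look", "duck", "chik"]},
    index=["prod_team1", "prod_team2", "prod_team3"],
)
maindf = pd.DataFrame(
    {
        "col1": ["aoo", "auxdf.loc['prod_team2']", "foo"],
        "col2": ["auxdf['prod_team1']", "bla", "auxdf['prod_team3']"],
    },
    index=["mar_team1", "mar_team2", "mar_team3"],
)

lookup = auxdf["prod"]
pattern = r"auxdf.*\['(\w+)'\]"

def substitute(m):
    # Return the auxdf value for the captured key, or the original
    # matched text when the key does not exist in auxdf
    return lookup.get(m.group(1), m.group(0))

out = maindf.apply(lambda s: s.str.replace(pattern, substitute, regex=True))
print(out)
```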
mozway
  • Thanks for your answer. I would not have imagined this solution. What about the one in my question (use of two `for` loops, `try`,`eval`)? is it some lame approach? – Mainland Dec 23 '22 at 00:14
  • @Mainland for loops are slow in pandas so avoid them whenever possible. `eval` is even worse, this is a "high-power" function with a [lot of risks](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice) and should never be used unless you really know what you're doing. – mozway Dec 23 '22 at 06:16
1

I hope this works for you:

import pandas as pd
import re
auxdf =pd.DataFrame({'prod':['look','duck','chik']},index=['prod_team1','prod_team2','prod_team3'])

maindf = pd.DataFrame(
    {
        "col1": ["aoo", "auxdf.loc['prod_team2']", "foo"],
        "col2": ["auxdf['prod_team1']", "bla", "auxdf['prod_team3']"]
    }, index=['mar_team1', 'mar_team2', 'mar_team3']
)
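def replaceFunc(col):
    # Extract the key from strings such as auxdf['key'] or auxdf.loc['key']
    c = re.findall(r"auxdf\.?\w*\['(.*)'\]", col)
    if len(c) > 0:
        col = c[0]
        # Find the key in auxdf's index and take the matching 'prod' value
        for value, key in zip(auxdf.values, auxdf.index):
            if key == col:
                col = value[0]
                break
    return col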
maindf['col1'] = maindf['col1'].apply(replaceFunc)
maindf['col2'] = maindf['col2'].apply(replaceFunc)
maindf
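
The inner loop over auxdf can probably be replaced by a dictionary lookup, and the per-column calls by a single apply over all columns. The following is only a sketch of that idea: `lookup`, `pattern` and `resolve` are hypothetical names, and it assumes a cell consists of nothing but the reference, written as either auxdf['key'] or auxdf.loc['key'].

```python
import re
import pandas as pd

auxdf = pd.DataFrame(
    {"prod": ["look", "duck", "chik"]},
    index=["prod_team1", "prod_team2", "prod_team3"],
)
maindf = pd.DataFrame(
    {
        "col1": ["aoo", "auxdf.loc['prod_team2']", "foo"],
        "col2": ["auxdf['prod_team1']", "bla", "auxdf['prod_team3']"],
    },
    index=["mar_team1", "mar_team2", "mar_team3"],
)

# Plain dict from index label to 'prod' value
lookup = auxdf["prod"].to_dict()
# Matches a whole cell of the form auxdf['key'] or auxdf.loc['key']
pattern = re.compile(r"auxdf(?:\.loc)?\['(.*?)'\]")

def resolve(cell):
    # Leave non-string cells and cells without a reference untouched
    m = pattern.fullmatch(cell) if isinstance(cell, str) else None
    if m is None:
        return cell
    # Unknown keys fall back to the original cell text
    return lookup.get(m.group(1), cell)

# Apply the lookup to every column instead of naming each one
resolved = maindf.apply(lambda column: column.map(resolve))
print(resolved)
```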
Muhammad Ali