1

I have two dataframe like:

import pandas as pd

df1 = pd.DataFrame({'Airplanes' : ['U-2','B-52,P-51', 'F-16', 'MiG-21,F-16;A-10', 'P-51','A-10;P-51' ],
                    'Company' : ['Air_1', 'Air_3', 'Air_2','Air_1', 'Air_7', 'Air_3']})
------------------------------
          Airplanes Company
0               U-2   Air_1
1         B-52,P-51   Air_3
2              F-16   Air_2
3  MiG-21,F-16;A-10   Air_1
4              P-51   Air_7
5         A-10;P-51   Air_3
-------------------------------

df2 = pd.DataFrame({'Model' : ['U-2','B-52', 'F-16', 'MiG-21', 'P-51','A-10' ],
                    'Description' : ['Strong', 'Huge', 'Quick','Light', 'Silent', 'Comfortable']})
------------------------------
     Model  Description
0     U-2       Strong
1    B-52         Huge
2    F-16        Quick
3  MiG-21        Light
4    P-51       Silent
5    A-10  Comfortable
------------------------------

I would like to insert the information of df2 inside df1. In particular, the Description column must appear in df1, respecting the separators of the df1 column ['Airplanes']. So in this case the output should be:

---------------------------------------------------------
          Airplanes Company                 Description
0               U-2   Air_1                       Srong
1         B-52,P-51   Air_3                 Huge,Silent
2              F-16   Air_2                       Quick
3  MiG-21,F-16;A-10   Air_1     Light,Quick;Comfortable
4              P-51   Air_7                      Silent
5         A-10;P-51   Air_3          Comfortable;Silent
--------------------------------------------------------

How can I do?

Jake85
  • 27
  • 5
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Naveed Oct 20 '22 at 20:29
  • May need an extra `explode` or something to get each elements from the Airplanes column. Do you care if the comma and the semicolon becomes only comma at the end? – Ben.T Oct 20 '22 at 20:32
  • Naveed very useful but is not exactly what i need. Ben.T Yes i need to preserv "," and ";" – Jake85 Oct 20 '22 at 20:49
  • @Jake85 I'd be very interested to know why you need to do that. Is this for an assignment or a real use case? – mozway Oct 20 '22 at 20:51
  • it is for real use however I have simplified the problem a lot to make it understandable for the question. In reality the information is more complex and I need to keep the two signs to understand if the information comes from one plane or another and inside that plane I need to know the various attributes (divided by commas) – Jake85 Oct 20 '22 at 22:05
  • Do you have something to recommend? – Jake85 Oct 25 '22 at 16:07

3 Answers3

1

You could do split with explode then map

df1['new'] = df1.Airplanes.str.split('[,|;]').explode().map(df2.set_index('Model')['Description']).groupby(level=0).agg(','.join)
df1
Out[62]: 
          Airplanes Company                      new
0               U-2   Air_1                   Strong
1         B-52,P-51   Air_3              Huge,Silent
2              F-16   Air_2                    Quick
3  MiG-21,F-16;A-10   Air_1  Light,Quick,Comfortable
4              P-51   Air_7                   Silent
5         A-10;P-51   Air_3       Comfortable,Silent
BENY
  • 317,841
  • 20
  • 164
  • 234
1

You can use a regex and str.replace:

mapper = df2.set_index('Model')['Description'].to_dict()
regex = '|'.join(df2['Model'])
# 'U-2|B-52|F-16|MiG-21|P-51|A-10'

df1['Description'] = df1['Airplanes'].str.replace(regex, lambda m: mapper.get(m.group()))

output:

          Airplanes Company              Description
0               U-2   Air_1                   Strong
1         B-52,P-51   Air_3              Huge,Silent
2              F-16   Air_2                    Quick
3  MiG-21,F-16;A-10   Air_1  Light,Quick;Comfortable
4              P-51   Air_7                   Silent
5         A-10;P-51   Air_3       Comfortable;Silent
mozway
  • 194,879
  • 13
  • 39
  • 75
  • It would be possible in the "Description" column to insert a "|" in place of a ","? Or, in general, change the 2 initial separator characters in the "Description" column – Jake85 Oct 24 '22 at 16:11
  • I was thinking of making a copy of the column and then a substitute for separators but there is probably a more effective method for this – Jake85 Oct 24 '22 at 20:36
  • Your can probably use `str.replace` – mozway Oct 25 '22 at 02:20
  • You're right, I forgot to tell you that I would like to do this because in the description column there is an equal character "," for example (Strong, STG) so this creates confusion between the comma used to divide and the one used for the description. So a str.replace would not be enough – Jake85 Oct 25 '22 at 08:25
  • 1
    You could try a double replace: `df1['Description'] = df1['Airplanes'].str.replace(',', '|', regex=False).str.replace(regex, lambda m: mapper.get(m.group()))` – mozway Oct 25 '22 at 16:28
0
d=dict(df2.values)

df1['description']=df1['Airplanes'].str.split('[,|;]').apply(lambda x: ','.join([d[i] for i in x]))
df1

    Airplanes   Company     description
0   U-2               Air_1     Strong
1   B-52,P-51         Air_3     Huge,Silent
2   F-16              Air_2     Quick
3   MiG-21,F-16,A-10  Air_1     Light,Quick,Comfortable
4   P-51              Air_7     Silent
5   A-10,P-51         Air_3     Comfortable,Silent```
Naveed
  • 11,495
  • 2
  • 14
  • 21