0

I try to solve the following problem:
I have these two dataframes, df1:

label term
A01.12 "Culture de légumes, de melons, de racines et de tubercules"
A01.14 Culture de la canne à sucre

and df2:

company organization_activity
coca-cola "Culture de légumes, de melons, de racines et de tubercules"
rum-factory Culture de la canne à sucre

I now would like to map the "term" of df1 to the "organization_activity" of df2 and create a new column with the name "codes" in df2 with the respective label in df1.

The resulting dataframe should look like this:

organization_activity codes
"Culture de légumes, de melons, de racines et de tubercules" A01.12
Culture de la canne à sucre A01.14

This partially works by using:

df2['codes'] = df2['organization_activity'].map(df1.set_index('term')['label'].to_dict())

Oddly the new column is created and filled but for a lot of terms the mapping does not work and a Nan value is inserted even though the "organization_activity" I am mapping exists in the "term" column. Does anyone have an idea on why this could happen? The formatting for the values in "term" and "organization_activity" is equal and there doesn't seem to be a difference in the format of values that are mappable and ones that are not. Is there maybe a different way to do this?
The dataframe I showed is just an abstract so there are multiple companies that can have the same "organization_activity".

blackraven
  • 5,284
  • 7
  • 19
  • 45
julez8000
  • 19
  • 4
  • 1
    All I could think of is maybe there some extra white spaces in one of the columns and that is why it is not matching, maybe try to trim those first ? – CatDad Sep 16 '22 at 13:55

1 Answers1

1

Normally, that is what you call a merge. Assuming you have the following dataframes:

df1 = pd.DataFrame({'Label': ['A01.12', 'A01.14'],
                    'term': ['Culture de légumes, de melons, de racines et de tubercules',
                             'Culture de la canne à sucre']})


df2 = pd.DataFrame({'organization_activity': [
                    'Culture de légumes, de melons, de racines et de tubercules',
                    'Culture de la canne à sucre']})

>>> df1
    Label                                               term
0  A01.12  Culture de légumes, de melons, de racines et d...
1  A01.14                        Culture de la canne à sucre

>>> df2
                               organization_activity
0  Culture de légumes, de melons, de racines et d...
1                        Culture de la canne à sucre

This is what a merge would give you after some cleaning:

>>> df3 = df2.merge(df1, left_on="organization_activity", right_on="term")
>>> df3.drop("term", axis=1).rename(columns={"Label": "Code"})
                               organization_activity    Code
0  Culture de légumes, de melons, de racines et d...  A01.12
1                        Culture de la canne à sucre  A01.14

I also assume that there are some more columns in df2, too, which should also correctly be mapped to Labels/Codes with a merge.

fsimonjetz
  • 5,644
  • 3
  • 5
  • 21
  • 1
    Thank you! I solved it now like this: ```pd.merge(df2[["organization_activity"]], df, left_on="organization_activity", right_on="term", how="left")``` – julez8000 Sep 16 '22 at 14:55