0

Is it possible to pd.merge() a df and csv when the df column is a list (could be more than one variable) and the csv only one.

df
   GV2015_VAL                               polName
0    605000.0                           [LENTEGEUR]
1         NaN                         [DURBANVILLE]
2         NaN                         [DURBANVILLE]
3    730000.0  [BISHOP LAVIS, GUGULETHU, MANENBERG]
4    625000.0                           [LENTEGEUR]
csv
       name                   m                    p         j
0        LENTEGEUR                17.0                501.0     518.0
1      DURBANVILLE                10.0                495.0     505.0
2           BELHAR                 9.0                352.0     361.0
3        MANENBERG                29.0               1013.0    1042.0
4        GUGULETHU                 1.0                192.0     193.0
5     BISHOP LAVIS                10.0                495.0     505.0

name will match with polName.

Furthermore; the j parameter of csv should .aggregate (mean) when more than one variable present. So that the output for df should be:

df
   GV2015_VAL                               polName   merge_j
0    605000.0                           [LENTEGEUR]     518.0
1         NaN                         [DURBANVILLE]     505.0
2         NaN                         [DURBANVILLE]     505.0
3    730000.0  [BISHOP LAVIS, GUGULETHU, MANENBERG]     580.0    
4    625000.0                           [LENTEGEUR]     518.0

How can the built-in .merge() handle the challenge or will looping / list comprehension be necessary?

arkriger
  • 207
  • 2
  • 7
  • 1
    Please provide a [minimal reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) with inputs as code `df.head().to_dict()` and the expected output. – not_speshal Sep 02 '23 at 17:20
  • Try something like: `gdf.reset_index().explode('polName').merge(csv.rename(columns={'name': 'polName'}), on='polName').set_index('index').rename_axis(None)` – Corralien Sep 02 '23 at 22:14

1 Answers1

0

You could probably add an additional column converting the list column to a normal one:

gdf['Name'] = gdf['polName'].apply(lambda lst: lst[0])
gdf.merge(csv, on='Name')