3

I have a dataframe like below:

>>> df1
           a   b
0  [1, 2, 3]  10
1  [4, 5, 6]  20
2     [7, 8]  30

and another like:

>>> df2
   a
0  1
1  2
2  3
3  4
4  5

I need to create column 'c' in df2 from column 'b' of df1 if column 'a' value of df2 is in coulmn 'a' df1. In df1 each tuple of column 'a' is a list.

I have tried to implement from following url, but got nothing so far: https://medium.com/@Imaadmkhan1/using-pandas-to-create-a-conditional-column-by-selecting-multiple-columns-in-two-different-b50886fabb7d

expect result is

>>> df2
   a  c
0  1 10
1  2 10
2  3 10
3  4 20
4  5 20

3 Answers3

5

Use Series.map by flattening values from df1 to dictionary:

d = {c: b for a, b in zip(df1['a'], df1['b']) for c in a}
print (d)
{1: 10, 2: 10, 3: 10, 4: 20, 5: 20, 6: 20, 7: 30, 8: 30}

df2['new'] = df2['a'].map(d)
print (df2)
   a  new
0  1   10
1  2   10
2  3   10
3  4   20
4  5   20

EDIT: I think problem is mixed integers in list in column a, solution is use if/else for test it for new dictionary:

d = {}
for a, b in zip(df1['a'], df1['b']):
    if isinstance(a, list):
        for c in a:
            d[c] = b
    else:
        d[a] = b

df2['new'] = df2['a'].map(d)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Actual data is a list of ip. I am getting the below error, there: Traceback (most recent call last): File "", line 1, in File "", line 1, in TypeError: 'int' object is not iterable – Binayak Chatterjee Apr 04 '19 at 10:27
  • @BinayakChatterjee - Data are confidental? – jezrael Apr 04 '19 at 10:45
  • @BinayakChatterjee - Is possible edit question? Because bad formating of comments data. Thank you. – jezrael Apr 04 '19 at 10:49
  • @BinayakChatterjee - so change `d = {c: b for a, b in zip(df1['a'], df1['b']) for c in a}` to `d = {c: b for a, b in zip(df1['b'], df1['a']) for c in a}` - swap a, b – jezrael Apr 04 '19 at 10:59
4

Use :

m=pd.DataFrame({'a':np.concatenate(df.a.values),'b':df.b.repeat(df.a.str.len())})
df2.merge(m,on='a')

   a   b
0  1  10
1  2  10
2  3  10
3  4  20
4  5  20
anky
  • 74,114
  • 11
  • 41
  • 70
2

First we unnest the list df1 to rows, then we merge them on column a:

df1 = df1.set_index('b').a.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'a'})
print(df1, '\n')

df_final = df2.merge(df1, on='a')
print(df_final)

    b    a
0  10  1.0
1  10  2.0
2  10  3.0
0  20  4.0
1  20  5.0
2  20  6.0
0  30  7.0
1  30  8.0 

   a   b
0  1  10
1  2  10
2  3  10
3  4  20
4  5  20
Erfan
  • 40,971
  • 8
  • 66
  • 78
  • 1
    Thanks for heads up @jezrael. I use this method from Wen-Bens post here: https://stackoverflow.com/questions/53218931/how-do-i-unnest-explode-a-column-in-a-pandas-dataframe/53218939#53218939 – Erfan Apr 04 '19 at 09:41
  • Ya, unfortunately is still used a lot, but the best never used it - to slow. – jezrael Apr 04 '19 at 09:42
  • Maybe you should post an answer, eplaining this dict flattening you use? @jezrael – Erfan Apr 04 '19 at 09:43