0

I have two Dataframe's, let's call them df1 and df2.

df1

Term Served term1 82321 term2 54232 term3 34323 term4 1231

df2

Full Term clicks this is term1 233 oh boy this is term2 122 yea that's right term1 1121 oh no not that term4 313123

I would like to go row by row and find every time that the terms in df1 appear in df2. After that I would like to sum all of the clicks for that specific term. The out put would look like,

Term Served Clicks term1 82321 1354 term2 54232 122 term3 34323 0 term4 1231 313123

Here is what I have so far. I haven't gotten past grabing all of the times that the terms in df1 appear in df2. The code below keeps looping through only the first row in df1. Maybe I am not understanding the str.findall() or I have my loops wrong.

for index, row in df1.iterrows(): for row2 in df2.iteritems(): full_headline = df2['Full Term'].str.findall(row[0]) print(full_headline)

1 Answers1

2

IIUC using str.findall extact the Term in df2 from df1 , then we need gourpby sum the common Term in df2 .So far right now ,we only need assign the result back to df1 using map

df2['Full Term']=df2['Full Term'].str.findall('|'.join(df1.Term)).str[0]
s=df2.groupby('Full Term').clicks.sum()
df1['Clicks']=df1.Term.map(s).fillna(0)
df1
Out[114]: 
    Term  Served    Clicks
0  term1   82321    1354.0
1  term2   54232     122.0
2  term3   34323       0.0
3  term4    1231  313123.0

Update if that is the case you may want to see unnesting after str.findall

df2['Full Term']=df2['Full Term'].str.findall('|'.join(df1.Term))
df2=df2[df2['Full Term'].astype(bool)].copy()#adding here

def unnesting(df, explode):
    idx=df.index.repeat(df[explode[0]].str.len())
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    return df1.join(df.drop(explode,1),how='left')
s=unnesting(df2,['Full Term']).groupby('Full Term').clicks.sum()
df1['Clicks'] = df1.Term.map(s).fillna(0)
df1
Out[137]: 
    Term  Served  Clicks
0  term1   82321    1354
1  term2   54232     355
2  term3   34323     233
3  term4    1231  313123

BENY
  • 317,841
  • 20
  • 164
  • 234
  • I tried your code out and ended up with an array of ints containing all click rows for each term. For example, term1 will have 2331121. it seems like it isn't summing but adding the clicks in an array? – Timothy Mcwilliams Mar 26 '19 at 20:56
  • `TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'` here there, thanks for the quick response. when unseating that error us being thrown, I can do some googling on it – Timothy Mcwilliams Mar 26 '19 at 21:18
  • @TimothyMcwilliams adding one line before unnesting – BENY Mar 26 '19 at 21:20
  • added the line right before unnesting and still came up with the same result, no error though. – Timothy Mcwilliams Mar 26 '19 at 21:48