
Here is an example:

import pandas as pd

df = pd.DataFrame({
    'link': ['link1', 'link1', 'link2', 'link2', 'link3', 'link3'],
    'text': ['text1', 'text2', 'text3', 'text4', 'text5', 'text6']
})

I have a function (Levenshtein distance) that I would like to apply to each unique link, to get a result like this:

    link    text          result
0   link1   text1 text2   function(text1, text2)
1   link2   text3 text4   function(text3, text4)
2   link3   text5 text6   function(text5, text6)
Contra111
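The question takes an existing distance(a, b) for granted. For readers who do not have one yet, here is a minimal sketch of the standard dynamic-programming Levenshtein distance with that same signature. It is an illustrative stand-in, not the asker's own function.

```python
def distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    # previous[j] = distance between the prefix of a processed so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                current[j - 1] + 1,            # insert cb
                previous[j] + 1,               # delete ca
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(distance("text1", "text2"))     # 1
print(distance("kitten", "sitting"))  # 3
```

Only two rows of the DP table are kept, so memory grows with len(b) rather than len(a) * len(b).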

2 Answers


You can use the pivot_table function:

df = df.pivot_table(index='link', values='text', aggfunc=[list, 'sum']).reset_index()
df.columns = ['link', 'text', 'result']

Output:

    link            text      result
0  link1  [text1, text2]  text1text2
1  link2  [text3, text4]  text3text4
2  link3  [text5, text6]  text5text6

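Replace 'sum' in this solution with your own function. Note that aggfunc receives each group's values as a single Series, so a two-argument function such as distance(a, b) needs a small wrapper, e.g. lambda s: distance(*s).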
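A sketch of the pivot_table answer with a two-argument distance plugged in through a lambda. It includes a compact Levenshtein implementation as a stand-in for the asker's own function and assumes every link has exactly two rows.

```python
import pandas as pd


def distance(a, b):
    # Stand-in Levenshtein distance (two-row dynamic programming);
    # replace with your own distance(a, b).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


df = pd.DataFrame({
    'link': ['link1', 'link1', 'link2', 'link2', 'link3', 'link3'],
    'text': ['text1', 'text2', 'text3', 'text4', 'text5', 'text6']
})

# aggfunc hands each group's texts to the function as one Series,
# so the lambda unpacks the (assumed) two strings into distance(a, b).
out = df.pivot_table(index='link', values='text',
                     aggfunc=[list, lambda s: distance(*s)]).reset_index()
out.columns = ['link', 'text', 'result']
print(out)
```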

Mykola Zotko

I think you need the following, assuming there are always 2 values per group:

def distance(a, b):
    # your Levenshtein distance function goes here;
    # it takes two strings and returns an int
    ...

df = df.groupby('link')['text'].agg([('text', ' '.join),
                                     ('lev', lambda x: distance(x.iat[0], x.iat[1]))])

Or use * to splat (unpack) the group's values into the two arguments:

df = df.groupby('link')['text'].agg([('text', ' '.join),
                                     ('lev', lambda x: distance(*x.tolist()))])

print(df)
              text  lev
link                   
link1  text1 text2    1
link2  text3 text4    1
link3  text5 text6    1
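A self-contained variant of the groupby answer, written with pandas named aggregation (keyword arguments to SeriesGroupBy.agg, available since pandas 0.25). The distance function is again a stand-in Levenshtein implementation, and the two-rows-per-link assumption still applies.

```python
import pandas as pd


def distance(a, b):
    # Stand-in Levenshtein distance (two-row dynamic programming);
    # replace with your own distance(a, b).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


df = pd.DataFrame({
    'link': ['link1', 'link1', 'link2', 'link2', 'link3', 'link3'],
    'text': ['text1', 'text2', 'text3', 'text4', 'text5', 'text6']
})

# Each keyword becomes an output column: ' '.join glues the group's
# strings together, and the lambda unpacks the two strings into distance.
out = df.groupby('link')['text'].agg(
    text=' '.join,
    lev=lambda s: distance(*s),
)
print(out)
```

The keyword form avoids the list of (name, function) tuples, but the result is the same frame indexed by link.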
jezrael
  • Thanks, I already have my function, but the question is how the function gets its arguments from two rows? – Contra111 Nov 20 '19 at 15:11
  • @Contra111 - `Levenshtein distance` needs 2 values, and in the sample each group has 2 values, so `*x.tolist()` is [splatting](https://stackoverflow.com/questions/2322355/proper-name-for-python-operator) the list into 2 scalars. So what is the input of your function? 2 values? A list? – jezrael Nov 20 '19 at 15:15
  • Yes, my function distance(a, b) takes 2 strings and returns an int. I need to call it as distance(text1, text2), distance(text3, text4), distance(text5, text6) – Contra111 Nov 20 '19 at 15:17
  • I need to get the result in one row, like: link, text1 text2 (in one row) and the result (in the same column or a separate one, it doesn't matter) – Contra111 Nov 20 '19 at 15:18
  • @Contra111 - Can you try `df = df.groupby('link')['text'].apply(your_func)`? – jezrael Nov 20 '19 at 15:18
  • @Contra111 - it works with 2 values per group, as in the sample data – jezrael Nov 20 '19 at 15:18
  • Sure, but what should I put instead of 'your_func'? 'distance'? Where do I pass the arguments? – Contra111 Nov 20 '19 at 15:19
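The last question in the thread (what to pass to apply instead of your_func) comes down to a wrapper that receives each group's Series and unpacks it. A minimal sketch, where pair_distance is a hypothetical helper name and distance is a stand-in Levenshtein implementation:

```python
import pandas as pd


def distance(a, b):
    # Stand-in Levenshtein distance (two-row dynamic programming);
    # replace with your own distance(a, b).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(cur[j - 1] + 1, prev[j] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def pair_distance(s):
    # Hypothetical wrapper: apply() passes each group's texts as one Series.
    # Unpacking raises ValueError unless the group has exactly two rows.
    a, b = s
    return distance(a, b)


df = pd.DataFrame({
    'link': ['link1', 'link1', 'link2', 'link2', 'link3', 'link3'],
    'text': ['text1', 'text2', 'text3', 'text4', 'text5', 'text6']
})

# One distance per link, as a Series indexed by link
result = df.groupby('link')['text'].apply(pair_distance)
# Turn it into a two-column frame: link, result
table = result.reset_index(name='result')
print(table)
```

Passing distance itself to apply would fail, because apply calls the function with a single argument (the group's Series), not with two strings.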