
I want to compare two lists of lists with the columns of a DataFrame:
list1 = [['r2','r4','r6'],['r6','r7']]
list2 = [['p4','p5','p8'],['p86','p21','p0','p94']]

Dataset:

rid pid value
r2 p0 banana
r2 p4 chocolate
r4 p89 apple
r6 p5 milk
r7 p0 bread

Output:

[['chocolate', 'milk'], ['bread']]

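Because r2 and p4 occur in list1[0] and list2[0] respectively, and also appear together in the same row of the dataset, chocolate must be stored. Similarly, r6 and p5 occur in the sublists at the same position (list1[0] and list2[0]) and in the same row of the dataset, so milk must be stored. Likewise, r7 and p0 occur in list1[1] and list2[1], giving bread, while banana (r2, p0) is excluded because r2 and p0 only appear in sublists at different positions.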
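To make the matching rule concrete, here is a minimal plain-Python sketch (no pandas) that restates it, assuming the dataset is held as a list of `(rid, pid, value)` tuples: a row's value goes into output group `i` only when its `rid` is in `list1[i]` and its `pid` is in `list2[i]`. The `rows` name and tuple layout are illustrative, not from the question.

```python
# Dataset rows as (rid, pid, value) tuples (illustrative representation).
rows = [
    ("r2", "p0", "banana"),
    ("r2", "p4", "chocolate"),
    ("r4", "p89", "apple"),
    ("r6", "p5", "milk"),
    ("r7", "p0", "bread"),
]
list1 = [["r2", "r4", "r6"], ["r6", "r7"]]
list2 = [["p4", "p5", "p8"], ["p86", "p21", "p0", "p94"]]

output = []
for rids, pids in zip(list1, list2):
    # Keep a row only if both of its ids fall in the sublists at the same index.
    output.append([value for rid, pid, value in rows if rid in rids and pid in pids])

print(output)  # [['chocolate', 'milk'], ['bread']]
```

Banana drops out because no single index `i` has `r2` in `list1[i]` and `p0` in `list2[i]`.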

Ynjxsjmh
rk6t7

2 Answers


You can do it as follows:

import pandas as pd
from itertools import product

df = pd.DataFrame({'rid': {0: 'r2', 1: 'r2', 2: 'r4', 3: 'r6', 4: 'r7'},
 'pid': {0: 'p0', 1: 'p4', 2: 'p89', 3: 'p5', 4: 'p0'},
 'value': {0: 'banana', 1: 'chocolate', 2: 'apple', 3: 'milk', 4: 'bread'}})
list1 = [['r2','r4','r6'],['r6','r7']]
list2 = [['p4','p5','p8'],['p86','p21','p0','p94']]

# Generate all possible associations.
associations = (product(l1, l2) for l1, l2 in zip(list1, list2))

# Index for speed and convenience of the lookup.
df = df.set_index(['rid', 'pid']).sort_index()

output = [[df.loc[assoc, 'value'] for assoc in assoc_list if assoc in df.index] 
          for assoc_list in associations]

print(output)
# [['chocolate', 'milk'], ['bread']]
user2246849
  • I get the following warning: PerformanceWarning: indexing past lexsort depth may impact performance. – rk6t7 May 08 '22 at 08:00
  • @rk6t7 check my edit (I added `.sort_index()` after `df.set_index(['rid', 'pid'])`). ([Here's why](https://stackoverflow.com/a/54520922/2246849)) – user2246849 May 08 '22 at 08:04
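As a vectorized alternative to the per-pair `.loc` lookups, here is a sketch that swaps in `DataFrame.merge` (a different technique from the answer above, not the answerer's method). It builds one row per candidate `(group, rid, pid)` triple, inner-merges that with the dataset on `rid` and `pid`, and then collects `value` per group. Because it never indexes a MultiIndex with `.loc`, the lexsort-depth warning does not come into play. The `candidates`, `matched`, and `group` names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "rid": ["r2", "r2", "r4", "r6", "r7"],
    "pid": ["p0", "p4", "p89", "p5", "p0"],
    "value": ["banana", "chocolate", "apple", "milk", "bread"],
})
list1 = [["r2", "r4", "r6"], ["r6", "r7"]]
list2 = [["p4", "p5", "p8"], ["p86", "p21", "p0", "p94"]]

# One row per candidate (group, rid, pid): the Cartesian product of each sublist pair.
candidates = pd.DataFrame(
    [(g, r, p) for g, (l1, l2) in enumerate(zip(list1, list2)) for r in l1 for p in l2],
    columns=["group", "rid", "pid"],
)

# Inner merge keeps only candidates that exist as a row in df (left key order preserved).
matched = candidates.merge(df, on=["rid", "pid"], how="inner")

# Collect values per group; groups with no matches come out as empty lists.
output = [matched.loc[matched["group"] == g, "value"].tolist() for g in range(len(list1))]
print(output)  # [['chocolate', 'milk'], ['bread']]
```

Filtering by `range(len(list1))` rather than using `groupby` keeps a slot for every sublist pair, even one that matches nothing.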

Answer

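# Assumes df, list1 and list2 hold the question's data.
result = []
for l1, l2 in zip(list1, list2):
    # Keep rows whose rid is in l1 AND whose pid is in l2, then take their values.
    res = df.loc[df["rid"].isin(l1) & df["pid"].isin(l2), "value"].tolist()
    result.append(res)
print(result)
# [['chocolate', 'milk'], ['bread']]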

Explanation

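  • zip pairs the sublists of list1 and list2 by position; for lists of equal length it is equivalent to:
        for i in range(len(list1)):
            l1 = list1[i]
            l2 = list2[i]
  • df["rid"].isin(l1) & df["pid"].isin(l2) combines the two boolean conditions with the element-wise "and" operator &, so a row is kept only when its rid is in l1 and its pid is in l2.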
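To see what the combined condition does, here is a small sketch on the question's data for the first pair of sublists. It also checks that the `isin(...) & isin(...)` mask selects exactly the rows whose `(rid, pid)` pair appears in `itertools.product(l1, l2)`, which is the same set of rows the Cartesian-product approach finds. The names `pairs` and `by_product` are illustrative.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({
    "rid": ["r2", "r2", "r4", "r6", "r7"],
    "pid": ["p0", "p4", "p89", "p5", "p0"],
    "value": ["banana", "chocolate", "apple", "milk", "bread"],
})
l1 = ["r2", "r4", "r6"]  # list1[0]
l2 = ["p4", "p5", "p8"]  # list2[0]

# Row-wise boolean mask: True where rid is in l1 AND pid is in l2.
mask = df["rid"].isin(l1) & df["pid"].isin(l2)
print(mask.tolist())  # [False, True, False, True, False]

# The same rows are selected by testing each (rid, pid) pair against
# every pair in the Cartesian product of l1 and l2.
pairs = set(product(l1, l2))
by_product = [(rid, pid) in pairs for rid, pid in zip(df["rid"], df["pid"])]
print(by_product == mask.tolist())  # True

print(df.loc[mask, "value"].tolist())  # ['chocolate', 'milk']
```

Row 0 (r2, p0) fails because p0 is not in l2, even though r2 is in l1. Both conditions must hold on the same row.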

Attention

  • list1 and list2 must contain the same number of sublists; otherwise zip silently ignores the remaining elements of the longer list.
FavorMylikes
  • Won't zip truncate the longer list? The two lists differ in length. – rk6t7 May 08 '22 at 08:08
  • @rk6t7, yes, but as you say `because r2,p4 and r6,p5 occur in list1[0] and list2[0]`, there is no need to consider the rest, I think. – FavorMylikes May 08 '22 at 08:11