Intersection of list and dataframe, keeping duplicates of list but showing the values of a column in a dataframe

Question

Found this link and my work is somewhat similar.

Say I have:

x = ['the', 'the', 'and', 'a', 'apple', 'heart', 'heart']
y = {'words': ['the', 'belt', 'computer', 'heart','and'],'values':[3,2,1,1,4]}

Using the suggestion in the link above, I got this:

import pandas as pd

df = pd.DataFrame.from_dict(y)
items = set(df['words'])

found = [i for i in x if i in items]
print(found)

The result is: ['the', 'the', 'and', 'heart', 'heart']

I want to be able to get the corresponding value of the word and I am stuck. The result I want is this:

[3,3,4,1,1]

Any thoughts on how to achieve this? Would greatly appreciate it.

score 2 · Accepted Answer · answered Oct 16 '21 at 04:53

You don't need pandas. First rework your dictionary to have the words as keys, then use a comprehension:

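# Map each word to its value; relies on y listing 'words' before 'values'
y2 = dict(zip(*y.values()))
# Look up every word of x that has a value, keeping duplicates
[y2[i] for i in x if i in y2]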

Output: [3,3,4,1,1]
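
For reference, here is a self-contained sketch of the same dictionary approach. It names the `'words'` and `'values'` keys explicitly instead of unpacking `y.values()`, so it does not depend on the order in which the keys were inserted. The names `lookup` and `result` are only illustrative.

```python
x = ['the', 'the', 'and', 'a', 'apple', 'heart', 'heart']
y = {'words': ['the', 'belt', 'computer', 'heart', 'and'],
     'values': [3, 2, 1, 1, 4]}

# Build the word -> value lookup by naming the keys explicitly,
# so the result does not depend on the key order of y.
lookup = dict(zip(y['words'], y['values']))

# Keep every occurrence in x (duplicates included) that has a known value;
# words missing from the lookup ('a', 'apple') are skipped.
result = [lookup[w] for w in x if w in lookup]
print(result)  # [3, 3, 4, 1, 1]
```

Each membership test and lookup on a dict takes constant time on average, so the comprehension does roughly one step per element of `x`.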

The (much less efficient) equivalent in pandas, using the `df` from the question, is:

s = df.set_index('words')['values']
pd.Series(x).map(s).dropna()

Output:

0    3.0
1    3.0
2    4.0
5    1.0
6    1.0
dtype: float64
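
The values come back as floats because `map` fills unmatched words with `NaN`, which forces a float dtype. If a plain list of integers like the one in the question is wanted, one possible follow-up is to cast after dropping the missing entries. This sketch assumes the words in `df` are unique; the names `mapped` and `values` are only illustrative.

```python
import pandas as pd

x = ['the', 'the', 'and', 'a', 'apple', 'heart', 'heart']
y = {'words': ['the', 'belt', 'computer', 'heart', 'and'],
     'values': [3, 2, 1, 1, 4]}
df = pd.DataFrame.from_dict(y)

# Series indexed by word, used as the lookup table for map()
s = df.set_index('words')['values']

# Words absent from s become NaN, so the mapped Series is float64
mapped = pd.Series(x).map(s)

# Drop the unmatched rows, then cast back to int and convert to a list
values = mapped.dropna().astype(int).tolist()
print(values)  # [3, 3, 4, 1, 1]
```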

mozway

Thanks for this. My data is actually very large (in the thousands). Is it more efficient to use a dict rather than pandas? – axia_so2 Oct 16 '21 at 05:03

Thousands is not huge; you can test both and compare. In Jupyter, you can write `%%timeit` at the beginning of the cell to measure the run time; in a script, there is the `timeit` module – mozway Oct 16 '21 at 05:11
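
As a rough sketch of the comparison suggested above, the script below times both approaches with the `timeit` module. The input is just the question's list repeated 1000 times as a stand-in for "thousands" of items, and the function names `with_dict` and `with_pandas` are made up for this example. Timings depend on the machine and the pandas version, so none are shown.

```python
import timeit

import pandas as pd

# Stand-in data: the question's list repeated to reach a few thousand items
x = ['the', 'the', 'and', 'a', 'apple', 'heart', 'heart'] * 1000
y = {'words': ['the', 'belt', 'computer', 'heart', 'and'],
     'values': [3, 2, 1, 1, 4]}
df = pd.DataFrame.from_dict(y)


def with_dict():
    # Plain-Python approach: dict lookup inside a list comprehension
    lookup = dict(zip(y['words'], y['values']))
    return [lookup[w] for w in x if w in lookup]


def with_pandas():
    # pandas approach: map through a Series indexed by word
    s = df.set_index('words')['values']
    return pd.Series(x).map(s).dropna().astype(int).tolist()


# Both approaches should agree before their speed is compared
assert with_dict() == with_pandas()

runs = 100
for fn in (with_dict, with_pandas):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```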
