First, I have an excel file (or csv file) which I have converted to a dataframe (df
).
Next, there is one master list of strings in the first column, which contains alphanumeric characters.
Then, there are lists of strings in subsequent columns, which can be the same length (list1
), shorter (list2
), or longer (list3
).
I would like to count
(or sum
) the number of exact matches between master_list
and the other lists.
# Assign spreadsheet filename to `file`
file = "list_match.xlsx"
# Load spreadsheet
import pandas as pd
df = pd.read_excel(file)
print (df)
master_list list1 list2 list3
abc abc abc stu
def xxx def zzz
ghi xxx yyy zzz
jkl xxx yyy zzz
mno1 xxx yz1 zzz
pqr xxx zzz
stu xxx zzz
vwx xxx zzz
yz1 xxx zzz
yz2 xx1 zzz
yz3 xx1 zzz
zzz
mno1
zzz
The objective is to create this result:
List Count
list1 1
list2 3
list3 2
Explanation of result:
- There is one match in list1: 'abc'
- There are three matches in list2: 'abc', 'def', 'yz1'
- There are two matches in list3: 'stu', 'mno1'
My question is similar to this question, except the data is transposed and there are multiple lists to compare which might require a loop.