I have 2 CSV files in file1 I have list of research groups names. in file2 I have list of the Research full name with location as wall. I want to join these 2 csv file if the have the words matches in them.
Pandas ValueError: "Columns must be same length as key" I am using Jupyter Labs for this.
"df1[["f2_research_groups_names", "f2_location"]] = df1.apply(fn, axis=1)"
cvs row size for file1.csv 5000 data, and for file2.csv I have about 15,000
file1.csv
research_groups_names_f1 |
---|
Chinese Academy of Sciences (CAS) |
CAS |
U-M |
UQ |
University of California, Los Angeles |
Harvard University |
file2.csv
research_groups_names_f2 | Locatio_f2 |
---|---|
Chinese Academy of Sciences (CAS) | China |
University of Michigan (U-M) | USA |
The University of Queensland (UQ) | USA |
University of California | USA |
file_output.csv
research_groups_names_f1 | research_groups_names_f2 | Locatio_f2 |
---|---|---|
Chinese Academy of Sciences | Chinese Academy of Sciences(CAS) | China |
CAS | Chinese Academy of Sciences (CAS) | China |
U-M | University of Michigan (U-M) | USA |
UQ | The University of Queensland (UQ) | Australia |
Harvard University | Not found | USA |
University of California, Los Angeles | University of California | USA |
import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file1.csv')
df1 = df1.add_prefix('f1_')
df2 = df2.add_prefix('f2_')
def fn(row):
for _, n in df2.iterrows():
if (
n["research_groups_names_f1"] == row["research_groups_names_f2"]
or row["research_groups_names_f1"] in n["research_groups_names_f2"]
):
return n
df1[["f2_research_groups_names", "f2_location"]] = df1.apply(fn, axis=1)
df1 = df1.rename(columns={"research_groups_names": "f1_research_groups_names"})
print(df1)