Using Pandas, how do i compare 1 by 1 every cell under a column in dataframe1 to every cell under a column in datafram2

Question

I have one excel file and one csv file that I will be using. Both file paths will be passed into a function (excel file path = “list” and csv file path = “page”)

I want to create 2 dataframes, one for each file. df1 for the excel file and df2 for the csv file.

This is the information on the csv file (df2) I will be using (will only use “Name” and “School” columns |One|Name|Two|Three|School|Four|Five| |:-:|:--:|:-:|:---:|:-----|:--:|:--:| |q|Sunny|a|z|Eliot|p|l| |w|Pete|s|x|Maury|o|k| |e|Bobby|g|c|Woodson|i|j| |r|James|f|v|Eastern|u|h| |t|Walter|h|b|Ballou|y|h| |u|Marcus|j|n|Wilson|t|g| |y|Daniel|k|m|Dunbar|r|f|

• 1st - I want to read df1 and copy the “Name”, “City”, and “State” columns from df1 and paste them to another workbook “Results”. I also want to add a 4th column "School"

Results workbook should look like this ("School" column should not be populated yet)

Name	City	State
Bobby	Washington	DC
James	Philadelphia	PA
Walter	Greensboro	NC
Pete	College Park	MD
Sunny	Arlington	VA

• 2nd – starting with the 1st row of df1, I want to compare every cell 1 by 1 under the "Name" column of df1 (sub1 = temp storage for the information of the current cell of df1), to every cell under the "Name" column of df2 starting from the 1st row (sub2 = temp storage for the information of the current cell of df2) until the sub1 == sub2.

If sub1 == sub2, then I want to copy the cell under the “School” column of df2 for that row and paste it to the “School” column of the workbook “Results” starting with the 1st row.
If sub1 != sub2, I want to increase the row of df2 and compare sub1 to the new cell for sub2 until sub1 == sub2
I want to repeat this for all on the names under the “Name” column of df1

Here is my code

import os import pandas as pd import openpyxl as opxl

def Page(Test, Report): # select columns i want to work with copy_columns = ["Name", "City", "State"] compare_column_t = ["Name"] compare_column_r = ["Name", "School"] write_columns = [“School"]

 # create dataframes for the columns to copy & paste to the output file and columns to compare
 data_frame = pd.DataFrame(columns=copy_columns)
 df_com_tl = pd.DataFrame(columns=compare_column_t)
 df_com_cng = pd.DataFrame(columns=compare_column_r)
 data_frame2 = pd.DataFrame(columns=write_columns)

 # copy "Name", "City", "State" columns from “Test” input file and paste them in the output file   
 #“Result”…. The code up until here will work for part 1
df_file = pd.read_excel(Test)
selected_columns = df_file.loc[:, copy_columns]
data_frame = pd.concat([selected_columns, data_frame], ignore_index=True)
data_frame.to_excel(writer, sheet_name="Test List", header=False, index=False, startrow=1) 

# part 2 is where I’m stuck… compare "Name" column from “Test” and “Report” file and print the 
# “School” to the output file “Result”
df_file2 = pd.read_csv(CalbertNG_Report, encoding='cp1252')
     for i in range (0, len(df_file)):
           for j in  range (0, len(df_file2)):
                 sub1 = df_file.loc[i, compare_column_t]
                 sub2 = df_file2.loc[j, compare_column_r]
                 #print(sub1,sub2)
                 if sub1==sub2:
                    data_frame2.loc[i, write_columns] = subfunc2
                    print(data_frame2.loc[i, write_columns])
  data_frame.to_excel(writer, sheet_name="Test List", header=False, index=False, startrow=1)

Any help is greatly appreciated.

I believe you want to use : df3 = df1.merge(df2[['Name','School']], on='Name', how = 'outer') — frankr6591, Dec 07 '20 at 18:39

Using Pandas, how do i compare 1 by 1 every cell under a column in dataframe1 to every cell under a column in datafram2

0 Answers0