I have one excel file and one csv file that I will be using. Both file paths will be passed into a function (excel file path = “list” and csv file path = “page”)
I want to create 2 dataframes, one for each file. df1 for the excel file and df2 for the csv file.
This is the information on the excel file (df1) that I will be using |Name|City|State| |:----:|:----:|:----:| |Bobby|Washington|DC| |James|Philadelphia|PA| |Walter|Greensboro|NC| |Pete|College Park|MD| |Sunny|Arlington|VA|
This is the information on the csv file (df2) I will be using (will only use “Name” and “School” columns |One|Name|Two|Three|School|Four|Five| |:-:|:--:|:-:|:---:|:-----|:--:|:--:| |q|Sunny|a|z|Eliot|p|l| |w|Pete|s|x|Maury|o|k| |e|Bobby|g|c|Woodson|i|j| |r|James|f|v|Eastern|u|h| |t|Walter|h|b|Ballou|y|h| |u|Marcus|j|n|Wilson|t|g| |y|Daniel|k|m|Dunbar|r|f|
• 1st - I want to read df1 and copy the “Name”, “City”, and “State” columns from df1 and paste them to another workbook “Results”. I also want to add a 4th column "School"
- Results workbook should look like this ("School" column should not be populated yet)
Name | City | State | School |
---|---|---|---|
Bobby | Washington | DC | |
James | Philadelphia | PA | |
Walter | Greensboro | NC | |
Pete | College Park | MD | |
Sunny | Arlington | VA |
• 2nd – starting with the 1st row of df1, I want to compare every cell 1 by 1 under the "Name" column of df1 (sub1 = temp storage for the information of the current cell of df1), to every cell under the "Name" column of df2 starting from the 1st row (sub2 = temp storage for the information of the current cell of df2) until the sub1 == sub2.
- If sub1 == sub2, then I want to copy the cell under the “School” column of df2 for that row and paste it to the “School” column of the workbook “Results” starting with the 1st row.
- If sub1 != sub2, I want to increase the row of df2 and compare sub1 to the new cell for sub2 until sub1 == sub2
- I want to repeat this for all on the names under the “Name” column of df1
Here is my code
import os import pandas as pd import openpyxl as opxl
def Page(Test, Report): # select columns i want to work with copy_columns = ["Name", "City", "State"] compare_column_t = ["Name"] compare_column_r = ["Name", "School"] write_columns = [“School"]
# create dataframes for the columns to copy & paste to the output file and columns to compare
data_frame = pd.DataFrame(columns=copy_columns)
df_com_tl = pd.DataFrame(columns=compare_column_t)
df_com_cng = pd.DataFrame(columns=compare_column_r)
data_frame2 = pd.DataFrame(columns=write_columns)
# copy "Name", "City", "State" columns from “Test” input file and paste them in the output file
#“Result”…. The code up until here will work for part 1
df_file = pd.read_excel(Test)
selected_columns = df_file.loc[:, copy_columns]
data_frame = pd.concat([selected_columns, data_frame], ignore_index=True)
data_frame.to_excel(writer, sheet_name="Test List", header=False, index=False, startrow=1)
# part 2 is where I’m stuck… compare "Name" column from “Test” and “Report” file and print the
# “School” to the output file “Result”
df_file2 = pd.read_csv(CalbertNG_Report, encoding='cp1252')
for i in range (0, len(df_file)):
for j in range (0, len(df_file2)):
sub1 = df_file.loc[i, compare_column_t]
sub2 = df_file2.loc[j, compare_column_r]
#print(sub1,sub2)
if sub1==sub2:
data_frame2.loc[i, write_columns] = subfunc2
print(data_frame2.loc[i, write_columns])
data_frame.to_excel(writer, sheet_name="Test List", header=False, index=False, startrow=1)
Any help is greatly appreciated.