0

I have one excel file and one csv file that I will be using. Both file paths will be passed into a function (excel file path = “list” and csv file path = “page”)

I want to create 2 dataframes, one for each file. df1 for the excel file and df2 for the csv file.

This is the information on the excel file (df1) that I will be using |Name|City|State| |:----:|:----:|:----:| |Bobby|Washington|DC| |James|Philadelphia|PA| |Walter|Greensboro|NC| |Pete|College Park|MD| |Sunny|Arlington|VA|

This is the information on the csv file (df2) I will be using (will only use “Name” and “School” columns |One|Name|Two|Three|School|Four|Five| |:-:|:--:|:-:|:---:|:-----|:--:|:--:| |q|Sunny|a|z|Eliot|p|l| |w|Pete|s|x|Maury|o|k| |e|Bobby|g|c|Woodson|i|j| |r|James|f|v|Eastern|u|h| |t|Walter|h|b|Ballou|y|h| |u|Marcus|j|n|Wilson|t|g| |y|Daniel|k|m|Dunbar|r|f|

• 1st - I want to read df1 and copy the “Name”, “City”, and “State” columns from df1 and paste them to another workbook “Results”. I also want to add a 4th column "School"

  • Results workbook should look like this ("School" column should not be populated yet)
Name City State School
Bobby Washington DC
James Philadelphia PA
Walter Greensboro NC
Pete College Park MD
Sunny Arlington VA

• 2nd – starting with the 1st row of df1, I want to compare every cell 1 by 1 under the "Name" column of df1 (sub1 = temp storage for the information of the current cell of df1), to every cell under the "Name" column of df2 starting from the 1st row (sub2 = temp storage for the information of the current cell of df2) until the sub1 == sub2.

  • If sub1 == sub2, then I want to copy the cell under the “School” column of df2 for that row and paste it to the “School” column of the workbook “Results” starting with the 1st row.
  • If sub1 != sub2, I want to increase the row of df2 and compare sub1 to the new cell for sub2 until sub1 == sub2
  • I want to repeat this for all on the names under the “Name” column of df1

Here is my code

import os import pandas as pd import openpyxl as opxl

def Page(Test, Report): # select columns i want to work with copy_columns = ["Name", "City", "State"] compare_column_t = ["Name"] compare_column_r = ["Name", "School"] write_columns = [“School"]

 # create dataframes for the columns to copy & paste to the output file and columns to compare
 data_frame = pd.DataFrame(columns=copy_columns)
 df_com_tl = pd.DataFrame(columns=compare_column_t)
 df_com_cng = pd.DataFrame(columns=compare_column_r)
 data_frame2 = pd.DataFrame(columns=write_columns)

 # copy "Name", "City", "State" columns from “Test” input file and paste them in the output file   
 #“Result”…. The code up until here will work for part 1
df_file = pd.read_excel(Test)
selected_columns = df_file.loc[:, copy_columns]
data_frame = pd.concat([selected_columns, data_frame], ignore_index=True)
data_frame.to_excel(writer, sheet_name="Test List", header=False, index=False, startrow=1) 

# part 2 is where I’m stuck… compare "Name" column from “Test” and “Report” file and print the 
# “School” to the output file “Result”
df_file2 = pd.read_csv(CalbertNG_Report, encoding='cp1252')
     for i in range (0, len(df_file)):
           for j in  range (0, len(df_file2)):
                 sub1 = df_file.loc[i, compare_column_t]
                 sub2 = df_file2.loc[j, compare_column_r]
                 #print(sub1,sub2)
                 if sub1==sub2:
                    data_frame2.loc[i, write_columns] = subfunc2
                    print(data_frame2.loc[i, write_columns])
  data_frame.to_excel(writer, sheet_name="Test List", header=False, index=False, startrow=1) 

Any help is greatly appreciated.

0 Answers0