I am using Python 3.
I have a master dataframe " df " with the columns as shown (with 3 rows of sample data):
UNITID CIPCODE AWLEVEL CTOTALT
100654 1.0999 5 9
100654 1.1001 5 10
100654 1.1001 7 6
I have a dataframe called " uni_names " as shown (with 3 rows of sample data):
UNITID institution_name
100654 Alabama A & M University
100663 University of Alabama at Birmingham
100690 Amridge University
I have a dataframe called " cipcodes " as shown (with 3 rows of sample data):
cipcode_value program_name
01.0000 Agriculture, General
01.0101 Agricultural Business and Management, General
01.0102 Agribusiness/Agricultural Business Operations
I have a dataframe called " awlevel " as shown (with 3 rows of sample data):
code type
3 Associate's degree
5 Bachelor's degree
7 Master's degree
What I want is an output dataframe with column names as such
institution_name program_name type CTOTALT
My code below is giving duplicates and weird additional values:
import pandas as pd
# Read the master dataframe from a CSV file
df = pd.read_csv('master_data.csv')
# Read the uni_names dataframe from a CSV file
uni_names = pd.read_csv('uni_names.csv')
# Read the cipcodes dataframe from a CSV file
cipcodes = pd.read_csv('cipcodes.csv')
# Read the awlevel dataframe from a CSV file
awlevel = pd.read_csv('awlevel.csv')
# Merge df with uni_names based on UNITID
merged_df = df.merge(uni_names, on='UNITID')
# Merge merged_df with cipcodes based on CIPCODE
merged_df = merged_df.merge(cipcodes, left_on='CIPCODE', right_on='cipcode_value')
# Merge merged_df with awlevel based on AWLEVEL
merged_df = merged_df.merge(awlevel, left_on='AWLEVEL', right_on='code')
# Select the desired columns and assign new column names
output_df = merged_df[['institution_name', 'program_name', 'type', 'CTOTALT']]
output_df.columns = ['institution_name', 'program_name', 'type', 'CTOTALT']
# Print the output dataframe
print(output_df)