I have two data frames df1 and df2. In df1 I have 50 columns and in df2 I have 50+ columns. What I want to achieve is that In df1 I have 13000 rows and a column name subject where names of all subjects are given. In df2 I have 250 rows and along 50+ I have two columns named subject code and subject_name.
Here is an example of my datasets:
df1 =
index subjects
0 Biology
1 Physicss
2 Chemistry
3 Biology
4 Physics
5 Physics
6 Biolgy
df2 =
index subject_name subject_code
0 Biology BIO
1 Physics PHY
2 Chemistry CHE
3 Medical MED
4 Programming PRO
5 Maths MAT
6 Literature LIT
My desired output in df1 (after replacing subject_name and fixing the spelling errors) is:
index subjects subject_code
0 Biology BIO
1 Physics PHY
2 Chemistry CHE
3 Biology BIO
4 Physics PHY
5 Physics PHY
6 Biology BIO
What happens at my end is that I want to merge all subject values in df1 with values in df2 subject name values. In df1 there are around 500 rows where I get NAN after I merge both the columns in one as in these 500 rows there is some difference in the spellings of the subject. I have tried solution given at the following links but didn't work for me: replace df index values with values from a list but ignore empty strings
Python pandas: replace values multiple columns matching multiple columns from another dataframe
Here is my code:
df_merged = pd.merge(df1_subject,df2_subjectname, left_on='subjects', right_on='subject_name')
df_merged.head()
Can anyone tell me how I can fix this issue as I have already spend 8 hours on this issue but am unable tor resolve it.
Cheers