How do I Merge 2 or more CSV files using pandas and replacing with "not applicable" when a value isn't found

Question

I am trying to merge two or more csv files so that

MetaData1
Sample_name   TITLE
Cody        Chicken Pox
Claudia     Chicken Pox
Alex        Chicken Pox
Steven      Chicken Pox
Mom         Chicken Pox
Dad     

MetaData2
Sample_name    TITLE       Geo_Loc    DESCRIPTION
Dad         Chicken Pox     Earth       people
Me          Chicken Pox     Earth       people
Roger       Chicken Pox     Earth       people
Ben         Chicken Pox     Earth       people

Merge together to look like this:

Merged Metadata 
Sample_name    TITLE             Geo_Loc                 DESCRIPTION
Cody        Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Claudia     Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Alex        Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Steven      Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Mom         Chicken Pox   Missing:Not Applicable    Missing:Not Applicable
Dad         Chicken Pox     Earth                   people
Me          Chicken Pox     Earth                   people
Roger       Chicken Pox     Earth                   people
Ben         Chicken Pox     Earth                   people

The code I have so far is Below: However its only stitching the two csv files together not really amending and over laying. as the images above.

import pandas as panda
import numpy as numpy

File_one = panda.read_csv('/Users/c1carpenter/Desktop/Test.txt', sep='\t', header=0, dtype=str)
File_two = panda.read_csv('/Users/c1carpenter/Desktop/Test2.txt', sep='\t', header=0, dtype=str)
Concat_File = panda.concat([File_one, File_two])

for column_header in Final_File:
    for entry in Final_File[column_header]:
        if str(entry) == 'nan':
            print(entry)
            entry = 'not applicable'
            print("changed to: " + entry)

Concat_File.to_csv(path_or_buf='/Users/c1carpenter/Desktop/' + 'diditwork.txt', sep='\t', na_rep='not applicable',index=False)

Do not post pictures please instead try to provide a reproducible code for your datasets or at least provide text form of your datasets, look here https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Sahil Dahiya, Dec 30 '17 at 20:33

score 0 · Answer 1 · answered Dec 30 '17 at 22:32

If you assign entry = 'not applicable', than you don't modify the dataframe. Instead, you should iterate over the frame and assign using Final_File.at or Final_File.iat.

But in your case, you can just fill NaNs using:

Final_File.fillna('Missing:Not Applicable')

Please, look at great tutorial 10 Minutes to pandas.

j12y · Answer 2 · 2017-12-30T23:25:47.637

If I understand your question, it looks like you may want to merge() the results of your csv files, instead of concat() them together.

import pandas as pd
m1 = pd.read_csv('metadata1.csv')
m2 = pd.read_csv('metadata2.csv')
mm = pd.merge(m1, m2, how='outer', on='Sample_name')  

# Cleanup to merge duplicate non-index column
mm['TITLE'] = mm[['TITLE_x', 'TITLE_y']].fillna('').sum(axis=1)
mm.drop(['TITLE_x','TITLE_y'], axis=1, inplace=True)

# Replace NaN if you desire
mm.fillna('Missing:Not Applicable')

You can find out more about merge() and how to do joins here if you want to adjust the results you are getting to match your requirements better: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

Hope that helps.

Hi,This was super helpful. What if I have other duplicate non-index columns. How do i merge them without having to clean-up each one individually. — Carolina Estrella, Jan 04 '18 at 17:56

How do I Merge 2 or more CSV files using pandas and replacing with "not applicable" when a value isn't found

2 Answers2