
I have two CSV files that I created with Python from unstructured data, but I don't want my script to output two files every time I run it on a JSON. So let's say I have a file (File 1) with columns as follows:

File 1:

feats   ID    A      B     C      E
AA      123   3343   234   2342   112
BB      121   3342   237   2642   213
CC      122   3341   232   2352   912
DD      123   3343   233   5342   12
EE      121   3345   235   2442   2112

...and so on, with, let's say, 10,000 rows of different values and 6 columns. Now I want to check the values of column "ID" against File 2 and merge on the values of ID.

File 2:

Char_Name      ID    Cosmic Awareness
Uatu           123   3.4
Galan          121   4.5
Norrin Radd    122   1.6
Shalla-bal     124   0.3
Nova           125   1.2

File 2 has only 5 rows, one for each of 5 different ID values, and, let's say, 23 columns. I can do this easily with map or apply in pandas, but I'm dealing with thousands of files and don't want to do that per file. Is there any way to map the File 2 values (the name and cosmic awareness columns) onto File 1 by adding new columns 'Char_Name' and 'Cosmic Awareness' (from File 2), matching on the corresponding ID values in File 1 and File 2? The expected output should be somewhat like this.

Final File:

feats   ID    A      B     C      E      Char_Name     Cosmic Awareness
AA      123   3343   234   2342   112    Uatu          3.4
BB      121   3342   237   2642   213    Galan         4.5
CC      122   3341   232   2352   912    Norrin Radd   1.6
DD      123   3343   233   5342   12     Uatu          3.4
EE      121   3345   235   2442   2112   Galan         4.5
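
For reference, this is roughly the single-pair version I want to avoid repeating for thousands of files (sketched with merge; the file names are just placeholders, and the column names are taken from the samples above):

import pandas as pd

# placeholder file names for one pair of the real files
file1 = pd.read_csv('file1.csv')
file2 = pd.read_csv('file2.csv')

# left join: keep every row of File 1 and pull in Char_Name and
# Cosmic Awareness from File 2 wherever the ID values match
merged = file1.merge(file2[['ID', 'Char_Name', 'Cosmic Awareness']], on='ID', how='left')
merged.to_csv('final.csv', index=False)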

Thanks in advance, and if there is any way to improve this question, suggestions are welcome; I will incorporate them here. I have added the expected outcome above.

Mr. Confused
  • Can you add a sample of 2-3 files (out of your 1000s) and the expected output? – jezrael May 31 '18 at 07:35
  • Jezrael, I have added the expected output that I'm looking for. – Mr. Confused May 31 '18 at 07:36
  • Do you think [this](https://stackoverflow.com/a/30512931)? – jezrael May 31 '18 at 07:42
  • It may work, but again the problem is going the pandas way... I don't want to do that; I want to handle the formatting of the CSV in the initial phase itself. So any way to read 2 files and do something similar to map/merge based on ID would really help. – Mr. Confused May 31 '18 at 07:45

1 Answer


I think you need glob to collect all the file names and then create the DataFrames in a list comprehension:

from functools import reduce
import glob
import pandas as pd

# collect every CSV path and read each one into its own DataFrame
files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp) for fp in files]

Last, merge them all together:

# chain pd.merge pairwise across the whole list, joining on ID
df = reduce(lambda left, right: pd.merge(left, right, on='ID'), dfs)
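
Note that pd.merge defaults to an inner join, so this keeps only ID values that appear in every file.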

For an outer join it is possible to use concat instead:

import glob
import pandas as pd

# read each CSV with ID as the index so concat can align the files on it
files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp, index_col=['ID']) for fp in files]

# axis=1 places the files side by side, outer-joined on the ID index
df = pd.concat(dfs, axis=1)
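
A small optional follow-up, assuming you want ID back as an ordinary column afterwards:

# ID is the shared index after concat; turn it back into a column
df = pd.concat(dfs, axis=1).reset_index()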
jezrael