4

I have numerous separate instrument files of X, Y (integer) column data. All arrays are the same dimension. The X column is the same for each file, the Y column numbers are different. If possible, I want to concatenate the Y columns of the successive files to the first file and write a new single large array containing the first X and multiple Ys? Like this:

file1=X1 Y1 file2=X1 Y2 file3=X1 Y3... new file result should be: X1 Y1 Y2 Y3...

Been looking at variations on:

import pandas
data = pandas.read_csv('file1.csv') 
# print(data) returns the 1st file array ok

Need to open and loop over successive files to join Y column to file1.

stuckoverflow
  • 625
  • 2
  • 7
  • 23
numpystack
  • 43
  • 1
  • 3

1 Answers1

4

you can do something like this:

import os
import glob
import pandas as pd

def get_merged_csv(flist, **kwargs):
    return pd.concat([pd.read_csv(f, **kwargs).set_index('X') for f in flist], axis=1).reset_index()

path = 'C:/Users/csvfiles'
fmask = os.path.join(path, '*mask*.csv')

df = get_merged_csv(glob.glob(fmask))

in order to name your Y columns like Y1, Y2, etc.:

cols = ['{0[0]}{0[1]}'.format(t) for t in zip(df.columns[1:], range(1, len(df.columns)))]
df.columns = df.columns.tolist()[:1] + cols

Test data:

a.csv:

X,Y
1,11
2,12
3,13

b.csv:

X,Y
1,21
2,22
3,23

c.csv:

X,Y
1,31
2,32
3,33

Test:

In [215]: df = get_merged_csv(glob.glob(fmask))

In [216]: df
Out[216]:
   X   Y   Y   Y
0  1  11  21  31
1  2  12  22  32
2  3  13  23  33

In [217]: cols = ['{0[0]}{0[1]}'.format(t) for t in zip(df.columns[1:], range(1, len(df.columns)))]

In [218]: cols
Out[218]: ['Y1', 'Y2', 'Y3']

In [219]: df.columns = df.columns.tolist()[:1] + cols

In [220]: df
Out[220]:
   X  Y1  Y2  Y3
0  1  11  21  31
1  2  12  22  32
2  3  13  23  33
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • Hey MaxU, this works great!!! I used it to build a huge dataframe in seconds. As I am fairly new to Python, do you have a recommendation on writing out the results of "df?" Can I also pass df to MatPlotLib for plotting? – numpystack May 07 '16 at 14:22
  • @numpystack, about writing out results - if you mean performance you may want to read [this answer](http://stackoverflow.com/questions/37010212/what-is-the-fastest-way-to-upload-a-big-csv-file-in-notebook-to-work-with-python/37012035#37012035). In regards to Matplotlib - you would have to specify what and how do you want to plot – MaxU - stand with Ukraine May 07 '16 at 14:36