
I have several output files; here are two:

File1:

4
12
13
6
.....

File2:

20
3
9
14
.....

Goal Output:

   r_1  r_2
0    4   20
1   12    3
2   13    9
3    6   14

I need to bulk-load them into one huge dataframe. Here's my start:

(1) Create an array of all the files:

import os
import pandas as pd

allfiles = []
for root, dirs, files in os.walk(r'/my_directory_path/'):
    for file in files:
        if file.endswith('.csv'):
            # keep the full path so the file can be opened later
            allfiles.append(os.path.join(root, file))

(2) Load the files into pandas (the problem is here):

big = pd.DataFrame

for i in allfiles:
    big[i] = pd.read_csv(i, sep='\t', header=None)

The problem is `big[i]`: I need to create a new column within the for loop while passing i.

PhysicalChemist

1 Answer


You can use `append` and `concat` with the parameter `axis=1`:

import pandas as pd
import glob

i = 1
dfs = []
# glob can expand a pattern like *.csv - see http://stackoverflow.com/a/3215392/2901002
for f in glob.glob('my_directory_path/*.csv'):
    # read each file as a single column named r_1, r_2, ...
    dfs.append(pd.read_csv(f, sep='\t', header=None, names=['r_' + str(i)]))
    i += 1
# concatenate all the one-column frames side by side
p = pd.concat(dfs, axis=1)

print(p)


   r_1  r_2
0    4   20
1   12    3
2   13    9
3    6   14
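
If you would rather have each column named after its source file than after a running counter, a small variation of the same pattern works. This is only a sketch under the same assumptions as above; the base-name-as-column-name scheme is mine, not something the question asks for. Sorting the glob result keeps the column order stable, since glob returns files in arbitrary order:

import os
import glob
import pandas as pd

# hypothetical variation: name each column after its file's base name (extension stripped)
dfs = [pd.read_csv(f, sep='\t', header=None,
                   names=[os.path.splitext(os.path.basename(f))[0]])
       for f in sorted(glob.glob('my_directory_path/*.csv'))]
big = pd.concat(dfs, axis=1)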
jezrael
    This code will run much more efficiently if you put the `pd.concat` statement outside of your `for` loop. You are overwriting the same variable `p` and needlessly concatenating on each loop. – Alexander Feb 14 '16 at 19:29
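
For reference, a sketch of the two patterns the comment contrasts: concatenating inside the loop re-copies the growing frame on every iteration, while collecting the pieces in a list and calling `pd.concat` once does a single pass.

import glob
import pandas as pd

# anti-pattern the comment warns against: concat inside the loop (repeated copying)
p = pd.DataFrame()
for f in glob.glob('my_directory_path/*.csv'):
    p = pd.concat([p, pd.read_csv(f, sep='\t', header=None)], axis=1)

# preferred: build a list of frames and concatenate once
dfs = [pd.read_csv(f, sep='\t', header=None)
       for f in glob.glob('my_directory_path/*.csv')]
p = pd.concat(dfs, axis=1)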