
I am trying to import a set of *.txt files. I need to import the files into successive columns of a Pandas DataFrame in Python.

Requirements and Background information:

  1. Each file has one column of numbers
  2. No headers are present in the files
  3. Positive and negative integers are possible
  4. The size of all the *.txt files is the same
  5. Each column of the DataFrame must use the name of its file (without extension) as the header
  6. The number of files is not known ahead of time

Here is one sample *.txt file. All the others have the same format.

16
54
-314
1
15
4
153
86
4
64
373
3
434
31
93
53
873
43
11
533
46

Here is my attempt:

import pandas as pd
import os
import glob

# Step 1: get a list of all csv files in target directory
my_dir = "C:\\Python27\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )

# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df

Steps 1 and 2 work. I am having problems with step 3. I get the following error message:

Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str

Question: Is there a better way to load these *.txt files into a Pandas dataframe? Why does read_csv not accept strings for file names?

edesz
  • instead of this `frame = pd.read_csv(filesList[ijk])` use this `frame = pd.read_csv(ijk)` in your for loop – JAbr Apr 03 '19 at 14:08

2 Answers


You can read them into multiple dataframes and concat them together afterwards. Suppose you have two of those files, containing the data shown.

In [6]:
filelist = ['val1.txt', 'val2.txt']
print pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist], axis=1)
    val1  val2
0     16    16
1     54    54
2   -314  -314
3      1     1
4     15    15
5      4     4
6    153   153
7     86    86
8      4     4
9     64    64
10   373   373
11     3     3
12   434   434
13    31    31
14    93    93
15    53    53
16   873   873
17    43    43
18    11    11
19   533   533
20    46    46
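
When the number of files is not known ahead of time, the same idea can be sketched with glob supplying the file list. This is a minimal sketch, not the answerer's code: the function name `load_columns` and the `pattern` default are made up for illustration, and `os.path.splitext` is swapped in for `item[:-4]` so the extension is stripped whatever its length.

```python
import glob
import os

import pandas as pd


def load_columns(directory, pattern="*.txt"):
    """Read every matching one-column file into its own DataFrame column.

    `load_columns` and `pattern` are hypothetical names for this sketch.
    """
    # Sort so the column order does not depend on the order glob returns.
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    frames = []
    for path in paths:
        # Column name = file name without directory or extension.
        column = os.path.splitext(os.path.basename(path))[0]
        # header=None: the first line of each file is data, not a header row.
        frames.append(pd.read_csv(path, header=None, names=[column]))
    # axis=1 places the frames side by side instead of stacking rows.
    return pd.concat(frames, axis=1)
```

Calling `load_columns("C:\\Python27\\Files")` would then return one column per file. If no file matches the pattern, `pd.concat` raises a `ValueError`, so an explicit check on `paths` may be worth adding.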
CT Zhu
  • Sorry, I forgot to mention: there are many files, maybe more than 20. I would strongly prefer to avoid reading them in manually. Also, I do not understand this part: `names=[item[:-4]]`. What is the significance of -4? – edesz Oct 17 '14 at 00:39
  • You can use `os.listdir(PATH)` to get a list of all files in the `PATH`, so that part is easy. As for `names=[item[:-4]]`: the files end with `'.txt'`, and you don't want `'.txt'` to be a part of your column name, right? – CT Zhu Oct 17 '14 at 01:19
  • Thanks. I tried this approach: Line 1 - `df = pd.DataFrame()` Line 2 - `for item in filesList:` Line 3 - `df = pd.concat(pd.read_csv(item, names=[item[:-4]]), axis = 1)`. But it gives an error message: `TypeError: first argument must be a list-like of pandas objects, you passed an object of type "DataFrame"`. Is there some reason why this approach does not work? – edesz Oct 17 '14 at 01:54
  • CT Zhu's code is working, but I do not understand why my approach in the comment above is not working. His method uses a list comprehension. I just used a simple for loop. Could you please let me know why my approach will not work? – edesz Oct 24 '14 at 13:12
  • Thank you! Note that for my case, I wanted to stack these dataframes by concatenating rows (instead of columns), so I replaced `axis=1` with `axis=0, ignore_index=True` – mikey Jan 21 '21 at 13:33
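
The `TypeError` quoted in the comments above arises because `pd.concat` expects a list (or other sequence) of pandas objects as its first argument, while the loop in the comment passes it a single DataFrame on each iteration. A plain for loop works once the frames are collected in a list and concatenated a single time at the end. A minimal sketch, with `concat_files` as a hypothetical helper name and assuming every name ends in `.txt`:

```python
import pandas as pd


def concat_files(filenames):
    """Loop-based equivalent of the list comprehension (hypothetical helper)."""
    frames = []
    for item in filenames:
        # item[:-4] drops the 4-character '.txt' suffix, as in the answer.
        frames.append(pd.read_csv(item, header=None, names=[item[:-4]]))
    # pd.concat receives the whole list at once, not one DataFrame at a time.
    return pd.concat(frames, axis=1)
```

Concatenating once at the end also avoids rebuilding the DataFrame on every pass through the loop.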

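You're very close. The loop variable ijk is already a filename, so you don't need to index into the list. Loop over filesList rather than filelist, though, because filelist holds the names without the .txt extension:

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filesList:  # names with extension, e.g. 'val1.txt'
    frame = pd.read_csv(ijk)
    df = df.append(frame)
print df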
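
The original error follows from how a for loop iterates: the loop variable takes each list element, not its position, so `filesList[ijk]` tries to index a list with a string. The sketch below reproduces this without pandas. The file names are made up, and `enumerate` is shown only as one option for cases where the position is really needed.

```python
files_list = ["val1.txt", "val2.txt"]  # made-up names for illustration

# Iterating over a list yields the elements themselves.
seen = [name for name in files_list]

# Indexing the list with an element (a str) raises TypeError.
try:
    files_list[files_list[0]]
    error = None
except TypeError as exc:
    error = exc

# enumerate yields (position, element) pairs when the index is needed.
pairs = list(enumerate(files_list))
```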

In the future, please post working code exactly as you ran it. Your code uses `from pandas import *` but then refers to pandas as `pd`, which implies the import was actually `import pandas as pd`.

You also want to be careful with variable names. `files` actually holds a single file path, and the names `filelist` and `filesList` give no hint of how the two lists differ. It also seems like a bad idea to keep personal documents in your Python directory.

Phil Dukhov
Kracit
  • Sorry about the confusion with the pandas import. Yes, that should be corrected. I have updated the original post. – edesz Oct 20 '14 at 00:31