
I tried to read multiple text files from a local directory into one single pandas dataframe. Since the original text files came with an extra file extension, I renamed them first. Then I tried to read all the text files into a single dataframe using read_csv and concat from pandas. The problem is that I can read a single text file with pandas, but when I try to read a list of text files from a local directory into a single dataframe, I get the following error:

import os
import pandas as pd

folder = 'fakeNewsDatasets[Rada]/fakeNewsDataset/fake'
allfiles = os.listdir(folder)
print(allfiles)

['biz01.txt',
 'biz02.txt',
 'biz03.txt',
 'biz04.txt',
 'biz05.txt',
 'biz06.txt']

Then I tried to read those text files into a single dataframe as follows:

dfs=pd.concat([pd.read_csv(file, header = None, sep = '\n', skip_blank_lines = True) for file in allfiles], axis=1)

FileNotFoundError: [Errno 2] File b'biz02.txt' does not exist: b'biz02.txt'

I don't understand why this problem occurs, because reading a single text file into a pandas dataframe works fine for me:

df = pd.read_csv('biz01.txt', header = None, sep = '\n', skip_blank_lines = True)
df=df.T
df.columns = ['headline', 'text']

Can anyone help me resolve this issue? How can I fix this error? Is there a better approach?

anky
beyond_inifinity

2 Answers


Use glob(); it is easier:

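import glob

# glob() returns paths that include the folder, so each one can be passed straight to pd.read_csv()
allfiles = glob.glob('C:\\folder1\\*.txt')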

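Otherwise, you have to join the folder path with each file name inside the loop (for file in allfiles) before passing it to pd.read_csv(). os.listdir() returns bare file names, not full paths, so pandas looks for them in the current working directory instead of in the folder.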
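A minimal sketch of the join-the-path fix, assuming (as in the question's single-file example) that each file holds a headline line followed by the article text. It reads each file with plain Python instead of pd.read_csv(..., sep='\n'), because newer pandas versions may refuse a newline as the separator; the helper name read_news_folder and the file column are illustrative, not part of the original answer.

```python
import os
import tempfile

import pandas as pd


def read_news_folder(folder):
    """Read every .txt file in `folder` into one DataFrame, one row per file.

    Assumes each file contains a headline line followed by the article text;
    blank lines are skipped.
    """
    rows = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".txt"):
            continue
        # os.listdir() returns bare names, so prefix the folder path.
        path = os.path.join(folder, name)
        with open(path, encoding="utf-8") as fh:
            lines = [line.strip() for line in fh if line.strip()]
        if not lines:
            continue  # skip empty files
        # First non-blank line is the headline, the rest is the body.
        rows.append({"file": name, "headline": lines[0], "text": " ".join(lines[1:])})
    return pd.DataFrame(rows, columns=["file", "headline", "text"])


# Demo with a throwaway folder standing in for the dataset directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = [("Headline A", "Body A"), ("Headline B", "Body B")]
    for i, (head, body) in enumerate(samples, start=1):
        with open(os.path.join(tmp, f"biz0{i}.txt"), "w", encoding="utf-8") as fh:
            fh.write(f"{head}\n\n{body}\n")
    # A non-.txt file that should be ignored.
    with open(os.path.join(tmp, "readme.md"), "w", encoding="utf-8") as fh:
        fh.write("not a news item\n")
    df = read_news_folder(tmp)

print(df)
```

The listing is sorted so rows come out in file-name order; os.listdir() itself returns names in arbitrary order.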

anky
  • I found one issue: if the text files are located in a nested folder and I try your code, it returns an empty list. My path is something like this: `C:\Users\me\PycharmProjects\myProj\source\data\fakeNewsDataset\fake`. Why, and what is the solution? – beyond_inifinity Feb 23 '19 at 19:02
  • @anjy_91 how can I make your solution work if the files are in subfolders? – beyond_inifinity Feb 23 '19 at 19:28
  • You mean this? https://stackoverflow.com/questions/15580716/python-reading-files-from-directory-file-not-found-in-subdirectory-which-is-t If not, that should be a new question, since you didn't mention subfolders here. – anky Feb 23 '19 at 19:47
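A pattern such as C:\\folder1\\*.txt only looks inside that one folder. A minimal sketch for also searching nested subfolders, using the standard library's glob.glob(..., recursive=True), where ** matches zero or more directory levels; find_txt_files is a hypothetical helper name, and the temporary folder layout only mimics the dataset's fakeNewsDataset/fake structure.

```python
import glob
import os
import tempfile


def find_txt_files(folder):
    """Return full paths of all .txt files under `folder`, at any depth."""
    # "**" with recursive=True matches zero or more directories, so files
    # sitting directly in `folder` are included along with nested ones.
    pattern = os.path.join(folder, "**", "*.txt")
    return sorted(glob.glob(pattern, recursive=True))


# Demo: build a small nested tree in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "fakeNewsDataset", "fake")
    os.makedirs(nested)
    for path in (
        os.path.join(tmp, "top.txt"),
        os.path.join(nested, "biz01.txt"),
        os.path.join(nested, "notes.csv"),  # not matched: wrong extension
    ):
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("Headline\nBody\n")
    # Report paths relative to the temporary root for readability.
    found = [os.path.relpath(p, tmp) for p in find_txt_files(tmp)]

print(found)
```

Each returned entry already includes its directory, so it can be passed directly to pd.read_csv() or open() without any further joining.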

Another option:

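import os
import pandas as pd

frames = []
# Walk the current directory (".") and all of its subfolders.
for root, dirs, files in os.walk("."):
    for file in files:
        if file.endswith('.txt'):
            # Join the directory and the file name into a usable path.
            frames.append(pd.read_csv(os.path.join(root, file), header=None))
# Concatenate once at the end instead of growing a DataFrame inside the loop.
data_set = pd.concat(frames)
data_set.to_csv("/tx.txt", index=False, header=False)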

Loku