
I have 10 text files in a directory, and each file contains free-form text (not tabular data). I'm trying to build a DataFrame from them in which each file's text becomes a row, not a column.

I tried the code below, but the data is loaded into multiple columns instead of rows. Even though I specified the axis parameter (in pd.concat), it still doesn't work. Can anyone help me with this?
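A note on the axis parameter: axis is an argument of pd.concat, not of pd.read_csv, and it controls the direction in which frames are joined. The minimal sketch below uses two tiny hypothetical frames in place of the parsed files. axis=0 (the default) stacks them as rows, while axis=1 places them side by side as columns, which is consistent with the 10-column result reported in this question.

```python
import pandas as pd

# Two small frames standing in for two parsed files (hypothetical data)
a = pd.DataFrame({"text": ["first file"]})
b = pd.DataFrame({"text": ["second file"]})

# axis=0 (the default) stacks the frames vertically, one below the other
rows = pd.concat([a, b], axis=0, ignore_index=True)

# axis=1 aligns the frames on their index and puts them side by side as columns
cols = pd.concat([a, b], axis=1)

print(rows.shape)  # (2, 1)
print(cols.shape)  # (1, 2)
```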

import glob
import pandas as pd

filelist = glob.glob('D:/Annaconda/Project/aclImdb_v1/aclImdb/test/neg1/*.txt')

df_list = [pd.read_csv(file) for file in filelist]

# axis=1 joins the per-file frames side by side (as columns)
neg_df = pd.concat(df_list, axis=1, sort=False)

test_df = pd.DataFrame(neg_df)
test_df['label'] = 0

test_df.head()

Expected: the data from all files is appended as rows.

Actual: the data from all files is appended as 10 columns.
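Because the files hold free-form text rather than CSV, pd.read_csv will treat each file's first line as a header and split the remaining text on commas. One alternative, sketched below under the assumption that each whole file is a single record, is to read every file into one string and build one row per file. The temporary directory, file names, sample text, the "file" column, and the utf-8 encoding are all hypothetical stand-ins; only the label column mirrors the code above.

```python
import glob
import os
import tempfile

import pandas as pd

# Throwaway directory with two free-text files (hypothetical names and contents)
tmpdir = tempfile.mkdtemp()
samples = {
    "0_1.txt": "Terrible plot, wooden acting.",
    "1_2.txt": "Dull, far too long.\nWould not watch again.",
}
for name, body in samples.items():
    with open(os.path.join(tmpdir, name), "w", encoding="utf-8") as fh:
        fh.write(body)

# Sort the paths so the row order does not depend on the file system
filelist = sorted(glob.glob(os.path.join(tmpdir, "*.txt")))


def read_whole_file(path):
    """Return the file's entire contents as one string (assumes utf-8)."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# One dict per file, so each file becomes exactly one row
records = [{"file": os.path.basename(p), "text": read_whole_file(p)} for p in filelist]
neg_df = pd.DataFrame(records)
neg_df["label"] = 0  # same constant label as in the question's code

print(neg_df.shape)  # (2, 3)
```

To apply it to the real data, point glob.glob at the question's neg1 pattern instead of the temporary directory.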

anky
vivek

1 Answer

Here are two other approaches that avoid an explicit loop:

Raw data files

d1.csv

a,b,c
1,3,5
2,4,6

d2.csv

a,b,c
5,8,5
6,4,22

d3.csv

a,b,c
15,8,7
10,85,22
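To reproduce the example, the three sample files can be written out first. This sketch assumes they are plain comma-separated files (the default separator pd.read_csv expects) and writes them into the current working directory, overwriting any existing files with the same names.

```python
import pandas as pd

# Same values as the sample files, written as comma-separated text
samples = {
    "d1.csv": "a,b,c\n1,3,5\n2,4,6\n",
    "d2.csv": "a,b,c\n5,8,5\n6,4,22\n",
    "d3.csv": "a,b,c\n15,8,7\n10,85,22\n",
}
for name, text in samples.items():
    with open(name, "w") as fh:
        fh.write(text)

# Quick sanity check that one file parses into 2 rows x 3 columns
print(pd.read_csv("d1.csv").shape)  # (2, 3)
```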

Pandas-based

import pandas as pd
filelist = ['d1.csv', 'd2.csv', 'd3.csv']
test = pd.concat(map(pd.read_csv, filelist)).reset_index(drop=True)
print(test)

    a   b   c
0   1   3   5
1   2   4   6
2   5   8   5
3   6   4  22
4  15   8   7
5  10  85  22
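As a variation on the pandas version, pd.concat accepts ignore_index=True, which renumbers the rows during concatenation instead of needing a separate reset_index(drop=True) call. The sketch below reads the same values from in-memory strings (an assumption made only to keep it self-contained) and checks that both forms produce the same frame.

```python
import io

import pandas as pd

# In-memory stand-ins for d1.csv, d2.csv and d3.csv (same values)
sources = [
    "a,b,c\n1,3,5\n2,4,6\n",
    "a,b,c\n5,8,5\n6,4,22\n",
    "a,b,c\n15,8,7\n10,85,22\n",
]
frames = [pd.read_csv(io.StringIO(s)) for s in sources]

# Renumber while concatenating, versus renumbering afterwards
via_ignore = pd.concat(frames, ignore_index=True)
via_reset = pd.concat(frames).reset_index(drop=True)

print(via_ignore.equals(via_reset))  # True
```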

Using Dask (installed separately, e.g. pip install "dask[dataframe]")

import dask.dataframe as dd
ddf = dd.read_csv('d*.csv')
test = ddf.compute().reset_index(drop=True)
print(test)

    a   b   c
0   1   3   5
1   2   4   6
2   5   8   5
3   6   4  22
4  15   8   7
5  10  85  22

edesz