How to combine a large number of dataframes?

Question

I have many .txt files in a folder. Once loaded, each file looks like one of the dataframes below.

import pandas as pd
FileA = pd.DataFrame({'Id': ["a", "b", "c"], 'Id2': ["a", "b", "z"], 'Amount': [10, 30, 50]})
FileB = pd.DataFrame({'Id': ["d", "e", "f", "z"], 'Id2': ["g", "h", "i", "j"], 'Amount': [10, 30, 50, 100]})
FileC = pd.DataFrame({'Id': ["r", "e"], 'Id2': ["o", "i"], 'Amount': [6, 33]})
FileD...

I want to extract the first row of each file in the folder and then combine those rows into a single dataframe. Here is what I did.

To build a list of the .txt files, I did the following.

import glob
txtfiles = []
for file in glob.glob("*.txt"):  # every .txt file in the current directory
    txtfiles.append(file)

To extract the first rows and combine them, I did the following.

pd.read_table(txtfiles[0])[:1].append([pd.read_table(txtfiles[1])[:1],pd.read_table(txtfiles[2])[:1]],pd.read_table.......)

This works when there are only a few .txt files, but with many files I need an automated method. Does anyone know how to automate this? Thanks for your help!

Your question is a bit unclear. Do you want to know how to load a .txt file that contains a large amount of data into a dataframe, or how to parse all of the files inside the folder? – Stefan, Jan 14 '20 at 00:53
There are about 1,000 files in the folder, like FileA, FileB, FileC... I want to extract the first row of each file and then combine all of these first rows (FileA's first row + FileB's first row + FileC's first row...). – Tom_Hanks, Jan 14 '20 at 01:18

datapug · Accepted Answer · 2020-01-14T01:19:57.653

2

Based on Sid's answer to this post:

import glob
import os
import pandas as pd
input_path = r"insert/your/path"  # use the path where you stored the txt files
all_files = glob.glob(os.path.join(input_path, "*.txt"))
df_from_each_file = (pd.read_csv(f, nrows=1) for f in all_files)  # first data row of each file
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)

Update: `pd.read_csv` was not ingesting the files properly, most likely because they are not comma-separated. Replacing `read_csv` with `read_table` should give the expected results:

import glob
import os
import pandas as pd
input_path = r"insert/your/path"  # use the path where you stored the txt files
all_files = glob.glob(os.path.join(input_path, "*.txt"))
df_from_each_file = (pd.read_table(f, nrows=1) for f in all_files)  # first data row of each file
concatenated_df = pd.concat(df_from_each_file, ignore_index=True)
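
For anyone who wants to try this without touching real data, here is a minimal end-to-end sketch. It assumes the files are tab-delimited with a header row (the question does not state the delimiter), and the helper name `first_rows` and the sample file contents are made up for illustration. The file list is sorted because `glob` does not guarantee any particular order.

```python
import glob
import os
import tempfile

import pandas as pd


def first_rows(input_path, pattern="*.txt", sep="\t"):
    """Return one DataFrame holding the first data row of every matching file.

    Hypothetical helper; assumes every file has a header line and uses `sep`.
    """
    # sorted() makes the row order reproducible across runs and platforms
    all_files = sorted(glob.glob(os.path.join(input_path, pattern)))
    # nrows=1 reads only the first data row after the header
    frames = (pd.read_csv(f, sep=sep, nrows=1) for f in all_files)
    return pd.concat(frames, ignore_index=True)


# Made-up tab-delimited stand-ins for the question's FileA, FileB and FileC
samples = {
    "FileA.txt": "Id\tId2\tAmount\na\ta\t10\nb\tb\t30\nc\tz\t50\n",
    "FileB.txt": "Id\tId2\tAmount\nd\tg\t10\ne\th\t30\nf\ti\t50\nz\tj\t100\n",
    "FileC.txt": "Id\tId2\tAmount\nr\to\t6\ne\ti\t33\n",
}

with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(text)
    combined = first_rows(tmp)

print(combined)
```

With these three sample files the result has one row per file: (a, a, 10), (d, g, 10) and (r, o, 6), each value in its own column. If the folder contains no matching files, `pd.concat` raises a `ValueError`, so a real script may want to check that the file list is not empty first.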

edited Jan 14 '20 at 01:19

answered Jan 14 '20 at 00:56


Unfortunately, this method fused the columns together. For example, with the dataframes above I got values like aa10, dg10, ro6. I want to keep the values in separate columns. – Tom_Hanks Jan 14 '20 at 01:14
That is likely because the separator in your files is not a comma. I replaced the `read_csv` method with the one you used when posting the question: `read_table`. Does that give the expected results on your end? I updated the answer according to your feedback (: – datapug Jan 14 '20 at 01:21
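
To illustrate the separator point from the comment above, here is a small sketch. It assumes the files are tab-delimited, which the thread never confirms, and the sample text is made up. `read_csv` splits on commas by default, so a tab-delimited line comes back as a single column, while `read_table` splits on tabs by default.

```python
import io

import pandas as pd

# Made-up tab-delimited content standing in for one of the .txt files
text = "Id\tId2\tAmount\na\ta\t10\nb\tb\t30\n"

# Default separator is a comma, so each whole line is read as one field
as_csv = pd.read_csv(io.StringIO(text), nrows=1)

# read_table defaults to a tab separator, so the three columns are split
as_table = pd.read_table(io.StringIO(text), nrows=1)

# Same result as read_table: read_csv with the separator passed explicitly
as_csv_tab = pd.read_csv(io.StringIO(text), sep="\t", nrows=1)

print(as_csv.shape)    # a single fused column
print(as_table.shape)  # three separate columns
```

If the files use some other delimiter, such as a semicolon or a pipe, passing it through `sep=` to either function should work the same way.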
