Suppose I have 100 txt files and every file has 20000 records. I would like every file to have 25000 records. How can I fill each file up with data from another file so that every file ends up with 25000 records?
-
Read all the files into one pandas DataFrame, then split it into chunks of 25000 rows and write them to files again. – Erfan Jun 22 '20 at 08:52
-
Will it be slow? Could you please show me some sample code? Thanks so much. – Elsa Jun 22 '20 at 08:56
-
It can be really fast, start [here](https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe) – Erfan Jun 22 '20 at 08:56
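A side note on speed and memory: the pandas route loads every record into memory at once. If each record is exactly one line of text and you don't need pandas for anything else, a plain-Python streaming approach (a different technique from the pandas one suggested here) can re-split the files while holding only one chunk in memory. This is a minimal sketch assuming UTF-8 files with one record per line; `rechunk_lines` and the `part_000.txt` output names are hypothetical.

```python
import itertools
import os


def rechunk_lines(src_paths, dst_dir, chunk_size=25000):
    """Re-split line-based text files into files of chunk_size lines each.

    Assumes one record per line. The inputs are read lazily, so only one
    chunk of lines is held in memory at a time.
    """
    os.makedirs(dst_dir, exist_ok=True)

    def all_lines():
        # Yield every line of every input file, in the order given.
        for p in src_paths:
            with open(p, "r", encoding="utf-8") as f:
                for line in f:
                    # Make sure a file's last line still ends with a newline.
                    yield line if line.endswith("\n") else line + "\n"

    lines = all_lines()
    out_paths = []
    for i in itertools.count():
        # Take the next chunk_size lines; an empty chunk means we are done.
        chunk = list(itertools.islice(lines, chunk_size))
        if not chunk:
            break
        out = os.path.join(dst_dir, f"part_{i:03d}.txt")
        with open(out, "w", encoding="utf-8") as f:
            f.writelines(chunk)
        out_paths.append(out)
    return out_paths
```

For example, `rechunk_lines(sorted(glob.glob("path/to/directory/*.txt")), "path/to/new_df/")` (with `glob` imported) would write the re-split files into a separate directory.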
-
Mine are .txt files, and I didn't find anything in that link about the number of records. I need to change every file with 20000 records into one with 25000 records. – Elsa Jun 22 '20 at 09:09
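The linked question uses .csv files, but `pd.read_csv` does not care about the file extension; it only needs the right `sep`. A minimal sketch of the concatenation step for .txt files, assuming space-separated files with no header row (`read_all_txt` is a hypothetical name):

```python
import glob
import os

import pandas as pd


def read_all_txt(directory, sep=" "):
    """Concatenate every *.txt file in directory into one DataFrame.

    header=None assumes the files have no header row, so the first line
    of each file is kept as data.
    """
    # sorted() gives a predictable file order (glob order is arbitrary)
    paths = sorted(glob.glob(os.path.join(directory, "*.txt")))
    frames = [pd.read_csv(p, sep=sep, header=None) for p in paths]
    # ignore_index=True renumbers the rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)
```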
1 Answer
When they are all in one directory, use this:
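import os
import pandas as pd

path = "path/to/directory/"
out_path = "path/to/new_df/"  # keep the output apart from the original files
chunk_size = 25000  # number of records per output file

dfs = []  # list of dataframes
for file in sorted(os.listdir(path)):
    if file.endswith(".txt"):
        # edit with your separator of choice; header=None assumes no header row
        dfs.append(pd.read_csv(os.path.join(path, file), sep=" ", header=None))

# axis=0 stacks the rows; ignore_index is important so you don't have duplicate indices
full_df = pd.concat(dfs, axis=0, ignore_index=True)

n_rows = len(full_df)
n_dfs = (n_rows + chunk_size - 1) // chunk_size  # new number of dfs (rounded up)
os.makedirs(out_path, exist_ok=True)
for i in range(n_dfs):
    # slicing past the end is safe, so the last chunk just holds the remainder
    new_df = full_df.iloc[i * chunk_size:(i + 1) * chunk_size]
    # a separate file per chunk; one fixed file name would collect everything in one file
    new_df.to_csv(os.path.join(out_path, f"file_{i}.txt"), header=False, index=False, sep=" ")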
This should do it.
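To check how many output files to expect: 100 files of 20000 records each is 2,000,000 records, which splits into exactly 80 files of 25000. A count like `n // 25000 + 1` over-counts by one whenever `n` is an exact multiple of the chunk size (it gives 81 here, with an empty last chunk); ceiling division gives the right number. `n_chunks` is a hypothetical helper for illustration:

```python
def n_chunks(n_rows, chunk_size):
    """Number of chunks of at most chunk_size rows needed for n_rows rows."""
    # integer ceiling division, no floating point involved
    return (n_rows + chunk_size - 1) // chunk_size


total = 100 * 20000  # 100 files with 20000 records each
print(total, n_chunks(total, 25000))  # 2000000 80
print(total // 25000 + 1)             # 81, i.e. one empty extra chunk
print(n_chunks(total + 1, 25000))     # 81, the leftover record needs its own file
```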

Dorian
-
I will try it right now, thank you so much @Dorian! What if I want to change 25000 to 22000? Can I write a user-defined function so the parameter is easier to change in the future? – Elsa Jun 22 '20 at 09:36
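Yes, wrapping the steps in a function makes the chunk size a parameter. A minimal sketch, assuming space-separated files without a header row; `rechunk_txt_files` and the `part_000.txt` naming scheme are hypothetical, not from the answer:

```python
import os

import pandas as pd


def rechunk_txt_files(src_dir, dst_dir, chunk_size=25000, sep=" "):
    """Re-split all *.txt files in src_dir into files of chunk_size records.

    Assumes separator-delimited files without a header row. Writes
    part_000.txt, part_001.txt, ... into dst_dir and returns their paths.
    The last file holds whatever records are left over.
    """
    names = sorted(f for f in os.listdir(src_dir) if f.endswith(".txt"))
    frames = [pd.read_csv(os.path.join(src_dir, f), sep=sep, header=None) for f in names]
    full_df = pd.concat(frames, ignore_index=True)

    os.makedirs(dst_dir, exist_ok=True)
    out_paths = []
    # start = 0, chunk_size, 2 * chunk_size, ... up to the number of rows
    for i, start in enumerate(range(0, len(full_df), chunk_size)):
        out = os.path.join(dst_dir, f"part_{i:03d}.txt")
        full_df.iloc[start:start + chunk_size].to_csv(out, sep=sep, header=False, index=False)
        out_paths.append(out)
    return out_paths
```

For example, `rechunk_txt_files("path/to/directory/", "path/to/new_df/", chunk_size=22000)` would turn 2,000,000 records into 91 files, the last one holding 20000 records. Keep `dst_dir` different from `src_dir`; otherwise a second run would also read the files written by the first.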
-
All files are in the same directory, and this script writes all records into only one txt file. – Elsa Jun 22 '20 at 09:43
-
Of course, you can just change 25000 to any number you like. You obviously have to change the name of the file in the loop (for example, by including i in it) so you don't overwrite the original file. – Dorian Jun 22 '20 at 18:38
-
I have several headers and need to display them. Do you know how to change it? – Elsa Jun 23 '20 at 03:57
-
That's beyond the scope of the original question, and it would potentially make sense to close this one and open a new one. I also don't know what you mean by changing the headers: of the columns? The ordering? Headers of what? – Dorian Jun 23 '20 at 09:04