
I have a folder with multiple CSV files. The names of some of them start with the string 'REC_'. I would like to fetch all files starting with that string and append them into a single df. How can I do that?

The way I fetch just one would be

with open(path_to_my_folder, 'r') as csvfile:
    reader = csv.reader(csvfile)

This way I need to specify the exact file in the 'path_to_my_folder' variable.

banana_99
  • as per answer or use `glob` - `https://www.geeksforgeeks.org/how-to-use-glob-function-to-find-files-recursively-in-python/` – Bruno Vermeulen Jul 13 '21 at 08:38
  • Does this answer your question? [Import CSV file as a pandas DataFrame](https://stackoverflow.com/questions/14365542/import-csv-file-as-a-pandas-dataframe) – mpx Jul 13 '21 at 08:39

2 Answers


First, list all files whose names start with REC_ (if some of them are not .csv files, you need to check the extension as well). Then build a list of dataframes, each containing one REC_ file. Finally, use pd.concat() to concatenate the dataframes. Here axis=0 means the frames are stacked on top of each other along the rows. Note that each frame keeps its own row index unless you pass ignore_index=True.

REC_file_1.csv
val_1, val_2
1, 2
2, 4

REC_file_2.csv
val_1, val_2
3, 6
4, 8

import os
import pandas as pd

# All files in directory
print(os.listdir())
# ['other_file_1.csv', 'REC_file_1.csv', 'REC_file_2.csv', 'script.py']


rec_file_names = [file for file in os.listdir() if file.startswith('REC_')]
print(rec_file_names)  # ['REC_file_1.csv', 'REC_file_2.csv']

dataframes = []
for filename in rec_file_names:
    dataframes.append(pd.read_csv(filename))

data_concated = pd.concat(dataframes, axis=0)

print(data_concated)
   val_1   val_2
0      1       2
1      2       4
0      3       6
1      4       8
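
The steps above, including the extension check that the code does not perform, could be wrapped up roughly as follows. This is only a sketch: the function name load_rec_files and its folder parameter are made up for illustration, and skipinitialspace=True is an assumption that fits sample files with a space after each comma. If no file matches, pd.concat() raises a ValueError.

```python
import os

import pandas as pd


def load_rec_files(folder="."):
    """Read every REC_*.csv file in `folder` into one DataFrame.

    Hypothetical helper, not part of pandas.
    """
    # Keep only names that start with REC_ and end with .csv
    names = sorted(
        name for name in os.listdir(folder)
        if name.startswith("REC_") and name.lower().endswith(".csv")
    )
    # os.listdir returns bare names, so join them with the folder
    frames = [
        pd.read_csv(os.path.join(folder, name), skipinitialspace=True)
        for name in names
    ]
    # ignore_index=True renumbers the rows instead of repeating 0, 1, 0, 1
    return pd.concat(frames, axis=0, ignore_index=True)
```

Calling load_rec_files(path_to_my_folder) would then return the combined frame with a continuous row index.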
Ibrahim Berber

You talk about dataframes, so I guess you are willing to use pandas. You can iterate over your CSV files easily with the built-in pathlib module, then concatenate the resulting frames:

from pathlib import Path
import pandas as pd

path_dir = Path(path_to_my_folder)

list_dfs = []
for path_file in path_dir.glob('REC_*.csv'):
    df_small = pd.read_csv(path_file)
    list_dfs.append(df_small)
    
df = pd.concat(list_dfs, axis=0) 
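
A slightly more defensive variant of the same idea might look like the sketch below. The function name concat_rec_csvs and the source_file column are assumptions added for illustration. Sorting the paths makes the row order predictable, and the explicit check avoids the ValueError that pd.concat() raises when the list is empty.

```python
from pathlib import Path

import pandas as pd


def concat_rec_csvs(folder):
    """Concatenate all REC_*.csv files in `folder` (hypothetical helper)."""
    # glob yields paths in arbitrary order, so sort them first
    paths = sorted(Path(folder).glob("REC_*.csv"))
    if not paths:
        raise FileNotFoundError(f"no REC_*.csv files found in {folder}")
    # Tag each row with the file it came from (illustrative extra column)
    frames = [pd.read_csv(p).assign(source_file=p.name) for p in paths]
    return pd.concat(frames, axis=0, ignore_index=True)
```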
Durtal