I'm implementing an LSTM, but I have a problem with my dataset. It consists of multiple CSV files (different problem instances): there are more than 100 CSV files in a directory that I want to read and load in Python. My question is how I should proceed to build a dataset for training and testing. Is there a way to split each CSV file into two parts (80% training and 20% testing), then group the 80% portions together as training data and the 20% portions together as test data? Or is there a more efficient way of doing things? How do I take these multiple CSVs as input to train and test the LSTM?

[Screenshots attached: part of one CSV file's structure, and the directory of CSV files (problem instances).]
1 Answer
You can use pandas' pd.concat() to combine multiple DataFrames that share the same columns (see the pandas docs). Iterate through the directory to build a list of CSV file names, read each file with pd.read_csv(), and then concatenate them all into one final DataFrame with something like this:
import pandas as pd

# Read each CSV into its own DataFrame, then concatenate them in a single call;
# pd.concat takes a list of DataFrames as its first argument
dfs = [pd.read_csv(csv_path) for csv_path in csv_files_list]
final_df = pd.concat(dfs, ignore_index=True)
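If you want the per-file 80/20 split you describe in the question, you can instead do the split as you read each file, before concatenating; that way every problem instance contributes rows to both sets and the row order within each instance is preserved for the LSTM. A rough sketch, assuming the files live in a hypothetical data/ directory:

import glob
import pandas as pd

train_parts, test_parts = [], []
for csv_path in glob.glob("data/*.csv"):  # "data" is a placeholder directory
    df = pd.read_csv(csv_path)
    cut = int(len(df) * 0.8)              # first 80% of rows -> training
    train_parts.append(df.iloc[:cut])
    test_parts.append(df.iloc[cut:])

train_df = pd.concat(train_parts, ignore_index=True)
test_df = pd.concat(test_parts, ignore_index=True)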
Alternatively, you can split the combined final_df into training and test data using sklearn or whatever other method you like.
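For instance, a minimal sketch using scikit-learn's train_test_split; here "target" is a hypothetical label column, so substitute whatever your CSVs actually contain:

from sklearn.model_selection import train_test_split

# "target" is a hypothetical label column; replace with your actual column name
X = final_df.drop(columns=["target"])
y = final_df["target"]

# 80% training / 20% testing; shuffle=False keeps row order, which usually
# matters for sequence data fed to an LSTM
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)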

Jared Stock