I'm implementing an LSTM, but I have a problem with my dataset. It consists of multiple CSV files (different problem instances): there are more than 100 CSV files in a directory that I want to read and load in Python. My question is how I should proceed to build a dataset for training and testing. Is there a way to split each CSV file into two parts (80% training and 20% testing), then group the 80% portions together as training data and the 20% portions together as test data? Or is there a more efficient way of doing things? How do I take these multiple CSVs as input to train and test the LSTM?

[Screenshots attached: part of one CSV file's structure, and the directory of CSV files (problem instances).]
1 Answer
You can use pandas' pd.concat() to combine multiple DataFrames that share the same columns (see the pandas docs). Iterate through the directory to build a list of CSV file names, read each file with pd.read_csv(), and then concatenate them all into one final DataFrame with something like this:
import pandas as pd

# Read each CSV into its own DataFrame, then concatenate them in a single call;
# pd.concat takes a list of DataFrames as its first argument
dfs = [pd.read_csv(csv_path) for csv_path in csv_files_list]
final_df = pd.concat(dfs, ignore_index=True)
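If you want the per-file 80/20 split you describe in the question, you can instead do the split as you read each file, before concatenating; that way every problem instance contributes rows to both sets and the row order within each instance is preserved for the LSTM. A rough sketch, assuming the files live in a hypothetical data/ directory:

import glob
import pandas as pd

train_parts, test_parts = [], []
for csv_path in glob.glob("data/*.csv"):  # "data" is a placeholder directory
    df = pd.read_csv(csv_path)
    cut = int(len(df) * 0.8)              # first 80% of rows -> training
    train_parts.append(df.iloc[:cut])
    test_parts.append(df.iloc[cut:])

train_df = pd.concat(train_parts, ignore_index=True)
test_df = pd.concat(test_parts, ignore_index=True)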
Alternatively, you can split the combined final_df into training and test data using sklearn or whatever other method you like.
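For instance, a minimal sketch using scikit-learn's train_test_split; here "target" is a hypothetical label column, so substitute whatever your CSVs actually contain:

from sklearn.model_selection import train_test_split

# "target" is a hypothetical label column; replace with your actual column name
X = final_df.drop(columns=["target"])
y = final_df["target"]

# 80% training / 20% testing; shuffle=False keeps row order, which usually
# matters for sequence data fed to an LSTM
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)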

Jared Stock