I'm trying to gather multiple CSV files from one folder into a single DataFrame. With this prior question, we realized that the real issue is that some of the CSV files (summary files) contain more than one table. As a result, the output of the current solution (code below) skips a significant portion of the data.
Is there any reasonable way to gather multiple files, each possibly containing multiple tables?
Alternatively, if it makes things easier, I also have separate text files for each of the tables contained in the larger summary files, and could use those instead.
Either way, what I'm after is that a single row of the resulting DataFrame should contain the data from the three separate text files / the three tables inside a summary file.
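For illustration, this is roughly the shape I'm after (the file names here are made up, and I'm assuming each of the three per-sample text files holds a single one-row table):

import pandas as pd

#hypothetical per-sample table files; the real names differ
table_files = ["sample1_table1.txt", "sample1_table2.txt", "sample1_table3.txt"]
#read each single-table text file into its own dataframe
tables = [pd.read_csv(f, sep='\t') for f in table_files]
#place the three tables side by side so the sample ends up as one row
sample_row = pd.concat(tables, axis=1)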
Here is my current code, which simply concatenates the text files from their folder:
import pandas as pd
import os
import glob
#define path to dir containing the summary text files
files_folder = "/data/TB/WA_dirty_prep_reports/"
#create a df list using list comprehension
files = [pd.read_csv(file, sep='\t', on_bad_lines='skip') for file in glob.glob(os.path.join(files_folder, "*.txt"))]
#concatenate the list of df's into one df
files_df = pd.concat(files)
print(files_df)
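If working with the summary files directly is more sensible, this is the kind of direction I had in mind (only a sketch: it assumes the tables inside a summary file are separated by blank lines and that each table contains a single data row, which may not match the real layout):

import glob
import io
import os
import pandas as pd

files_folder = "/data/TB/WA_dirty_prep_reports/"

dfs = []
for path in glob.glob(os.path.join(files_folder, "*.txt")):
    with open(path) as fh:
        text = fh.read()
    #assume consecutive tables are separated by a blank line
    blocks = [b for b in text.split("\n\n") if b.strip()]
    #parse each block as its own tab-separated table
    tables = [pd.read_csv(io.StringIO(b), sep='\t') for b in blocks]
    #place the tables side by side so each summary file ends up as one row
    dfs.append(pd.concat(tables, axis=1))

files_df = pd.concat(dfs, ignore_index=True)
print(files_df)

Does something along these lines seem workable, or is there a cleaner pandas-native way to handle files that contain multiple tables?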