I am a little familiar with VBA but I am not sure how to approach parsing and extracting data from a csv file, or even what coding language is best to use. (I am familiar with python as well but haven't used in a few years) My daughter knows some R so I might be able to shoot some sample code off to her but she is swamped at work and going crazy so I'd like to have something for her to look at first if we go this way.
I have hundreds of csv files that I need to extract data from and save to an Excel file as well as populate a column with part of the filename. The filename always starts with the ID I am trying to grab, and then a bunch of underscores with dates, etc. after it. e.g. Turbo123_2023-06-06_LW-234-234
The file contents contain a bunch of gobbly-gook before the actual "table" starts. The start of the table always has the following headers, but the starting row number differs for each file. The number of rows in the "table" differs too. Some tables are around 4000 records, others more than 20,000. The data columns are separated by commas and have a lot of excess spaces between columns. As far as I can tell, the last row of the csv "table" is also the last row of the csv file.
Sample csv file:
Turbo123_2023-06-06_LW-234-234.csv
ColA ColB ColC ColD ColE
Date and Time , Seconds, Pressure, Temp, Depth (ft)
5/31/2022 12:00 , 0, 12.582, 21.998, 29.051
5/31/2022 12:30 , 1800, 12.569, 22.103, 29.022
5/31/2022 12:00 , 3600, 12.565, 24.1, 29.013
Is it possible to write a script that goes through each of the files in a folder, grabs the text before the first underscore (Turbo123) as a variable and then finds the cell in Column A with the text "Date and Time" and selects all rows in ColA and ColE to the end of the rows and outputs to an Excel file the following (trimmed of excess spaces):
All rows in Col A and E from the row where cell = Data and Time in ColA to the end of the row in ColE (or the end of ColA, they have the same row number)
In the Excel table I would like ColA to be popualted with the filename variable we saved (Turbo123) and the values in ColA from the csv file to be ColB in Excel and the values in ColE from the csv file to be ColumnC in Excel
eg.: the Excel File
File name = Turbo123_05312022 (time is optional)
CoumnlA ColumnB ColumnC
ID_Name Date and Time Depth (ft)
Turbo123 5/31/2022 12:00 29.051
Turbo123 5/31/2022 12:30 29.022
Turbo123 5/31/2022 13:00 29.013
Any suggestions as to how to approach this in VBA or Python (or R) would be greatly appreciated.
I haven't tried anything yet as I am usure what language to use or where to start. I am not a programmer, but I see this task accomplished using code and need some, probably a lot, of direction.