1

I am a little familiar with VBA but I am not sure how to approach parsing and extracting data from a csv file, or even what coding language is best to use. (I am familiar with python as well but haven't used in a few years) My daughter knows some R so I might be able to shoot some sample code off to her but she is swamped at work and going crazy so I'd like to have something for her to look at first if we go this way.

I have hundreds of csv files that I need to extract data from and save to an Excel file as well as populate a column with part of the filename. The filename always starts with the ID I am trying to grab, and then a bunch of underscores with dates, etc. after it. e.g. Turbo123_2023-06-06_LW-234-234

The file contents contain a bunch of gobbly-gook before the actual "table" starts. The start of the table always has the following headers, but the starting row number differs for each file. The number of rows in the "table" differs too. Some tables are around 4000 records, others more than 20,000. The data columns are separated by commas and have a lot of excess spaces between columns. As far as I can tell, the last row of the csv "table" is also the last row of the csv file.

Sample csv file:

Turbo123_2023-06-06_LW-234-234.csv

ColA                ColB       ColC         ColD      ColE
Date and Time ,     Seconds,   Pressure,    Temp,     Depth (ft)                              
5/31/2022 12:00 ,   0,         12.582,      21.998,   29.051
5/31/2022 12:30 ,   1800,      12.569,      22.103,   29.022
5/31/2022 12:00 ,   3600,      12.565,      24.1,     29.013

Is it possible to write a script that goes through each of the files in a folder, grabs the text before the first underscore (Turbo123) as a variable and then finds the cell in Column A with the text "Date and Time" and selects all rows in ColA and ColE to the end of the rows and outputs to an Excel file the following (trimmed of excess spaces):

All rows in Col A and E from the row where cell = Data and Time in ColA to the end of the row in ColE (or the end of ColA, they have the same row number)

In the Excel table I would like ColA to be popualted with the filename variable we saved (Turbo123) and the values in ColA from the csv file to be ColB in Excel and the values in ColE from the csv file to be ColumnC in Excel

eg.: the Excel File

File name = Turbo123_05312022 (time is optional)

CoumnlA    ColumnB          ColumnC 
ID_Name    Date and Time    Depth (ft)                              
Turbo123   5/31/2022 12:00  29.051
Turbo123   5/31/2022 12:30  29.022
Turbo123   5/31/2022 13:00  29.013

Any suggestions as to how to approach this in VBA or Python (or R) would be greatly appreciated.

I haven't tried anything yet as I am usure what language to use or where to start. I am not a programmer, but I see this task accomplished using code and need some, probably a lot, of direction.

Bonneroo
  • 15
  • 3

1 Answers1

0

In any language I guess you will have to learn how to do the following:

  1. given a folder, getting names of all the files in it
  2. reading the contents of a csv file
  3. adding new column (name of a file)
  4. concatenating all of the contents together
  5. saving the result into new excel file

I can show you where to look if you end up using Python 3, step-by-step:

  1. This question gives the idea how to list all the files in the directory How do I list all files of a directory?
  2. Look into pandas module, this is a powerful tool to work with table-like data, including reading tables from various sources easily. Here's the documentation on reading from csv https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
  3. After the previous step you'll have a special structure called DataFrame. Adding new column with constant value (file name) is easy https://stackoverflow.com/a/13843741/22052558
  4. There are special functions in pandas to unite dataframes, in this case you'll need concatenation, see the documentation https://pandas.pydata.org/docs/user_guide/merging.html
  5. You're almost there! When you have your final dataframe which you are ready to save into a new excel file - use to_excel() method (doc https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html). You will be asked to install an additional module for that - that's normal.

Putting together the steps above will get you the desired result.

Maria K
  • 1,491
  • 1
  • 3
  • 14