How would I extract values out of multiple csv files that are in different subdirectories into a new csv file?

Question

I'm new to writing scripts. Any help is appreciated!

I'm trying to pull values from each of my subjects. Each subject has their own directory. In their directory is a csv file with blood pressure values that I want to pull and save into a new csv file.

The csv is laid out like this:

    1     2     3     4   
    3.5   4.0   3.0   5.0

I want the script to find the columns labelled "1", "3" and "4", copy the values under them, and save those values to a new csv file in my working directory.

I found a script that does something similar:

    awk -F "\"*,\"*" '{print $2}' textfile.csv

but how do I get it to find the directory that the csv file is in?

I would like to run this for multiple subjects at once, with the new csv data like this:

    SUBJECT01   3.5   3.0  4.0 
    SUBJECT02   4.0   2.0  6.0
    SUBJECT03   6.0   5.0  7.0

Thanks in advance for any help/advice.

Have you tried using os.walk()? It returns a generator that yields a 3-tuple (dirpath, dirnames, filenames) for every directory, and it walks through all the subdirectories below the starting point. So if your patient directories are all in the same parent directory, this lets you traverse all of them and pull the .csv data from each one. Then you can write the combined results to your target location. — RockAndRoleCoder, Apr 02 '19 at 22:15
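A minimal sketch of that traversal, assuming every subject folder (hypothetical names such as SUBJECT01) sits somewhere under one parent directory. find_csv_files is a hypothetical helper name, not part of any library:

```python
import os
import tempfile


def find_csv_files(root):
    """Yield the full path of every .csv file at or below root.

    os.walk yields one (dirpath, dirnames, filenames) 3-tuple per directory,
    so each subject subdirectory is visited in turn.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place makes os.walk visit subfolders alphabetically.
        dirnames.sort()
        for filename in sorted(filenames):
            if filename.lower().endswith(".csv"):
                yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    # Build a throwaway tree with hypothetical subject folders to demonstrate.
    with tempfile.TemporaryDirectory() as root:
        for subject in ("SUBJECT01", "SUBJECT02"):
            os.makedirs(os.path.join(root, subject))
            with open(os.path.join(root, subject, "bp.csv"), "w") as f:
                f.write("1,2,3,4\n3.5,4.0,3.0,5.0\n")
        for path in find_csv_files(root):
            print(os.path.relpath(path, root))
```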

score 0 · Accepted Answer · answered Apr 03 '19 at 15:43

I haven't used awk before, so I'm loading the .csv files into DataFrames with pandas instead. This script builds a list of (file name, DataFrame) pairs, where each DataFrame holds one patient's record.

I get the information from the subdirectories using the os module's walk():

    import os
    import pandas as pd

    dfList = []  # holds (file name, dataframe) pairs
    # os.walk yields a 3-tuple per directory: (dirpath, dirnames, filenames)
    for dirpath, dirnames, filenames in os.walk(os.getcwd()):
        for filename in filenames:  # check every file, not just the first one
            if filename.endswith('.csv'):
                # os.path.join builds the path portably (no hard-coded '\\')
                dfList.append((filename, pd.read_csv(os.path.join(dirpath, filename))))
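Each entry in dfList then holds one file. A minimal sketch of pulling the columns labelled "1", "3" and "4" out of one such file, assuming the first row holds the labels and the second row the values, comma-separated. extract_values and WANTED are hypothetical names:

```python
import io

import pandas as pd

# Hypothetical constant: the header labels whose values we want to copy.
WANTED = ["1", "3", "4"]


def extract_values(csv_source, wanted=WANTED):
    r"""Return the values under the wanted header labels from a one-row CSV.

    Assumes row 1 holds the labels and row 2 the values, separated by commas.
    For a whitespace-separated file, pass sep=r"\s+" to read_csv instead.
    """
    df = pd.read_csv(csv_source)
    # read_csv keeps header labels as strings, so "1" matches the first column.
    return df.loc[0, wanted].tolist()


if __name__ == "__main__":
    sample = io.StringIO("1,2,3,4\n3.5,4.0,3.0,5.0\n")
    print(extract_values(sample))
```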

Now you can build your summary report from the new dfList. I'll leave those details up to you.
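One possible way to build that report, sketched under stated assumptions: each subject folder holds one CSV laid out as in the question, and the folder name (e.g. SUBJECT01) is the subject ID. This version walks the tree itself rather than reusing dfList, and build_summary is a hypothetical name:

```python
import os
import tempfile

import pandas as pd

# Header labels whose values go into the report (from the question).
WANTED = ["1", "3", "4"]


def build_summary(root, wanted=WANTED):
    """Collect one row per subject folder below root.

    Assumptions: each subject folder holds a CSV whose first row is the
    header labels and second row the values, and the folder name is the
    subject ID.
    """
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subject folders in a stable, alphabetical order
        if dirpath == root:
            continue  # skip files directly in root, e.g. an earlier FinalReport.csv
        for filename in sorted(filenames):
            if filename.lower().endswith(".csv"):
                df = pd.read_csv(os.path.join(dirpath, filename))
                subject = os.path.basename(dirpath)
                rows.append([subject] + df.loc[0, wanted].tolist())
    return pd.DataFrame(rows, columns=["subject"] + list(wanted))


if __name__ == "__main__":
    # Throwaway tree with hypothetical subject folders and sample values.
    with tempfile.TemporaryDirectory() as root:
        for subject, values in (("SUBJECT01", "3.5,4.0,3.0,5.0"),
                                ("SUBJECT02", "4.0,1.0,2.0,6.0")):
            os.makedirs(os.path.join(root, subject))
            with open(os.path.join(root, subject, "bp.csv"), "w") as f:
                f.write("1,2,3,4\n" + values + "\n")
        print(build_summary(root))
```

Skipping files that sit directly in the starting directory keeps a previously written FinalReport.csv from being read back in as if it were a subject.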

Then, to save your report, you can use pandas' to_csv():

    finalDf.to_csv("FinalReport.csv")
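If the report should match the layout in the question (subject first, no header row, no pandas row numbers), a sketch using to_csv's index and header parameters might look like this. write_report is a hypothetical wrapper, and the sample rows are copied from the question's desired output:

```python
import os
import tempfile

import pandas as pd


def write_report(final_df, path="FinalReport.csv"):
    """Write the summary without pandas' row index or a header row.

    index=False drops the 0, 1, 2, ... row labels that to_csv writes by default;
    header=False leaves out the column-name row, so each line starts with the subject.
    """
    final_df.to_csv(path, index=False, header=False)


if __name__ == "__main__":
    finalDf = pd.DataFrame(
        [["SUBJECT01", 3.5, 3.0, 4.0], ["SUBJECT02", 4.0, 2.0, 6.0]],
        columns=["subject", "1", "3", "4"],
    )
    out = os.path.join(tempfile.gettempdir(), "FinalReport.csv")
    write_report(finalDf, out)
    with open(out) as f:
        print(f.read())
```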

Here is a link on how to combine a list of dataframes: https://stackoverflow.com/questions/38089010/merge-a-list-of-pandas-dataframes — RockAndRoleCoder, Apr 03 '19 at 15:59
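Applied to (file name, DataFrame) pairs like those in dfList, pandas.concat with keys= can stack everything into one table while remembering where each row came from. This is a sketch; combine is a hypothetical helper. If every subject's file has the same name, passing the directory name as the key instead keeps the rows distinguishable:

```python
import pandas as pd


def combine(df_list):
    """Stack (name, DataFrame) pairs into one DataFrame with a 'source' column."""
    names = [name for name, _ in df_list]
    frames = [df for _, df in df_list]
    # keys= tags each frame's rows with its name as an extra outer index level;
    # names= labels the two resulting index levels.
    combined = pd.concat(frames, keys=names, names=["source", "row"])
    # Turn the "source" index level into an ordinary column.
    return combined.reset_index(level="source")


if __name__ == "__main__":
    pairs = [
        ("SUBJECT01", pd.DataFrame({"1": [3.5], "3": [3.0], "4": [4.0]})),
        ("SUBJECT02", pd.DataFrame({"1": [4.0], "3": [2.0], "4": [6.0]})),
    ]
    print(combine(pairs))
```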
If you like the answer, please accept it by checking it. Thanks! — RockAndRoleCoder, Apr 05 '19 at 04:57
