2

I have several csv files one for blood pressure for patients and others for heart rate, WBc,etc for the same patients and the same hour measure , please see the following example

First csv:

    subject_id     hour_id    value         label
    
    1                 1           96        blood pressure
    1                 1           94        blood pressure

    1                 1           93        blood pressure

    2                 2           99        blood pressure

Second csv:

    subject_id     hour_id    value        label
     
    1                 1           80        Heart rate
    
    2                 2           89        Heart rate
    2                 2           81        Heart rate

third csv:

    subject_id     hour_id    value        label
     
    1                 1           1        WBC
    
    2                 2           10       WBC
    2                 2           12       WBC

Fourth csv:

    subject_id     hour_id    value        label
     
    1                 1         123        glucose
    
    2                 2        111           glucose
    2                 2        113           glucose

Desired output:

    subject_id     hour_id     blood_pressure    heart rate    WBC     gloucose
    
    1                 1           96               80           1         123
    
    2                 2           99               89           10         120    

I tried:

df = pd.read_csv('D:\\....', low_memory=False, error_bad_lines=False)
df2 = pd.read_csv('D:\\Users', low_memory=False, error_bad_lines=False)
merged = pd.concat([df, df2,df3,df4], axis=1, keys=['subject_id', 'hour_mesaure'])
print(merged)

But it gives me:

  subject_id     hour_id        blood_pressure     
    
    1                 1              96           
         
    2                 2               99   


   subject_id     hour_id    value        label
     
    1                 1           80        Heart rate
    
    2                 2           89        Heart rate

and complete the files sequentially

any help will be appreciated

3 Answers3

1

You basically need to pivot the data after the concat.

you can proceed like this:

merged = pd.concat([df, df2,df3,df4])

after this you need to pivot the data:

merged.pivot(index = ['subject_id', 'hour_id'], columns = ['label'], values = ['value'])
Senpaivg
  • 21
  • 1
  • 3
1

Without the usage of any external lib.
The idea is to collect the data into a dict and iterate over the dict and create the output.
1.csv & 2.csv contain the BP & HR data.
Extend the list [1, 2] in order to add more input files.

The output is comma separated but you can change it if you feel like.

from collections import defaultdict

data = defaultdict(list)

for x in [1, 2]:
    with open(f'{x}.csv') as f:
        lines = [l.strip() for l in f.readlines() if l.strip()]
        for idx, line in enumerate(lines):
            if idx > 0:
                parts = line.split()
                data[(parts[0], parts[1])].append((parts[2], parts[3]))

with open('merged.csv','w') as f:
    for idx, (k, v) in enumerate(data.items()):
        if idx == 0:
            headers = ['subject_id', 'hour_id']
            headers.extend(x[1] for x in v)
            f.write(','.join(headers) + '\n')
        fields = [k[0], k[1]]
        fields.extend(x[0] for x in v)
        f.write(','.join(fields) + '\n')

output

subject_id,hour_id,blood,Heart
1,1,96,80
2,2,99,89
balderman
  • 22,927
  • 7
  • 34
  • 52
0

You can try this:

import pandas as pd


df1 = pd.read_csv('1.csv')
df2 = pd.read_csv('2.csv')
df3 = pd.read_csv('3.csv')
df4 = pd.read_csv('4.csv')

dfs = [df1, df2, df3, df4]

df = pd.concat(dfs)
df = df.pivot(index=['subject_id', 'hour_id'], columns='label', values='value').reset_index().rename_axis(index=None, columns=None)

print(df)

Output:

   subject_id  hour_id  Heart rate  WBC  blood pressure  glucose
0           1        1          80    1              96      123
1           2        2          89   10              99      120

Online Live Demo Link: https://replit.com/@tssovi/test#main.py

Sabil
  • 3,750
  • 1
  • 5
  • 16