0

Context: working with a medical database (CSV)

So I have a set of data that looks like this. This will be the 'left' file during the merge.

The 'right' file looks like this.

I need to merge the two CSVs on 'INC_KEY' (unique 9-digit number assigned to each patient), however the 'right' file may contain more than one instance of the same INC_KEY (i.e. more than one TMODE for the same patient).

How do I merge the files such that all of the TMODEs for each patient will be in the resulting merged file? The column names would be: TMODE1, TMODE2, TMODE3, etc.

  • 1
    [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – It_is_Chris Jun 29 '21 at 19:06
  • Is what I provided not good enough? I'm very new to this, the screenshots tell you everything you need to know. My apologies if it is not ideal. – Sean Roudnitsky Jun 29 '21 at 19:19
  • Rather than picture, can you extract a sample of your dataframe as text? – Corralien Jun 29 '21 at 19:24
  • Images are not sufficient. Did you read the answer to the link I posted about including a small example DataFrame as runnable code? – It_is_Chris Jun 29 '21 at 19:24

1 Answers1

0

You need to read both tables by pandas as dataframes and then merge them based on your criteria which is matching up both tables and return patients who have multiple modes in the right table:

import pandas as pd
table1 = pd.read_csv('left.csv')
table2 = pd.read_csv('right.csv')
table = pd.merge(table1,table2,on='INC_KEY',how='outer')
table.pivot(index='INC_KEY', columns='TMODE', values=['TRANS_BLOOD_24HOURS','TRANS_BLOOD_MEASURE_DESC','TRANS_BLOOD_CONV','TRANTYPE'])

To understand more about how parameter in the merge function read this please: here