I am new here, and this is my first question. Sorry if I am doing this wrong.
I have some DataFrames, and I need to join them based on the existence of the same ID but keeping both lines. For example:
Table A:
ID | pID | Code | Condition |
---|---|---|---|
1 | 10 | 'bl' | 'Normal' |
2 | 11 | 'bl' | 'MCI' |
3 | 12 | 'bl' | 'Normal' |
4 | 15 | 'bl' | 'EMCI' |
Table B:
ID | pID | Code | Condition |
---|---|---|---|
1 | 10 | 'm06' | 'Alzheimer' |
2 | 11 | 'm06' | 'MCI' |
4 | 15 | 'm06' | 'Alzheimer' |
5 | 16 | 'm06' | 'Normal' |
Table C:
ID | pID | Code | Condition |
---|---|---|---|
1 | 10 | 'm12' | 'Alzheimer' |
2 | 11 | 'm12' | 'MCI' |
4 | 18 | 'm12' | 'Normal' |
5 | 19 | 'm12' | 'MCI' |
The result I need with this joins the following:
Result:
ID | pID | Code | Condition |
---|---|---|---|
1 | 10 | 'bl' | 'Normal' |
2 | 10 | 'm06' | 'Alzheimer' |
3 | 10 | 'm12' | 'Alzheimer' |
4 | 11 | 'bl' | 'MCI' |
5 | 11 | 'm06' | 'MCI' |
6 | 11 | 'm12' | 'MCI' |
This is just an example, tables do not have the same size, and the field ID is the ID provided by pd.DataFrame. Is this possible to make? Can you help me?
Thank you for the time you took to read my question! Happy programming!
Edit 1:
Thanks for all your answers. @Arkadiusz answer works fine for two tables. The problem is I have multiple tables, which I need to intersect the pID.
@Arkadiusz answer: (Little modification)
Concatenate two dataframes:
out = pd.concat([df_a, df_b]).sort_values('pID')
Leave only values which are duplicated by 'pID' column:
out = out[out['pID'].duplicated(keep=False)].reset_index(drop=True)
To answer some comments, when I say tables have different size, the current dataset I am working with and the one mentioned in this problem, I am referring to the number of rows.
The field ID is purely a field created from pd.DataFrame. It can be discarded.
Edit 2:
The image 1 shows how many tables I am using. VISCODE is what I mean by code, it actually describes when was the baseline visit and visits after that. The point here is to have a dataset where a person (pID) was present on all visits. To see the evolution of the patient.
Sorry for so many edits. Again, thank you for your time reading my question!
Edit 3: Just added a third table to make my question more clear. My goal is to maximize the number of patients on a longitudinal study. For this to work, the patient must have been present to all the exams (Code).