Merge data using identifier across columns

Question

I have a column identifier & a value (associated with that identifier). Both Identifier and the associated value move the another set of columns. I want to create a data stream that brings all data belonging to a single identifier and its associated value into 2 new columns!

I have tried pandas melt, and few other methods. None seem to work.

Kindly help me how to do this.

Input data:

Output data needed:

Data is here: https://docs.google.com/spreadsheets/d/1NU0ZXL6vKq3rAR18_DJp3_DxwYV8KFtE/edit?usp=drive_link&ouid=103726583001270325575&rtpof=true&sd=true

please provide your input and output in a reproducible format (**text** or **code**) — mozway, Aug 29 '23 at 08:54
Hi @mozway - Here you go: https://docs.google.com/spreadsheets/d/1NU0ZXL6vKq3rAR18_DJp3_DxwYV8KFtE/edit?usp=drive_link&ouid=103726583001270325575&rtpof=true&sd=true — Night Cheetah, Aug 29 '23 at 08:57
please provide your input and output in a reproducible format, and not using an external link. — Luuk, Aug 29 '23 at 08:58
@NightCheetah a link is not needed here and not stable in time, please provide directly the content of the input [in a way that takes only a few seconds for others to replpicate the data](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — mozway, Aug 29 '23 at 09:01

mozway · Answer 1 · 2023-08-29T09:11:52.223

Assuming this input:

   Date  X1 Contract1  X2 Contract2   X3 Contract3
0     1   1         B  10         B  100         A
1     2   2         B  20         B  200         A
2     3   3         C  30         A  300         B
3     4   4         A  40         C  400         B
4     5   5         A  50         C  500         C

And that you want to select the contract "A", use pandas.wide_to_long and filter the reshaped DataFrame:

out = (pd.wide_to_long(df, i='Date', j='id', stubnames=['X', 'Contract'])
         .query('Contract == "A"')
         .reset_index('Date')
      )

Output:

    Date    X Contract
id                    
1      4    4        A
1      5    5        A
2      3   30        A
3      1  100        A
3      2  200        A

And to keep the dates in order:

out = (pd.wide_to_long(df, i='Date', j='id', stubnames=['X', 'Contract'])
         .query('Contract == "A"')
         .sort_index().reset_index('Date')
      )

Output:

    Date    X Contract
id                    
3      1  100        A
3      2  200        A
2      3   30        A
1      4    4        A
1      5    5        A

Alternative with lreshape that will require you to manually provide the column names:

target = 'A'
out = (pd.lreshape(df, {'X': ['X1', 'X2', 'X3'],
                        'ContractID': ['Contract1', 'Contract2', 'Contract3']})
         .query('ContractID == @target')
         .sort_values(by='Date')
      )

Output:

    Date    X ContractID
10     1  100          A
11     2  200          A
7      3   30          A
3      4    4          A
4      5    5          A

Merge data using identifier across columns

1 Answers1