1

I have a data frame as follows:

ID    Date   ColA1   ColB1   ColA2   ColB2   ColA3   ColB3 
id1   date1   1        2      3        4       5       6      
id2   date2   7        8      9        10      11      12

How can I split the columns ColA2, ColB2, ColA3, ColB3 and merge them again to the dataset as rows (considering the ID and Date columns)?

Expected output:

ID   Date     ColA  ColB
id1  date1    1      2
id1  date1    3      4
id1  date1    5      6
id2  date2    7      8
id2  date2    9      10
id2  date2    11     12

Thank you!

Annie
  • 79
  • 5

1 Answers1

4

you need wide_to_long

pd.wide_to_long(df.reset_index(), stubnames = ['ColA', 'ColB'], i = 'index', j = 'value').reset_index(drop = True)

    ColA    ColB
0   1   2
1   7   8
2   3   4
3   9   10
4   5   6
5   11  12

Edit
Output without reset_index()

pd.wide_to_long(df.reset_index(), stubnames = ['ColA', 'ColB'], i = 'index', j = 'value')
            ColA    ColB
index   value       
0       1   1       2
1       1   7       8
0       2   3       4
1       2   9       10
0       3   5       6
1      3    11      12

Edit2
With the new data sample provide by OP:

pd.wide_to_long(df, stubnames = ['ColA', 'ColB'], i = ['ID', 'Date'], j = 'value').reset_index([0,1])

    ID  Date    ColA    ColB
value               
1   id1 date1   1   2
2   id1 date1   3   4
3   id1 date1   5   6
1   id2 date2   7   8
2   id2 date2   9   10
3   id2 date2   11  12
Terry
  • 2,761
  • 2
  • 14
  • 28
  • Just trying to understand, you had to use ```reset_index``` to create an index column for your 'i' value since you couldnt select ColA or ColB, and your j value is just an arbitrary name for whats after the stub names? – Ben Pap May 01 '19 at 22:58
  • Hi Terry, thank you for your help. I really appreciate it. I just updated the original question, adding the `ID` and `Date` columns, which actually made the problem a bit different. Could you please take a look and see if you could help? Thank you! – Annie May 01 '19 at 23:01
  • @BenPap Yes, That's right, you made the correct reading, I add the output without the `reset_index()` to facilitate the visualization of the wholeprocess – Terry May 01 '19 at 23:03
  • @Annie Can you confirm if my output is correct? in your expected output have `date1` in the two last rows. In my have `date2` – Terry May 01 '19 at 23:19
  • Hi Terry, sorry for the typo, the expected output should have `date2` in the last 3 rows. However, as I run the code on my dataset (a more complex version of the one in this question: 4 columns in the stubnames, and 3 columns to use as the id_variable), and I got an error saying that `the id variables need to uniquely identify each row.`. May I know how to fix this error? Thank you for your time, I really appreciate it. – Annie May 01 '19 at 23:25
  • @Annie I believe [here](https://stackoverflow.com/a/50856264/5847854) have what you need – Terry May 01 '19 at 23:55
  • Hi Terry, thank you for sharing with me the reference. I already tried that but it still didn't work for my case. I still got the same error saying that `the id variables need to uniquely identify each row.` Do you know if there is any other way for me to fix it? Thank you! – Annie May 02 '19 at 00:05
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/192694/discussion-between-terry-and-annie). – Terry May 02 '19 at 00:17