How to fix the "the id variables need to uniquely identify each row" problem when using wide_to_long?

Question

This is the follow-up post of this one

Please find the MCVE here

My data frame looks similar as the following one:

ID    Date   ColA1   ColB1   ColA2   ColB2   ColA3   ColB3 
id1   date1   1        2      3        4       5       6      
id2   date2   7        8      9        10      11      12

I want to split the columns ColA2, ColB2, ColA3, ColB3 and merge them again to the dataset as rows (and still keeping the ID and Date)

Expected output:

ID   Date     ColA  ColB
id1  date1    1      2
id1  date1    3      4
id1  date1    5      6
id2  date2    7      8
id2  date2    9      10
id2  date2    11     12

I tried both pd.wide_to_long(df, stubnames = ['ColA', 'ColB'], i = ['ID', 'Date'], j = 'value').reset_index([0,1])

and

pd.wide_to_long(df.reset_index(), stubnames = ['ColA', 'ColB'], i = ['ID', 'Date'], j = 'value').reset_index(drop=True)

But still got the same error, saying that the id variables need to uniquely identify each row.

I also already tried the method here with reset_index() and I still got the same problem.

I guess this problem came from my specific dataset. But I don't know what's wrong with it and how to figure it out, in order to solve this issue. Do you have any suggestion?

Please let me know what I can do to solve this issue. Thank you!

I can't reproduce the issue. The first code works fine. Please, provide a [Minimal, Complete, and Verifiable](https://stackoverflow.com/help/mcve) example. — Georgy, May 02 '19 at 08:11
Hi Georgy, thank you for the suggestion. I already added the link for the MCVE to the question. You can also find it [here](https://drive.google.com/file/d/1FO8Lh-IOIW594kJsxOPgqdi4ZqA_x6OO/view?usp=sharing). I look forward to hearing more from you. Thank you! — Annie, May 02 '19 at 08:18
Unfortunately, that is not an MCVE. You should provide some minimal example of input data directly in the body of the quesion, not by a link that can rot after some time. The data in the CSV contains 1000 lines, that is not minimal! And also, your code examples operate on column names like `ColA`, `ColB`, ... But I don't see those columns in the CSV. — Georgy, May 02 '19 at 08:30
Hi Georgy. I'm still figuring it out how to add the input data directly to the question. but I will add it soon. I also shortened the data in the CSV link to 10 lines and I hope that you could help me with that one in the meantime. I was trying to simplify my questions by naming the columns as `ColA`, `ColB`. I'm sorry for the confusion. I hope you understand and could help me with this. Thank you! — Annie, May 02 '19 at 08:49
Which columns correspond to the `ID`, `Date`, `ColA` and `ColB` in the CSV file? — Georgy, May 02 '19 at 08:56
Hi Georgy, `ColA`, `ColB` are corresponding to `date`, `time`, `desc`, `entries`,`exits`; while `ID`, `Date` are corresponding to `c/a`, `unit`, `scp` And the code could look like: `pd.wide_to_long(test, stubnames=['date', 'time', 'desc', 'entries','exits'], i=['c/a', 'unit','scp'], j='value').reset_index([0,1]).reset_index(drop=True,inplace=True)` — Annie, May 02 '19 at 13:56
You said that you tried the solution with `reset_index` and it didn't work. But for me it works. Check again: `pd.wide_to_long(df.reset_index(), stubnames=['date', 'time', 'desc', 'entries','exits'], i='index', j='value').reset_index(drop=True).sort_values(['date', 'time', 'desc', 'entries','exits']).dropna(how='any')`. — Georgy, May 02 '19 at 14:58
Hi Georgy, I just tried again with your code and it works for me now. Thank you again for your time. This means a lot to me!!! Really appreciate it!!! — Annie, May 02 '19 at 15:32

score 0 · Accepted Answer · answered May 02 '19 at 07:42

0

I am not getting any error for both the lines, to achieve your desired result just use double reset_index as:

df = (pd.wide_to_long(df, stubnames=['ColA','ColB'], i=['ID','Date'], j='value')
        .reset_index([0,1])
        .reset_index(drop=True))

print(df)
    ID   Date  ColA  ColB
0  id1  date1     1     2
1  id1  date1     3     4
2  id1  date1     5     6
3  id2  date2     7     8
4  id2  date2     9    10
5  id2  date2    11    12

answered May 02 '19 at 07:42

Space Impact

13,085
23
48

Thank you for your response. The dataset that I had is more complex than the one in the question, with 4 stubnames columns and 3 id_variables. I still got the same error trying yours. I think the problem came from my specific dataset. But I don't know what's wrong with it in order to solve this issue. Do you have any suggestion? I really appreciate it. Thank you! – Annie May 02 '19 at 07:54
@Annie We can help, if you can provide a [`MCVE`](https://stackoverflow.com/help/mcve) that looks like your original data. – Space Impact May 02 '19 at 07:57
Hi Sandeep. I really appreciate your help. Please find the MCVE [here](https://drive.google.com/file/d/1FO8Lh-IOIW594kJsxOPgqdi4ZqA_x6OO/view?usp=sharing). Feel free to let me know if you need additional information. I look forward to hearing more from you soon. Thank you! – Annie May 02 '19 at 08:09
Hi Sandeep, do you have any updates on this problem that you could share? Anything would be highly appreciated. Thank you! – Annie May 02 '19 at 14:02
Hi, I just want to update that I already solved the problem above. Thank you for your help!!! – Annie May 02 '19 at 15:33

How to fix the "the id variables need to uniquely identify each row" problem when using wide_to_long?

1 Answers1