-1

I have a dataset with 600 rows. the data has one main ID= Version and second ID= Task. Data looks like this:

 Version  Task  Concept  Att 1 -  Att 2 -
       1     1        1        3        2
       1     1        2        1        1
       1     2        1        2        3
       1     2        2        1        2
       1     3        1        2        3
       1     3        2        3        1
       2     1        1        2        1
       2     1        2        3        2
       2     2        1        2        2
       2     2        2        1        3
       2     3        1        3        1
       2     3        2        1        3

I would like to change the format, so to have "Task" which belongs to the same "Version" in the same row like this:

 Version  Task  Concept  Att 1 -  Att 2 -  Version  Task  Concept  Att 1 -  Att 2 -
       1     1        1        3        2        1     1        2        1        1
       1     2        1        2        3        1     2        2        1        2
       1     3        1        2        3        1     3        2        3        1
       2     1        1        2        1        2     1        2        3        2
       2     2        1        2        2        2     2        2        1        3
       2     3        1        3        1        2     3        2        1        3

I have tried different things like groupby, pivot but I cannot find the right solution

mozway
  • 194,879
  • 13
  • 39
  • 75
George
  • 3
  • 3
  • What if there's more than two duplicate `Version` and `Task` rows? – Nick May 17 '23 at 11:34
  • Please [edit] your post to add code and data as text ([using code formatting](https://stackoverflow.com/editing-help#code)), not images. Images: A) don't allow us to copy-&-paste the code/errors/data for testing; B) don't permit searching based on the code/error/data contents; and [many more reasons](https://meta.stackoverflow.com/a/285557). Images should only be used, in addition to text in code format, if having the image adds something significant that is not conveyed by just the text code/error/data. See [mcve] on what code is required. – Adriaan May 17 '23 at 11:36
  • The general logic is `df.pivot(index=['Version', 'Task'], columns='Concept')`, then it looks like you're duplicating the pivot's index – mozway May 17 '23 at 11:37

1 Answers1

0

I think a pivot is the clean way to reshape (df.pivot(index=['Version', 'Task'], columns='Concept'), optionally with flattening the columns MultiIndex).

That said if you really want to duplicate the columns, you could combine a groupby and concat:

out = (pd.concat([g.set_index(['Version', 'Task'], drop=False)
                 for k, g in df.groupby('Concept')], axis=1)
         .reset_index(drop=True)
      )

Output:

   Version  Task  Concept  Att 1 -  Att 2 -  Version  Task  Concept  Att 1 -  Att 2 -
0        1     1        1        3        2        1     1        2        1        1
1        1     2        1        2        3        1     2        2        1        2
2        1     3        1        2        3        1     3        2        3        1
3        2     1        1        2        1        2     1        2        3        2
4        2     2        1        2        2        2     2        2        1        3
5        2     3        1        3        1        2     3        2        1        3
mozway
  • 194,879
  • 13
  • 39
  • 75