-2

I have the following dataframe df:

Video               1   1   1   1   1   1   1   1   1   1   ... 36  36  36  36  36  36  36  36  36  36
Confidence Value    3   3   4   4   4   5   5   3   5   3   ... 3   3   3   2   4   2   3   3   3   3

Where the row Video is the row with the names of the columns in the dataframe (i.e. the row with bold font that states the names of each column).

What I want is to rearrange this dataframe so that the output is this:

Video 1 2 3 ... 36
0     3 5 4 ... 3
1     1 2 3 ... 2
2     2 4 4 ... 5
3     4 5 4 ... 3
...

I have tried searching different ways to append, concatenate, merge etc. the columns in the way that I want but I can't figure out how since there are multiple instances of each Video, i.e. multiple 1, 2, .. 36.

So, for each of these multiple instances, I want to make one column of these with the Video number as the column name, and the rows be all the confidence values, as shown above.

Is that possible?

Oam
  • 305
  • 5
  • 13

1 Answers1

1

A transpose-pivot construct may be what suits your need.

Data

df = pd.read_csv(io.StringIO("""
Video               1   1   1   1   1   2   2   2   2   2   35  35  35  35  35  36  36  36  36  36
Confidence Value    3   3   4   4   4   5   5   3   5   3   3   3   3   2   4   2   3   3   3   3
"""), sep=r"\s{2,}", engine="python", header=None, index_col=0)

print(df)
                  1   2   3   4   5   6   7   ...  14  15  16  17  18  19  20
0                                             ...                            
Video              1   1   1   1   1   2   2  ...  35  35  36  36  36  36  36
Confidence Value   3   3   4   4   4   5   5  ...   2   4   2   3   3   3   3
[2 rows x 20 columns]

Code

This should work for indefinite number of confidence values per video:

idx = df.transpose().groupby("Video").cumcount().values
ans = df.transpose().set_index(idx).pivot(columns="Video", values="Confidence Value")

Note: If the number of confidence values per video are the same (5 in the example), then the groupby-cumcount step can be further simplified:

ans = df.transpose().set_index(np.tile(range(5), 4)).pivot(columns="Video", values="Confidence Value")

Result

print(ans)

Video  1   2   35  36
0       3   5   3   2
1       3   5   3   3
2       4   3   3   3
3       4   5   2   3
4       4   3   4   3
Bill Huang
  • 4,491
  • 2
  • 13
  • 31