I have a Pandas DataFrame of the following format:

model    Interpretability    Diversity    AUC    num_topics
LDA      0.123               0.41         0.61    5
LDA      0.234               0.42         0.62    10
LDA      0.345               0.43         0.63    15
LDA      0.456               0.44         0.64    20
LDA      0.567               0.45         0.65    25
LDA      0.678               0.46         0.66    30
LDA      0.789               0.47         0.67    35
LDA      0.890               0.48         0.68    40
ETM      0.124               0.51         0.71    5
ETM      0.235               0.52         0.72    10
ETM      0.346               0.53         0.73    15
ETM      0.457               0.54         0.74    20
ETM      0.568               0.55         0.75    25
ETM      0.679               0.56         0.76    30
ETM      0.780               0.57         0.77    35
ETM      0.891               0.58         0.78    40
CTM      0.125               0.61         0.81    5
CTM      0.236               0.62         0.82    10
CTM      0.347               0.63         0.83    15
CTM      0.458               0.64         0.84    20
CTM      0.569               0.65         0.85    25
CTM      0.670               0.66         0.86    30
CTM      0.781               0.67         0.87    35
CTM      0.892               0.68         0.88    40

Based on this data, I want to create line charts with the number of topics on the x-axis and one line per model. Each chart should show one of Interpretability, Diversity, or AUC.

Based on line charts I have drawn before, I think I should reshape my DataFrame to the following format (shown here for 'Interpretability', but I want to do the same for the other columns):

LDA       ETM     CTM     num_topics
0.123     0.124   0.125   5
0.234     0.235   0.236   10
0.345     0.346   0.347   15
0.456     0.457   0.458   20
0.567     0.568   0.569   25
0.678     0.679   0.670   30
0.789     0.780   0.781   35
0.890     0.891   0.892   40

This link shows how to make a pivot table. However, as far as I understand, pivot tables aggregate or summarize data. I don't want to perform any such operation; I just want to reshape the DataFrame.

How can I reshape the DataFrame?

Emil
  • The linked question uses `pivot`, which doesn't aggregate: `df.pivot(index='num_topics', columns='model', values='Interpretability')`. Also, if your data is unique (one row per `num_topics`, `model` pair), you don't need to worry about aggregation: the default `aggfunc` of `pivot_table`, `mean`, is applied to a single element per group and leaves it unchanged. – Quang Hoang Dec 14 '21 at 14:16
  • Ah, I see. That is what I was looking for indeed; thanks! – Emil Dec 14 '21 at 14:22
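
For reference, here is a minimal sketch of the `pivot` suggestion from the comment above. It uses only the first two `num_topics` values per model from the question's data, and the helper name `reshape_metric` is hypothetical (introduced here for illustration, not part of pandas). It assumes each `num_topics`/`model` pair occurs exactly once, as in the question.

```python
import pandas as pd

# Subset of the question's data: the first two num_topics values per model.
df = pd.DataFrame({
    "model": ["LDA", "LDA", "ETM", "ETM", "CTM", "CTM"],
    "Interpretability": [0.123, 0.234, 0.124, 0.235, 0.125, 0.236],
    "Diversity": [0.41, 0.42, 0.51, 0.52, 0.61, 0.62],
    "AUC": [0.61, 0.62, 0.71, 0.72, 0.81, 0.82],
    "num_topics": [5, 10, 5, 10, 5, 10],
})


def reshape_metric(frame, metric):
    """Hypothetical helper: one row per num_topics, one column per model.

    DataFrame.pivot only reshapes; it raises a ValueError if an
    (index, column) pair is duplicated instead of aggregating.
    """
    wide = frame.pivot(index="num_topics", columns="model", values=metric)
    # pivot sorts the new columns alphabetically; restore the question's order.
    return wide[["LDA", "ETM", "CTM"]]


wide = reshape_metric(df, "Interpretability")

# pivot_table with its default mean aggfunc averages a single value per cell
# here, so it should yield the same numbers as pivot.
wide_via_table = df.pivot_table(
    index="num_topics", columns="model", values="Interpretability"
)[["LDA", "ETM", "CTM"]]

print(wide)
```

The same helper can be called with `"Diversity"` or `"AUC"`. With matplotlib installed, `wide.plot()` should then draw one line per model with `num_topics` (the index) on the x-axis.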
