2

given the dataframe as follows:

index    name    reference    value
0        name_1  ab1234       0.1 
1        name_1  cd1111       0.2
2        name_1  cd1112       0.01
3        name_2  ab1234       0.03       
4        name_2  ab4567       0.1
...
n        name_j  lm1234       0.x

How can one transform the data into a new dataframe such that the unique references are the index and the unique names are the columns.

We know that the unique references that make up the index are indices are reference.unique(). And we also know that the unique names that make up the columns headers are name.unique().

The expected result would be a sparse matrix of this form:

reference    name_1    name_2    name_3    ...    name j
ab1234       0.1       0.03
cd1111       0.2
cd1112       0.01
ab1234       
ab4567                 0.1

...                                        ...    
lm1234                                            0.x
 

Expect that there is a vectorized way or a pivot that would work. For example, something like this was attempted, but failed:

new_df = df.pivot(index='reference', columns=name.unique(), values='value')
D.L
  • 4,339
  • 5
  • 22
  • 45
  • Maybe `new_df = df.pivot(index='reference', columns='name', values='value')`? – Ynjxsjmh May 26 '22 at 15:05
  • unfortunately this does not work. – D.L May 26 '22 at 15:23
  • 1
    it is possible that you have more than one combination of index and column and you have to aggregate the values with aggfunc: new_df = df.pivot_table(index='reference', columns='name', aggfunc=list) – Ze'ev Ben-Tsvi May 26 '22 at 15:49

1 Answers1

2
df.pivot_table(index='reference', columns='name', values='value', aggfunc='sum')

Output:

name       name_1  name_2
reference
ab1234       0.10    0.03
ab4567        NaN    0.10
cd1111       0.20     NaN
cd1112       0.01     NaN
  • this was close and leads to the answer: `df.pivot_table(index='reference', columns='name', values='value')` ... for some reason `pivot_table` works and `pivot` does not. – D.L May 26 '22 at 16:12
  • pivot works for me as well: df.pivot(index='reference', columns='name', values='value'), what exactly does not work? – Sergey Sakharovskiy May 26 '22 at 16:19
  • `pivot` does not work for me but `pivot_table` does work. I am using python `version 3.6`. which version are you using ? with `pivot` i get this error message: `Index contains duplicate entries, cannot reshape`. – D.L May 26 '22 at 16:23
  • 1
    i can see that there is a post here that explains the difference: https://stackoverflow.com/questions/30960338/pandas-difference-between-pivot-and-pivot-table-why-is-only-pivot-table-workin – D.L May 26 '22 at 16:25
  • python : 3.8.6 pandas : 1.1.3 – Sergey Sakharovskiy May 26 '22 at 16:26
  • 1
    okay, so its not a versioning thing, it is `pivot` vs `pivot_table` which are subtly different (certainly the case for large examples. I have marked this answer as correct as it leads to the right answer. – D.L May 26 '22 at 16:30
  • Yes, I've realized it as well, you might have encountered duplicates in the 'reference' column, I used only small extract of you data without duplicates. – Sergey Sakharovskiy May 26 '22 at 16:32