1

I have a melted, long-format dataframe like this:

name = ["A", "A", "B", "B"]
varA = [1, 2, 1, 2]
varB = [200, 250, 200, 250]
val = [4, 8, 1, 0]

df = pd.DataFrame(
    data=zip(name, varA, varB, val), columns=["name", "varA", "varB", "val"]
)

enter image description here

How can I reshape it like this?

enter image description here

There is a similar question here, so I tried the following, which threw the error ValueError: Index contains duplicate entries, cannot reshape

df2 = (df.set_index(['varA','varB'])
        .stack()
        .unstack(0)
        .reset_index()
        .rename_axis(None, axis=1))

I'm sure this is easy for someone familiar with all the Pandas functions/methods, but there are a ton of functions for the casual user to keep track of!

a11
  • 3,122
  • 4
  • 27
  • 66
  • The accepted solution at [Python : Pandas pivot table for multiple columns at once which has duplicate values](https://stackoverflow.com/questions/70074621/python-pandas-pivot-table-for-multiple-columns-at-once-which-has-duplicate-val) worked – a11 Dec 03 '21 at 22:58

1 Answers1

4
  1. Make a helper column that assigns which rows are associated with name-1/val-1, and name-2/val-2
  2. Using this new column we can now pivot the data without encountering that "duplicate entries" error

Clean the output:

  1. sort the columns to match your output
  2. The pivot will return a dataframe with a columnar multiindex. We'll need to flatten this by combining levels

pandas > 1.0.5

id_vars = ["varA", "varB"]

# Create an id column that tracks the id_vars
# Pivot using the id column
# Sort the columns to match OP expected output
pivoted_df = (
    df.assign(
        id=df.groupby(id_vars).cumcount().add(1).astype(str)
    )
    .pivot(index=id_vars, columns="id", values=["name", "val"])
    .sort_index(level=1, axis=1)
)

# flatten the column multiindex, insert the row index as values
flattened_columns = pivoted_df.columns.map("-".join)
pivoted_df = (
    pivoted_df.set_axis(flattened_columns, axis=1)
    .reset_index()
)

print(pivoted_df)
   varA  varB name-1 val-1 name-2 val-2
0     1   200      A     4      B     1
1     2   250      A     8      B     0

pandas <= 1.0.5

  • pivot in these versions did not support using lists as arguments. A workaround is to instead use the pivot_table with a nonaggregating aggfunc prevent accidental aggregation.
id_vars = ["varA", "varB"]

# Create an id column that tracks the id_vars
# Pivot using the id column
# Sort the columns to match OP expected output
pivoted_df = (
    df.assign(
        id=df.groupby(id_vars).cumcount().add(1).astype(str)
    )
    .pivot_table(index=id_vars, columns="id", values=["name", "val"], aggfunc=lambda x: x)
    .sort_index(level=1, axis=1)
)

# flatten the column multiindex, insert the row index as values
flattened_columns = pivoted_df.columns.map("-".join)
pivoted_df = (
    pivoted_df.set_axis(flattened_columns, axis=1)
    .reset_index()
)
Cameron Riddell
  • 10,942
  • 9
  • 19
  • thank you, this looks close, but I get `ValueError: Shape of passed values is (4, 2), indices imply (2, 2)` – a11 Dec 03 '21 at 22:55
  • What version of pandas are you using? I'm using "1.1.5", also if you can paste in more of the traceback, I can assist further. copy/pasting this code works for me with the input you provided. – Cameron Riddell Dec 03 '21 at 23:00
  • I am using 1.0.5. The traceback is too long to paste, but error is on `df.assign` part and says `1693 raise ValueError("Empty data passed with indices specified.") -> 1694 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}") 1695 ValueError: Shape of passed values is (4, 2), indices imply (2, 2) ` – a11 Dec 03 '21 at 23:05
  • 1
    Ah, the error is because the `pivot` function in `pandas <= 1.0.5` did not accept lists as arguments. This feature gained support in [pandas = 1.1.0](https://github.com/pandas-dev/pandas/issues/21425). Either see the workaround I included in my answer, or update your pandas to something more recent. – Cameron Riddell Dec 03 '21 at 23:32
  • that makes so much sense now, I tried a few other efforts with pivot before posting the OP... thank you – a11 Dec 03 '21 at 23:44