I have a column called VERSION_INDEX of type Int64 that serves as a proxy for ordering semantic software versions, so that 0.2.0 sorts before 0.13.0. When I pivot, the column names created by the pivot are sorted alphanumerically.

pivot_df = merged_df.pivot(index=test_events_key_columns, columns='VERSION_INDEX', values='Status')
print(pivot_df)

Is it possible to keep the column order numeric during the pivot such that 9 comes before 87?

thx

rchitect-of-info
  • Does this answer your question? [Sorting columns in pandas dataframe based on column name](https://stackoverflow.com/questions/11067027/sorting-columns-in-pandas-dataframe-based-on-column-name) – ASHMIL Mar 07 '22 at 17:42
  • @ASHMIL No, that is a `pandas` solution, not a `polars` solution. – rchitect-of-info Mar 07 '22 at 18:18
  • I know how to post-process the `polars` column order, just wondering if a sort order can be introduced during the `pivot` – rchitect-of-info Mar 07 '22 at 18:19

1 Answer

In Polars, column names are always stored as strings, which is why you get alphanumeric rather than numeric ordering. There is no way around the strings, so I think the best you can do is compute the column order you want and then select the columns in that order:

import polars as pl

df = pl.DataFrame({"version": [9, 85, 87], "testsuite": ["scan1", "scan2", "scan3"], "status": ["ok"] * 3})
wide = df.pivot(index="testsuite", columns="version", values="status")
# Column names are strings, so deduplicate and sort the versions numerically before casting.
cols = df["version"].unique().sort().cast(pl.Utf8).to_list()
wide.select(["testsuite"] + cols)
┌───────────┬──────┬──────┬──────┐
│ testsuite ┆ 9    ┆ 85   ┆ 87   │
│ ---       ┆ ---  ┆ ---  ┆ ---  │
│ str       ┆ str  ┆ str  ┆ str  │
╞═══════════╪══════╪══════╪══════╡
│ scan1     ┆ ok   ┆ null ┆ null │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ scan2     ┆ null ┆ ok   ┆ null │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ scan3     ┆ null ┆ null ┆ ok   │
└───────────┴──────┴──────┴──────┘
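
If you only have the pivoted frame at hand, the same ordering can probably be derived from the column names themselves. The following is a minimal sketch in plain Python, not part of the Polars API: `numeric_column_order` is a hypothetical helper name, and it assumes every non-index column name parses as an integer.

```python
def numeric_column_order(columns, index_columns):
    """Return `columns` with the index columns first, then the rest sorted by integer value."""
    index = [c for c in columns if c in index_columns]
    pivoted = [c for c in columns if c not in index_columns]
    # Sort by integer value so that "9" comes before "85" and "87".
    return index + sorted(pivoted, key=int)


# Column names as an alphanumerically sorted pivot might return them.
print(numeric_column_order(["testsuite", "85", "87", "9"], ["testsuite"]))
# → ['testsuite', '9', '85', '87']
```

With the example above, this would be applied as `wide.select(numeric_column_order(wide.columns, ["testsuite"]))`. A column name that is not an integer string would raise a `ValueError`, so the index columns have to be listed completely.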
jvz