Concatenating dataframes can be obtained with concat
, vstack
or extend
.
vstack is a very inexpensive way to contatenate.
Below an example using vstack
:
import polars as pl
df = pl.DataFrame({
'id' : [123, 123, 456, 456],
'age_pre' : [55, 56, 49, 50],
'age_post' : [56, 57, 50, 51],
'exposure_pre' : [0.49, 0.49, 0.80, 0.80],
'exposure_post' : [0.51, 0.51, 0.20, 0.20]
})
column_rename_pre = {column.name : column.name.replace('_pre','')
for column in df if 'pre' in column.name}
column_rename_post = {column.name : column.name.replace('_post','')
for column in df if 'post' in column.name}
out = (
df.select(pl.col('^(id|.*_pre)$')).rename(column_rename_pre)
.vstack(df.select(pl.col('^(id|.*_post)$')).rename(column_rename_post))
)
# result
shape: (8, 3)
┌─────┬─────┬──────────┐
│ id ┆ age ┆ exposure │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ f64 │
╞═════╪═════╪══════════╡
│ 123 ┆ 55 ┆ 0.49 │
│ 123 ┆ 56 ┆ 0.49 │
│ 456 ┆ 49 ┆ 0.8 │
│ 456 ┆ 50 ┆ 0.8 │
│ 123 ┆ 56 ┆ 0.51 │
│ 123 ┆ 57 ┆ 0.51 │
│ 456 ┆ 50 ┆ 0.2 │
│ 456 ┆ 51 ┆ 0.2 │
└─────┴─────┴──────────┘
There is a good Stack Overflow answer explaining the difference between concat, vstack and extend: How can I append or concatenate two dataframes in python polars?
I copy below the recommendations from that answer
There are the following ways to append data:
concat -> concatenate all given DataFrames. This is sort of a linked list of DataFrames. If you pass rechunk=True, all memory will be reallocated to contiguous chunks.
vstack -> Adds the data from other to DataFrame by incrementing a refcount. This is super cheap. It is recommended to call rechunk after many vstacks. Or simply use pl.concat.
extend This operation copies data. It tries to copy data from other to DataFrame. If however the refcount of DataFrame is larger than 1. A new buffer of memory is allocated to hold both DataFrames.