There are different append strategies, depending on your needs.
import polars as pl

df1 = pl.DataFrame({"a": [1], "b": [2], "c": [3]})
df2 = pl.DataFrame({"a": [4], "b": [5], "c": [6]})
# copy both frames into one new contiguous memory slab
new_df = pl.concat([df1, df2], rechunk=True)
# cheap append (no memory copy)
new_df = df1.vstack(df2)
# try to append in place (modifies df1)
df1.extend(df2)
To understand the differences, it is important to know that polars memory is immutable if and only if it has any copy.
Copies in polars are free, because a copy only increments the reference count of the backing memory buffer instead of copying the data itself.
However, if a memory buffer has no copies yet, i.e. the refcount == 1, polars can mutate that memory.
With this background, there are the following ways to append data:
concat -> Concatenates all given DataFrames. The result is sort of a linked list of DataFrames. If you pass rechunk=True, all memory will be reallocated to contiguous chunks.
vstack -> Adds the data from other to DataFrame by incrementing a refcount. This is super cheap. It is recommended to call rechunk after many vstacks, or simply to use pl.concat.
extend -> This operation copies data. It tries to copy the data from other into the existing memory of DataFrame. If, however, the refcount of DataFrame is larger than 1, a new memory buffer is allocated to hold both DataFrames.