Is it possible with AwkwardArray (awkward0) to append to an existing Parquet file (written by AwkwardArray)?
Normal Awkward Parquet storing
The following code creates a Parquet file containing a few Awkward arrays (e.g. audio data):
import numpy as np
import awkward as awk
import pyarrow.parquet as pq
# create Awkward Table from dict with numpy arrays
awk_array = awk.fromiter([{"ch0": np.array([0, 1, 2]), "ch1": np.array([3, 4, 5])},
                          {"ch0": np.array([6, 7]), "ch1": np.array([8, 9])}])
awk_array.tolist()
# [{'ch0': [0, 1, 2], 'ch1': [3, 4, 5]}, {'ch0': [6, 7], 'ch1': [8, 9]}]
# save in Parquet format
awk.toparquet("audio.parquet", awk_array)
# check if we can successfully load again; success
awk.fromparquet("audio.parquet")["ch0"].tolist()
# [[0, 1, 2], [6, 7]]
Appending Parquet (no Awkward)
In the pyarrow documentation about Parquet files, you can extend a Parquet file with:
with pq.ParquetWriter('example3.parquet', table.schema) as writer:
    for i in range(3):
        writer.write_table(table)
Question
Is something like this possible with Awkward arrays?
akw_arrays = []
akw_arrays.append(awk.fromiter([{"ch0": np.array([0, 1, 2]), "ch1": np.array([3, 4, 5])}]))
akw_arrays.append(awk.fromiter([{"ch0": np.array([6, 7]), "ch1": np.array([8, 9])}]))
# Awkward table schema
with pq.ParquetWriter("audio_append.parquet", awk.table.schema) as writer:
    for i in range(len(akw_arrays)):
        writer.write_table(akw_arrays[i])
That is, something like an awkward.table.schema or an awkward.ParquetWriter()?
In reality, I don't have both arrays at the same time. Therefore, concatenating before writing is not possible.
Or is the only possibility to make use of something like Apache Arrow, and write everything at once to disk at the end?