In contrast to pandas, polars doesn't natively support reading zstd-compressed CSV files.

How can I get polars to read a compressed CSV file, for example using xopen?

I've tried this:

from xopen import xopen
import polars as pl

with xopen("data.csv.zst", "r") as f:
    d = pl.read_csv(f)

but this errors with:

pyo3_runtime.PanicException: Expecting to be able to downcast into bytes from read result.: 
   PyDowncastError
Cornelius Roemer
  • I don't know if polars supports this, but in Python you can specify an entry in a zip file with a path-like object like this: `zipfile.Path('data.csv.zst', 'data.csv')` – buhtz Jun 06 '23 at 19:01

2 Answers

The file needs to be opened with xopen in binary mode ("rb"); then it works:

from xopen import xopen
import polars as pl

with xopen("data.csv.zst", "rb") as f:
    d = pl.read_csv(f)

Beware that the entire file will be read into memory before parsing, even if you immediately use only a subset of columns/rows.
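
To illustrate why the mode matters, here is a minimal sketch using only the standard library. It assumes gzip.open as a stand-in for xopen (the file name and data are made up for illustration): in text mode read() returns str, in binary mode it returns bytes, and the error in the question says that bytes were expected.

```python
import gzip
import os
import tempfile

# Write a tiny gzip-compressed CSV to a temporary directory (illustrative data).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.csv.gz")
with gzip.open(path, "wt", newline="") as f:
    f.write("a,b\n1,2\n3,4\n")

# Text mode: read() returns str, which is what triggered the downcast error.
with gzip.open(path, "rt") as f:
    text_chunk = f.read()

# Binary mode: read() returns bytes, which is what the reader expects.
with gzip.open(path, "rb") as f:
    binary_chunk = f.read()

print(type(text_chunk).__name__, type(binary_chunk).__name__)  # str bytes
```

The same distinction should apply to the file object returned by xopen("data.csv.zst", "rb"), although this sketch does not exercise xopen itself.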

Cornelius Roemer

polars doesn't natively support reading compressed csv files.

This is not really true. We support decompression for zlib and gzip. You can make a feature request for zstd, then we can look into supporting that as well.

ritchie46
  • Thanks! I don't think that's documented here (yet): https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.read_csv.html I've opened a feature request: https://github.com/pola-rs/polars/issues/9283 – Cornelius Roemer Jun 07 '23 at 11:41