Is there a way to read a .csv file that is compressed via gz into a dask dataframe?
I've tried it directly with
import dask.dataframe as dd
df = dd.read_csv("Data.gz")
but get a Unicode error (probably because it is interpreting the compressed bytes as text). There is a "compression"
parameter, but compression = "gz"
doesn't work, and I can't find any documentation on it so far.
With pandas I can read the file directly without a problem, apart from the result blowing up my memory ;-) If I restrict the number of rows, it works fine.
import pandas as pd
df = pd.read_csv("Data.gz", nrows=100)
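For reference, a minimal reproducible sketch of the working pandas path (the file name Data.gz and the toy contents are just stand-ins; the row limit is nrows, not ncols):

import gzip
import pandas as pd

# Create a small gzipped CSV as a stand-in for the real Data.gz
with gzip.open("Data.gz", "wt") as f:
    f.write("a,b\n1,2\n3,4\n5,6\n")

# pandas infers gzip compression from the .gz extension by default
# (compression="infer"); nrows caps how many data rows are read.
df = pd.read_csv("Data.gz", nrows=2)
print(len(df))  # 2

So the decompression itself is handled transparently by pandas; the memory problem comes purely from the uncompressed size.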