I have a Feather-format file, sales.feather, that I am using to exchange data between Python and R.
In R, I use the following command:
df = arrow::read_feather("sales.feather", as_data_frame=TRUE)
In Python, I use this:

import pandas
df = pandas.read_feather("sales.feather")
What is the best way to load the data from that file into memory in a Spark instance operated from pyspark?
I would also like to control the pyspark.StorageLevel for the data read from Feather.
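For concreteness, here is the kind of pandas-free detour through Parquet that I have been considering. This is an untested sketch, and sales.parquet is just a hypothetical scratch path I made up:

import pyarrow.feather as feather
import pyarrow.parquet as pq
from pyspark import StorageLevel
from pyspark.sql import SparkSession

# Convert Feather to Parquet without going through pandas;
# memory_map=True avoids copying uncompressed data into RAM.
table = feather.read_table("sales.feather", memory_map=True)
pq.write_table(table, "sales.parquet")  # hypothetical scratch path

# Read the Parquet copy in Spark and pin it with an explicit StorageLevel.
spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("sales.parquet")
df.persist(StorageLevel.MEMORY_AND_DISK)

Is a detour like this reasonable, or is there a more direct way?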
I don't want to use pandas anywhere in the pipeline, because it segfaults on my 19 GB Feather file (created from a 45 GB CSV).
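Given that size, even the full-table read in the sketch above may be too much, so I could presumably do the conversion batch by batch instead. Again untested, and it assumes the file is Feather V2, which is the Arrow IPC file format:

import pyarrow as pa
import pyarrow.parquet as pq

# Stream record batches from the Feather file into a Parquet file
# so that only one batch is materialized at a time.
with pa.memory_map("sales.feather", "r") as source:
    reader = pa.ipc.open_file(source)
    with pq.ParquetWriter("sales.parquet", reader.schema) as writer:
        for i in range(reader.num_record_batches):
            writer.write_table(pa.Table.from_batches([reader.get_batch(i)]))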