
I am reading a Parquet file and converting it into a DataFrame.

from fastparquet import ParquetFile 
pf = ParquetFile('file.parquet') 
df = pf.to_pandas() 

Is there a way to read Parquet data from a variable (one that was read earlier and now holds the Parquet data) instead of from a file?

Thanks.

Joe

2 Answers


pandas has a method for reading Parquet files. Here is a reference to the docs. Something like this:

import pandas as pd 
pd.read_parquet('file.parquet') 

should work. Also, please read this post on engine selection.

Michał Zaborowski
  • Yes. Can you please elaborate on what you are trying to do? – Michał Zaborowski Mar 08 '19 at 09:45
  • Process A reads a Parquet file and holds its contents in a variable. Process B reads that variable. I just need to read the Parquet data from the variable, not from a file. – Joe Mar 08 '19 at 13:25

You can also read Parquet data from a variable with pandas.read_parquet by wrapping the bytes in an io.BytesIO buffer, as in the following code. I tested this with the pyarrow backend, but it should also work with the fastparquet backend.

import pandas as pd
import io

with open("file.parquet", "rb") as f:
    data = f.read()

buf = io.BytesIO(data)
df = pd.read_parquet(buf)
Uwe L. Korn