I need to read parquet files from multiple directories.

For example:

 Dir/
 ├── dir1/
 │   ├── .parquet
 │   └── .parquet
 └── dir2/
     ├── .parquet
     ├── .parquet
     └── .parquet

Is there a way to read these files into a single pandas DataFrame?

Note: all of the parquet files were generated using PySpark.

Ahmad Senousi

1 Answer

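Collect every parquet file with glob and a ** pattern, then call read_parquet in a list comprehension and concat the results. Note that ** only descends into nested directories when recursive=True is passed (Python 3.5+); without it, ** behaves like * and matches exactly one directory level:

import glob

import pandas as pd

# recursive=True lets ** match dir1, dir2, and any deeper subdirectories
files = glob.glob('Dir/**/*.parquet', recursive=True)
# Read each file into its own DataFrame, then stack them into one
df = pd.concat([pd.read_parquet(fp) for fp in files])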
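To see what the ** pattern actually matches, here is a minimal sketch that uses only the standard library. The helper name find_parquet_files and the file names a.parquet and b.parquet are made up for illustration; the directory tree is a throwaway one created in a temporary folder, not the layout from the question.

```python
import glob
import tempfile
from pathlib import Path


def find_parquet_files(root, recursive=True):
    """Return sorted paths matching root/**/*.parquet (hypothetical helper)."""
    pattern = str(Path(root) / "**" / "*.parquet")
    return sorted(glob.glob(pattern, recursive=recursive))


# Build a throwaway tree: Dir/dir1/a.parquet and Dir/dir2/sub/b.parquet
tmp = tempfile.TemporaryDirectory()
root = Path(tmp.name) / "Dir"
(root / "dir1").mkdir(parents=True)
(root / "dir2" / "sub").mkdir(parents=True)
(root / "dir1" / "a.parquet").touch()
(root / "dir2" / "sub" / "b.parquet").touch()

# Without recursive=True, ** acts like * and matches one directory level only
shallow = find_parquet_files(root, recursive=False)
# With recursive=True, ** matches zero or more nested directories
deep = find_parquet_files(root, recursive=True)

print(len(shallow), len(deep))
```

The list returned with recursive=True can then be passed to pd.concat([pd.read_parquet(fp) for fp in files]) in the same way. If every file sits exactly one level below Dir, as in the question, both variants find the same files.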
jezrael