
Is there any way to compute the number of rows in a Parquet file in Scala? Perhaps with a Hadoop library or a Parquet library? I would like to avoid Spark. I mean something like:
number_rows("hdfs:///tmp/parquet")

  • Does this answer your question? [Read local Parquet file without Hadoop Path API](https://stackoverflow.com/questions/59939309/read-local-parquet-file-without-hadoop-path-api) – mazaneicha Jun 22 '22 at 13:14
  • You don't want to use Spark, but that's in your user name? You can use Hive or Pig to run a COUNT statement – OneCricketeer Jun 22 '22 at 13:56
  • It's only a few lines of Python if you don't mind using that instead. This is assuming that the file fits into memory. https://stackoverflow.com/questions/33813815/how-to-read-a-parquet-file-into-pandas-dataframe – Ben Watson Jun 24 '22 at 08:31
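One possible Spark-free approach, offered as a sketch rather than something from the thread: every Parquet file stores its row count in the footer metadata, and parquet-mr's `ParquetFileReader` can read that footer without scanning any column data. The example assumes `org.apache.parquet:parquet-hadoop` and a Hadoop client are on the classpath. `numberRows` is a hypothetical helper mirroring the question's `number_rows(...)`. Skipping files whose names start with `_` or `.` (such as `_SUCCESS` or `.crc` checksums) is an assumed convention for directories written by Hadoop or Spark jobs.

```scala
// Assumed dependencies: org.apache.parquet:parquet-hadoop and a Hadoop client
// (e.g. org.apache.hadoop:hadoop-client) on the classpath.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

object ParquetRowCount {

  // Assumed convention: skip side files such as _SUCCESS, _metadata and .crc checksums.
  // Only the file name is checked, not the names of parent directories.
  def isDataFile(name: String): Boolean =
    !name.startsWith("_") && !name.startsWith(".")

  // Row count of one Parquet file, taken from the footer (sum over all row groups).
  def fileRowCount(path: Path, conf: Configuration): Long = {
    val reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, conf))
    try reader.getRecordCount
    finally reader.close()
  }

  // Hypothetical helper: accepts a single file or a directory and sums the
  // footer row counts of every data file found under it (recursively).
  def numberRows(location: String, conf: Configuration = new Configuration()): Long = {
    val root = new Path(location)
    val fs: FileSystem = root.getFileSystem(conf)
    val files = fs.listFiles(root, true) // if root is a file, yields just that file
    var total = 0L
    while (files.hasNext) {
      val status = files.next()
      if (isDataFile(status.getPath.getName))
        total += fileRowCount(status.getPath, conf)
    }
    total
  }

  def main(args: Array[String]): Unit =
    println(numberRows(args.headOption.getOrElse("hdfs:///tmp/parquet")))
}
```

Because only footers are read, the cost grows with the number of files rather than the volume of data. Note that a URI like `hdfs:///tmp/parquet` with no host relies on `fs.defaultFS` from the Hadoop configuration (e.g. `core-site.xml`) found on the classpath.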
