
Is it possible to load an HFile into an RDD or DataFrame in PySpark? Ideally, I would like to load each HFile the same way I would load a CSV file, for instance.

Thanks for your help!

mazaneicha
Alan CUZON
  • In Java (and Scala by extension), you would read HFile content using `HFile.Reader` to create an `HFileScanner` (https://hbase.apache.org/2.3/devapidocs/org/apache/hadoop/hbase/io/hfile/HFile.Reader.html#getScanner-boolean-boolean-) and then iterate through it. Not sure if pyspark / python has anything similar. – mazaneicha Jun 20 '22 at 18:53
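The comment above covers getting cells out of an HFile on the JVM side. To make the result look like a CSV-style table, those cells still have to be grouped into one record per row key. The following is only a minimal sketch of that grouping step in plain Python: it does not read an HFile, and the `cells_to_rows` helper, the `(row_key, family, qualifier, value)` tuple layout, and the sample data are all hypothetical stand-ins for whatever a scanner would produce. It also assumes that every value is UTF-8 text and that cells arrive sorted by row key.

```python
from itertools import groupby
from operator import itemgetter


def cells_to_rows(cells):
    """Group (row_key, family, qualifier, value) byte tuples into one dict per row key.

    Assumption: cells arrive sorted by row key, so consecutive grouping is enough.
    Assumption: every component is UTF-8 encoded text.
    """
    rows = []
    for row_key, group in groupby(cells, key=itemgetter(0)):
        record = {"row_key": row_key.decode("utf-8")}
        for _, family, qualifier, value in group:
            column = f"{family.decode('utf-8')}:{qualifier.decode('utf-8')}"
            # Keep only the first value seen for each column within a row.
            record.setdefault(column, value.decode("utf-8"))
        rows.append(record)
    return rows


# Hypothetical sample cells standing in for scanner output.
sample_cells = [
    (b"row1", b"cf", b"age", b"30"),
    (b"row1", b"cf", b"name", b"alice"),
    (b"row2", b"cf", b"name", b"bob"),
]

rows = cells_to_rows(sample_cells)
for row in rows:
    print(row)
```

Each resulting dict has a `row_key` entry plus one `family:qualifier` entry per column, which is a shape that could then be written out as CSV or turned into a DataFrame. Rows with different column sets would need a common schema first.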

0 Answers