I am trying to download a dataset of files in HDF5 format. All the files are stored on an HDFS cluster that I set up. I want to use Spark to download the files and then convert them, but I haven't figured out how to convert HDF5 files into something usable/readable. Is it possible to convert them into a DataFrame and then work on it with pandas?
Any help is appreciated. Thanks in advance.
I have tried reading some documentation about wrapper classes etc., but I am pretty new to programming and a bit lost. I have worked with CSV files before, and that worked flawlessly: I downloaded them from HDFS using Spark and then ran pandas commands on the DataFrame. I am struggling with the HDF5 format, though.