
I have searched extensively online for an existing Scala interface for working with Hadoop Archives (HAR), but I was not able to find one. Is there any API available?

mazaneicha
lifeisshubh
  • Not clear what you mean. Hadoop APIs are all in Java. Any Java class works in Scala. – OneCricketeer Feb 13 '20 at 17:59
  • If you mean Spark/Scala specifically, then its available data sources do not include the HAR format (see https://spark.apache.org/docs/latest/sql-data-sources.html). – mazaneicha Feb 13 '20 at 18:47
  • @mazaneicha Spark reads data from HAR files perfectly well, provided you use the `har:///` URI scheme. Also, Hadoop's library is in Java, but it doesn't provide any documentation for archiving; I looked for it. I am now trying to figure it out on my own and will share my findings. – lifeisshubh Feb 13 '20 at 19:33
  • @lifeisshubh Cool thanks, nice to know! – mazaneicha Feb 13 '20 at 19:34
  • If Spark can read it, then there's an InputFormat for it. That would mean there's a Java class for reading the files. Documentation exists as source code too. – OneCricketeer Feb 14 '20 at 04:05
  • Yes, those exist. For now, I have figured out that the code lives in `libraryDependencies += "org.apache.hadoop" % "hadoop-archives" % "2.10.0"`, and I am trying to make use of it. – lifeisshubh Feb 14 '20 at 09:39
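
As the comments note, Hadoop's Java classes can be called directly from Scala, and archives are addressed through the `har:///` scheme. A minimal sketch of listing an archive's contents that way, assuming the Hadoop client libraries (for example `hadoop-client`) are on the classpath and the cluster configuration is visible to the JVM. The object name `ListHarContents` and the path `har:///user/example/data.har` are hypothetical placeholders, not anything from the question. Creating an archive is a separate matter and is not shown here.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object ListHarContents {
  def main(args: Array[String]): Unit = {
    // Hypothetical archive location; pass a real .har path as the first argument.
    val archive = new Path(args.headOption.getOrElse("har:///user/example/data.har"))

    // Picks up core-site.xml and friends from the classpath, so a har:/// URI
    // without an authority resolves against the default file system.
    val conf = new Configuration()

    // Path.getFileSystem selects the implementation registered for the "har" scheme.
    val fs = archive.getFileSystem(conf)

    // Print the top-level entries stored in the archive.
    fs.listStatus(archive).foreach { status =>
      println(s"${status.getPath} (${status.getLen} bytes)")
    }
  }
}
```

This uses only the generic `FileSystem` API, so the same calls (`listStatus`, `open`, and so on) apply to HAR contents as to any other Hadoop path.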

0 Answers