I have searched extensively on the Internet for an existing Scala interface for working with Hadoop Archives (HAR), but I was not able to find one. Is there any API available?
-
Not clear what you mean. Hadoop APIs are all in Java. Any Java class works in Scala. – OneCricketeer Feb 13 '20 at 17:59
-
If you mean Spark/Scala specifically, then its available data sources do not include the HAR format: https://spark.apache.org/docs/latest/sql-data-sources.html – mazaneicha Feb 13 '20 at 18:47
-
@mazaneicha Spark reads data perfectly fine from HAR files, provided that you use the `har:///` scheme. Also, Hadoop's library is in Java, but it doesn't provide any documentation for archiving. I looked for it. Now I am trying to figure it out on my own. Will share findings. – lifeisshubh Feb 13 '20 at 19:33
-
@lifeisshubh Cool thanks, nice to know! – mazaneicha Feb 13 '20 at 19:34
-
If Spark can read it, then there's an InputFormat for it. That would mean there's a Java class for reading the files. Documentation exists as source code too. – OneCricketeer Feb 14 '20 at 04:05
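To illustrate the point about a Java class for reading: a minimal sketch, assuming `hadoop-client` is on the classpath and that the `har` scheme resolves to Hadoop's `HarFileSystem` through the normal `FileSystem` lookup. The object name `HarListing`, the helper `harUri`, and the example archive path are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object HarListing {
  // Hypothetical helper: turns an absolute archive path into a har:/// URI
  // string, which leaves the underlying file system to the cluster default.
  def harUri(absoluteArchivePath: String): String = {
    require(absoluteArchivePath.startsWith("/"), "archive path must be absolute")
    "har://" + absoluteArchivePath
  }

  // Lists the top-level entries of the archive through the ordinary
  // Hadoop FileSystem API; nothing Scala-specific is needed.
  def listArchive(absoluteArchivePath: String,
                  conf: Configuration = new Configuration()): Seq[String] = {
    val root = new Path(harUri(absoluteArchivePath))
    val fs = root.getFileSystem(conf) // expected to be a HarFileSystem
    fs.listStatus(root).toSeq.map(_.getPath.toString)
  }

  def main(args: Array[String]): Unit = {
    // Example invocation (path is an assumption): /user/demo/logs.har
    listArchive(args(0)).foreach(println)
  }
}
```

The same `har:///...` string can be passed to any API that accepts a Hadoop path, which is presumably why Spark can read from it.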
-
Yes, those exist. For now, I have figured out that the code lives in the `hadoop-archives` artifact (`libraryDependencies += "org.apache.hadoop" % "hadoop-archives" % "2.10.0"`), and I am trying to make use of it. – lifeisshubh Feb 14 '20 at 09:39
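For creating an archive from Scala, one possible approach is to drive the same `Tool` that the `hadoop archive` command uses. This is a sketch, assuming the `hadoop-archives` dependency from the comment above plus a MapReduce client on the classpath; the object name `HarCreate`, the helper `archiveArgs`, and all paths are hypothetical. The argument order mirrors the command line: `-archiveName <name> -p <parent> <src>* <dest>`.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.tools.HadoopArchives
import org.apache.hadoop.util.ToolRunner

object HarCreate {
  // Hypothetical helper: builds the same argument vector that
  // `hadoop archive -archiveName <name> -p <parent> <src>* <dest>` receives.
  def archiveArgs(name: String,
                  parent: String,
                  sources: Seq[String],
                  dest: String): Array[String] =
    (Seq("-archiveName", name, "-p", parent) ++ sources ++ Seq(dest)).toArray

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // HadoopArchives implements Tool, so ToolRunner can run it in-process.
    // Note that this launches a MapReduce job to build the archive.
    val exitCode = ToolRunner.run(
      new HadoopArchives(conf),
      archiveArgs("logs.har", "/user/demo", Seq("input"), "/user/demo/archives"))
    sys.exit(exitCode)
  }
}
```

A non-zero exit code indicates that the tool reported a failure; the paths here are placeholders and would need to exist on the target file system.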