A Hadoop archive is an archive in a special format that maps to a file system directory. Its name always has a *.har extension. The archive directory contains metadata files (_index and _masterindex) and data files (part-*). The _index file lists the names of the files that are part of the archive and their locations within the part files.
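The layout described above can be illustrated with a minimal Python sketch. It assumes the archive directory has been copied to a local file system (it does not talk to HDFS), and the helper names `classify_har_entries` and `looks_like_har` are hypothetical, introduced only for this example. It is a heuristic check on file names, not a parser for the index format.

```python
import fnmatch
import os


def classify_har_entries(names):
    """Split file names from a .har directory into (metadata, data) lists."""
    # Metadata files named in the tag description: _index and _masterindex.
    metadata = sorted(n for n in names if n in ("_index", "_masterindex"))
    # Data files follow the part-* naming pattern.
    data = sorted(n for n in names if fnmatch.fnmatch(n, "part-*"))
    return metadata, data


def looks_like_har(path):
    """Heuristic: a local directory ending in .har that holds both
    metadata files and at least one part-* file (assumption: local copy)."""
    if not path.rstrip(os.sep).endswith(".har") or not os.path.isdir(path):
        return False
    metadata, data = classify_har_entries(os.listdir(path))
    return metadata == ["_index", "_masterindex"] and len(data) > 0
```

Any other files in the directory are ignored by the classification, so the check only confirms that the expected pieces are present.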
Questions tagged [hadoop-archive]
7 questions
2 votes, 1 answer
Compress output of Hadoop Archive tool
I'm using Hadoop Archive to reduce the number of files in my Hadoop cluster, but for data retention I want to keep my data as long as possible. The problem is that Hadoop Archive does not reduce the folder size (my folder has multiple types of files, both small…

dltu
1 vote, 0 answers
Hadoop Archive Interface for Scala
I have searched extensively on the Internet for an existing Scala interface for working with Hadoop archives, but I was not able to find one. Is there an API available?

lifeisshubh
1 vote, 0 answers
Java code for creating Hadoop Archive
I am developing an application that creates a single Hadoop Archive file from millions of small files. I have successfully tested this from the command line (hadoop archive --archiveName foo.har -p / -r 2 test.txt /), but I am not able to find any resources…

Krishnom
1 vote, 1 answer
Querying data from har archives - Apache Hive
I am using Hadoop and facing the dreaded problem of large numbers of small files. I need to be able to create har archives out of existing hive partitions and query them at the same time. However, Hive apparently supports archiving partitions only…

Ankit Khettry
0 votes, 1 answer
Hadoop Archive Command
How do I use the Hadoop archive technique, and what command is needed?
0 votes, 3 answers
Hive archive partition(dynamic) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I'm trying to archive some old data from my table using an ALTER TABLE TABLE_NAME ARCHIVE PARTITION(part_col) query.
Hadoop version - 2.7.3
Hive version - 1.2.1
Table structure is as follows,
hive> desc clicks_fact;
OK
time …

Sridhar
0 votes, 1 answer
Archiving incoming small hdfs files
I have small files coming into HDFS every day. I am planning to use Hadoop archive (HAR), but how can I archive these small files as they arrive each day? E.g., I might get 5 files today that I need to archive, and tomorrow, if I get 5 more files, I…

Naveen