Questions tagged [hadoop-archive]

Hadoop archives are special format archives. A Hadoop archive maps to a file system directory. A Hadoop archive always has a *.har extension. A Hadoop archive directory contains metadata (in the form of _index and _masterindex) and data (part-*) files. The _index file contains the names of the files that are part of the archive and their locations within the part files.
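
As a rough illustration of the layout described above, the following Python sketch sorts the entry names of a *.har directory into metadata and data files. The helper name `split_har_entries` and the sample listing are hypothetical; only the `_index`, `_masterindex`, and `part-*` names come from the description, and the code looks at names only, not at HDFS.

```python
from fnmatch import fnmatchcase

# Metadata file names as given in the tag description.
HAR_METADATA = {"_index", "_masterindex"}


def split_har_entries(names):
    """Group the entry names of a .har directory by role.

    Hypothetical helper for illustration: it classifies names only and
    never touches the file system.
    """
    groups = {"metadata": [], "data": [], "other": []}
    for name in names:
        if name in HAR_METADATA:
            groups["metadata"].append(name)
        elif fnmatchcase(name, "part-*"):
            groups["data"].append(name)
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    # Made-up listing of a directory named foo.har.
    print(split_har_entries(["_index", "_masterindex", "part-0"]))
```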

7 questions
2 votes, 1 answer

Compress output of Hadoop Archive tool

I'm using Hadoop Archive to reduce the number of files in my Hadoop cluster, but for data retention I want to keep my data as long as possible. The problem is that Hadoop Archive does not reduce folder size (my folder has multiple types of files, both small…
dltu
1 vote, 0 answers

Hadoop Archive Interface for Scala

I have searched extensively on the Internet for an existing Scala interface for working with Hadoop archives, but I was not able to find one. Is there any API available?
lifeisshubh
1 vote, 0 answers

Java code for creating Hadoop Archive

I am developing an application for creating a single Hadoop Archive file from millions of small files. I have successfully tested this from the command line (hadoop archive --archiveName foo.har -p / -r 2 test.txt /), but I am not able to find out any resources…
Krishnom
1 vote, 1 answer

Querying data from har archives - Apache Hive

I am using Hadoop and facing the dreaded problem of large numbers of small files. I need to be able to create HAR archives out of existing Hive partitions and query them at the same time. However, Hive apparently supports archiving partitions only…
Ankit Khettry
0 votes, 1 answer

Hadoop Archive Command

How do I use the Hadoop archive technique, and what command is needed?
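
For orientation, the archive tool's documented usage is `hadoop archive -archiveName <name>.har -p <parent path> [-r <replication factor>] <src>* <dest>`. The Python sketch below only assembles that argument list; the helper name `build_archive_command` and the example paths are hypothetical, and nothing is executed against a cluster.

```python
def build_archive_command(archive_name, parent, sources, dest, replication=None):
    """Return the argv list for a `hadoop archive` invocation.

    Hypothetical helper for illustration: it builds the command but does
    not run it.
    """
    if not archive_name.endswith(".har"):
        raise ValueError("archive name must end with .har")
    cmd = ["hadoop", "archive", "-archiveName", archive_name, "-p", parent]
    if replication is not None:
        # Optional replication factor for the archive files.
        cmd += ["-r", str(replication)]
    # Source paths are relative to the parent path; the destination comes last.
    cmd += list(sources)
    cmd.append(dest)
    return cmd


if __name__ == "__main__":
    # Example paths are made up.
    argv = build_archive_command("foo.har", "/user/hadoop", ["dir1", "dir2"], "/user/zoo")
    print(" ".join(argv))
```

Files inside the resulting archive are then addressed through the `har://` scheme, for example `hadoop fs -ls har:///user/zoo/foo.har`.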
0 votes, 3 answers

Hive archive partition(dynamic) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I'm trying to archive some old data from my table using an ALTER TABLE TABLE_NAME ARCHIVE PARTITION(part_col) query. Hadoop version: 2.7.3. Hive version: 1.2.1. The table structure is as follows: hive> desc clicks_fact; OK time …
Sridhar
0 votes, 1 answer

Archiving incoming small hdfs files

I have small files coming into HDFS every day. I am planning to use Hadoop archive (HAR), but how can I archive these small files that come into HDFS every day? E.g., I might get 5 files today that I need to archive, and tomorrow if I get 5 more files I…
Naveen