Questions tagged [hadoop-archive]

Hadoop archives are special-format archives. A Hadoop archive maps to a file system directory and always has a *.har extension. The archive directory contains metadata files (_index and _masterindex) and data files (part-*). The _index file contains the names of the files that are part of the archive and their locations within the part files.
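As a rough illustration of the layout described above, the following sketch sorts the file names found in a *.har directory into metadata and data files. The function name `classify_har_entries` and the sample listing are hypothetical; only the names `_index`, `_masterindex`, and `part-*` come from the tag description.

```python
from fnmatch import fnmatchcase

# Metadata file names as given in the tag description.
METADATA_FILES = {"_index", "_masterindex"}


def classify_har_entries(names):
    """Split file names from a *.har directory into metadata, data, and other."""
    layout = {"metadata": [], "data": [], "other": []}
    for name in names:
        if name in METADATA_FILES:
            layout["metadata"].append(name)
        elif fnmatchcase(name, "part-*"):
            layout["data"].append(name)
        else:
            # Anything the description does not mention is kept separately.
            layout["other"].append(name)
    return layout


# Hypothetical listing of a directory named foo.har (an assumption for illustration).
listing = ["_index", "_masterindex", "part-0", "part-1"]
print(classify_har_entries(listing))
```

The sketch only inspects names; it does not read the archive or parse the `_index` file.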

7 questions

votes

1 answer

Compress output of Hadoop Archive tool

I'm using Hadoop Archive to reduce the number of files in my Hadoop cluster, but for data retention I want to keep my data as long as possible. The problem is that Hadoop Archive does not reduce the folder size (my folder has multiple types of files, both small…

asked Jul 04 '16 at 09:55

dltu

vote

0 answers

Hadoop Archive Interface for Scala

I have searched extensively on the Internet for an existing Scala interface for working with Hadoop archives, but I was not able to find any. Is there any API available?

hadoop hdfs hadoop-archive

asked Feb 13 '20 at 10:17

lifeisshubh

vote

0 answers

Java code for creating Hadoop Archive

I am developing an application for creating a single Hadoop Archive file from millions of small files. I have successfully tested this from the command line (hadoop archive --archiveName foo.har -p / -r 2 test.txt /), but I am not able to find any resources…

java hadoop hadoop-archive

asked Jul 29 '17 at 04:55

Krishnom

1,348
12
39

vote

1 answer

Querying data from har archives - Apache Hive

I am using Hadoop and facing the dreaded problem of large numbers of small files. I need to be able to create HAR archives out of existing Hive partitions and query them at the same time. However, Hive apparently supports archiving partitions only…

hadoop hive partitioning hadoop-archive

asked Jun 03 '16 at 10:23

Ankit Khettry

votes

1 answer

Hadoop Archive Command

How do I use the Hadoop archive technique, and what command is needed?

hadoop mapreduce hdfs hadoop-archive

asked Jul 23 '20 at 05:59

Arif Sumanggara Nainggolan
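For the "Hadoop Archive Command" question above, a minimal sketch of how the argument list could be assembled, assuming the documented form `hadoop archive -archiveName <name> -p <parent> [-r <replication>] <src>* <dest>`. The helper `build_archive_command` and the example paths are hypothetical, and the sketch only builds the command without running it.

```python
import shlex


def build_archive_command(archive_name, parent, sources, dest, replication=None):
    """Build the argv for the `hadoop archive` tool (does not execute it)."""
    if not archive_name.endswith(".har"):
        raise ValueError("archive name must end with .har")
    argv = ["hadoop", "archive", "-archiveName", archive_name, "-p", parent]
    if replication is not None:
        # Optional replication factor for the archive files.
        argv += ["-r", str(replication)]
    argv += list(sources)  # source paths, relative to the parent path
    argv.append(dest)      # destination directory that will hold the archive
    return argv


# Hypothetical paths, chosen only for illustration.
argv = build_archive_command("foo.har", "/user/hadoop", ["dir1", "dir2"], "/user/zoo")
print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run` on a machine where the `hadoop` client is installed. The tool runs a MapReduce job to create the archive, so a working cluster is assumed.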

votes

3 answers

Hive archive partition(dynamic) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I'm trying to archive some old data from my table using an ALTER TABLE TABLE_NAME ARCHIVE PARTITION(part_col) query. Hadoop version: 2.7.3. Hive version: 1.2.1. The table structure is as follows: hive> desc clicks_fact; OK time …

hadoop hive hiveql hadoop2 hadoop-archive

asked Oct 12 '17 at 16:37

Sridhar

1,518
14
27

votes

1 answer

Archiving incoming small hdfs files

I have small files coming into HDFS every day. I am planning to use Hadoop Archive (HAR), but how can I archive small files that arrive in HDFS every day? E.g., I might get 5 files today that I need to archive, and tomorrow, if I get 5 more files, I…

hadoop hdfs archive hadoop-archive bigdata

asked Jan 14 '16 at 00:41

Naveen