Why is there no 'hadoop fs -head' shell command?

Question

A fast method for inspecting files on HDFS is to use tail:

~$ hadoop fs -tail /path/to/file

This displays the last kilobyte of data in the file, which is extremely helpful. However, the opposite command, head, does not appear to be part of the shell command collection, which I find very surprising.

My hypothesis is that, since HDFS is built for very fast streaming reads of very large files, there is some access-oriented issue that affects head. This makes me hesitant to work around it just to read the start of a file. Does anyone have an answer?

Lack of community interest to implement such feature? [https://issues.apache.org/jira/browse/HDFS-206](https://issues.apache.org/jira/browse/HDFS-206). — cabad, Nov 04 '13 at 23:50

score 152 · Accepted Answer · answered Nov 04 '13 at 23:37

I would say it's more to do with efficiency: head can easily be replicated by piping the output of hadoop fs -cat through the Linux head command.

hadoop fs -cat /path/to/file | head

This is efficient because head exits once the desired number of lines has been output, which closes the pipe and stops the cat before it streams the rest of the file.

Using tail in this manner would be considerably less efficient, as you'd have to stream over the entire file (all HDFS blocks) to find the final lines.

hadoop fs -cat /path/to/file | tail

The hadoop fs -tail command, as you note, works on the last kilobyte: Hadoop can efficiently find the last block and skip to the position of the final kilobyte, then stream the output. Piping via tail can't easily do this.
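
The difference can be reproduced locally without a cluster. This is only a minimal sketch: `produce` is a hypothetical stand-in for `hadoop fs -cat` on a very large file, and `first_lines` and `last_kb` are illustrative names, not Hadoop commands.

```shell
# Hypothetical stand-in for `hadoop fs -cat /path/to/file` on a huge file:
# an endless stream of lines that never finishes on its own.
produce() {
    yes "some record"
}

# head exits after N lines; the producer is then stopped by a broken pipe
# instead of streaming the rest of the data.
first_lines() {
    produce | head -n "$1"
}

# Reading the end of a local file by byte count, similar in spirit to
# `hadoop fs -tail` showing the last kilobyte.
last_kb() {
    tail -c 1024 "$1"
}
```

Calling `first_lines 3` prints three lines and returns right away even though the producer is endless. Piping `produce` into tail would never return, because tail has to see the end of its input first.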

How can I put this into alias form? I tried `argpas() { hdfs dfs -cat $1 | head -$2 }` followed by `alias hh=argpas`, but it is not working. — Indrajeet Gour, Jun 07 '16 at 08:45
bash function to call the same (optionally via `-n {num} {hdfs_path}` giving num lines to show): `hdfs-head() { [ "$1" = "-n" ] && { n=$2; shift 2; } || n=10; hdfs dfs -cat "$@" | head -n $n ; } ` — michael, Jun 25 '17 at 15:03
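
For readability, the one-line function in the comment above can be spelled out as below. This is a sketch rather than a tested production helper: the name `hdfs_head` and the argument checks are my own additions, and it assumes `hdfs` is on the PATH. It also shows what the `argpas` attempt was missing: in a one-line function body, the last command needs a `;` (or a newline) before the closing `}`.

```shell
# hdfs_head [-n NUM] HDFS_PATH...
# Print the first NUM lines (default 10) of one or more HDFS files.
hdfs_head() {
    local n=10
    if [ "$1" = "-n" ]; then
        n=$2
        shift 2
    fi

    # Reject an empty or non-numeric line count.
    case "$n" in
        ''|*[!0-9]*)
            echo "hdfs_head: invalid line count: $n" >&2
            return 2
            ;;
    esac

    if [ "$#" -eq 0 ]; then
        echo "usage: hdfs_head [-n NUM] HDFS_PATH..." >&2
        return 2
    fi

    # Quote "$@" and "$n" so paths containing spaces survive intact.
    hdfs dfs -cat "$@" | head -n "$n"
}
```

Usage would look like `hdfs_head -n 20 /path/to/file`, or simply `hdfs_head /path/to/file` for the default ten lines.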

score 9 · Answer 2 · edited Oct 09 '18 at 12:31

Starting with version 3.1.0 we now have it:

Usage: hadoop fs -head URI

Displays first kilobyte of the file to stdout.

See the FileSystem Shell page of the Apache Hadoop documentation.
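
On clusters that mix versions, a wrapper can use the built-in command where it exists and fall back to the pipe otherwise. The following is a sketch under assumptions: `version_at_least` and `hdfs_first_kb` are hypothetical helper names, and it assumes the first line of `hadoop version` has the form `Hadoop X.Y.Z`.

```shell
# version_at_least HAVE WANT
# Succeeds when dotted version HAVE is >= WANT (first three components).
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        for (i = 1; i <= 3; i++) {
            a = h[i] + 0; b = w[i] + 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# hdfs_first_kb HDFS_PATH
# Print the first kilobyte of a file, using -head when it is available.
hdfs_first_kb() {
    local have
    # Assumption: the first line looks like "Hadoop 3.1.0".
    have=$(hadoop version 2>/dev/null | awk 'NR == 1 { print $2 }')
    if version_at_least "$have" 3.1.0; then
        hadoop fs -head "$1"
    else
        # Older releases: stream the file and stop after 1024 bytes.
        hadoop fs -cat "$1" | head -c 1024
    fi
}
```

If the version cannot be determined, the comparison fails and the function takes the `-cat` fallback, which works on every release.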

edited Oct 09 '18 at 12:31

answered Jan 02 '18 at 15:43

yishaiz

score 3 · Answer 3 · edited Apr 21 '15 at 09:18

hdfs dfs -cat /path | head

is a good way to solve the problem.

edited Apr 21 '15 at 09:18

TZHX

answered Apr 21 '15 at 08:58

xu2mao

How would you save the result of this | head -n into a file in HDFS? – Loebre Oct 01 '19 at 19:53
@Loebre `hdfs dfs -cat /file |head -1 | hdfs dfs -put - /otherfile` where the dash before `/otherfile` refers to `STDIN`. – Itération 122442 Dec 23 '22 at 09:24
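
The pipeline from the comment above can be wrapped in a small function. This is a sketch: `hdfs_head_to` is a hypothetical name, and the paths are whatever the caller supplies.

```shell
# hdfs_head_to NUM SRC DEST
# Copy the first NUM lines of HDFS file SRC into a new HDFS file DEST.
hdfs_head_to() {
    if [ "$#" -ne 3 ]; then
        echo "usage: hdfs_head_to NUM SRC DEST" >&2
        return 2
    fi
    # `-put -` reads from standard input. Without -f, -put refuses to
    # overwrite a DEST that already exists.
    hdfs dfs -cat "$2" | head -n "$1" | hdfs dfs -put - "$3"
}
```

For example, `hdfs_head_to 100 /file /otherfile` would write the first 100 lines of `/file` to `/otherfile`.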

score 2 · Answer 4 · edited Aug 13 '17 at 08:10

You can try the following command:

hadoop fs -cat /path | head -n N

where N is replaced with the number of records to view.

edited Aug 13 '17 at 08:10

George Edwards

answered Aug 13 '17 at 07:18

Amey

score 2 · Answer 5 · answered Dec 02 '17 at 11:16

In Hadoop v2 and later:

hdfs dfs -cat /file/path | head

In any version, including v1:

hadoop fs -cat /file/path | head

answered Dec 02 '17 at 11:16

Ani Menon
