I have one big file in HDFS, bigfile.txt. I want to copy the first 100 lines of it into a new file on HDFS. I tried the following command:

hadoop fs -cat /user/billk/bigfile.txt |head -100 /home/billk/sample.txt

It gave me a "cat: unable to write output stream" error. I am on Hadoop 1.

Are there other ways to do this? (Note: copying the first 100 lines to a local file or to another file on HDFS is fine.)

Rolando

1 Answer

Like this -

hadoop fs -cat /user/billk/bigfile.txt | head -100 | hadoop fs -put - /home/billk/sample.txt

I believe the "cat: unable to write output stream" error appears simply because head closed the stream after it read its limit. See this answer about head for HDFS: https://stackoverflow.com/a/19779388/3438870
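
As an illustration, you can reproduce the same benign broken-pipe behavior locally (a minimal sketch; here seq stands in for any long stream such as hadoop fs -cat):

seq 1 1000000 | head -5

head exits after printing five lines and the pipe closes, cutting off the upstream writer mid-stream; that is exactly the situation that produces the "unable to write" complaint from cat, even though the lines you asked for were already written.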

Scott
  • to copy the sample to local, use this: `hadoop fs -cat /path/to/hdfsfile | head -100 | hadoop fs -get path/to/local/sample1` – Adrian Aug 04 '15 at 18:23
  • If you want the results local you can just redirect it to a file rather than piping through HDFS: `hadoop fs -cat /user/billk/bigfile.txt | head -100 > local/sample.txt` – Scott Aug 14 '15 at 17:31
  • @Scott this also results in the `cat: unable to write to output stream` problem – conner.xyz Mar 23 '16 at 22:03
  • @conner.xyz The question was how to write the first 100 lines of a file in HDFS to a new file in HDFS. You are correct; I believe it still throws the `cat: ...` error because head stops the output stream before the file stream is finished, but it still writes the 100 lines to the new HDFS file. – Scott May 07 '16 at 17:29
  • You can always add `2>/dev/null` to suppress the error message: `hdfs dfs -cat /user/billk/bigfile.txt 2>/dev/null | ...other code after pipe...` – Roman Kazakov May 12 '20 at 06:49
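
Putting the answer and the comments together, a combined sketch (paths as in the question) that writes the first 100 lines to a new HDFS file while suppressing the harmless broken-pipe message:

hadoop fs -cat /user/billk/bigfile.txt 2>/dev/null | head -100 | hadoop fs -put - /home/billk/sample.txt

Or, per Scott's comment, redirect to keep the sample on the local filesystem instead:

hadoop fs -cat /user/billk/bigfile.txt 2>/dev/null | head -100 > sample.txt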