I have one big file in HDFS, bigfile.txt. I want to copy the first 100 lines of it into a new file on HDFS. I tried the following command:

hadoop fs -cat /user/billk/bigfile.txt |head -100 /home/billk/sample.txt

It gave me a "cat: unable to write output stream" error. I am on Hadoop 1.

Are there other ways to do this? (Note: copying the first 100 lines to a local file or to another file on HDFS is fine.)

Rolando

1 Answer

Like this -

hadoop fs -cat /user/billk/bigfile.txt | head -100 | hadoop fs -put - /home/billk/sample.txt

I believe the "cat: unable to write output stream" error appears simply because head closed the stream after it read its limit. See this answer about head for HDFS: https://stackoverflow.com/a/19779388/3438870
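
As an illustration, you can reproduce the same benign broken-pipe behavior locally (a minimal sketch; here seq stands in for any long stream such as hadoop fs -cat):

seq 1 1000000 | head -5

head exits after printing five lines and the pipe closes, cutting off the upstream writer mid-stream; that is exactly the situation that produces the "unable to write" complaint from cat, even though the lines you asked for were already written.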

Scott
  • to copy the sample to local, use this: `hadoop fs -cat /path/to/hdfsfile | head -100 | hadoop fs -get path/to/local/sample1` – Adrian Aug 04 '15 at 18:23
  • If you want the results local you can just redirect it to a file rather than piping through HDFS: `hadoop fs -cat /user/billk/bigfile.txt | head -100 > local/sample.txt` – Scott Aug 14 '15 at 17:31
  • @Scott this also results in the `cat: unable to write to output stream` problem – conner.xyz Mar 23 '16 at 22:03
  • @conner.xyz The question was how to write the first 100 lines of a file in HDFS to a new file in HDFS. You are correct; I believe it still throws the `cat: ...` error because head stops the output stream before the file stream is finished, but it still writes the 100 lines to the new HDFS file. – Scott May 07 '16 at 17:29
  • You can always add `2>/dev/null` to suppress the error message: `hdfs dfs -cat /user/billk/bigfile.txt 2>/dev/null | ...other code after pipe...` – Roman Kazakov May 12 '20 at 06:49
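
Putting the answer and the comments together, a combined sketch (paths as in the question) that writes the first 100 lines to a new HDFS file while suppressing the harmless broken-pipe message:

hadoop fs -cat /user/billk/bigfile.txt 2>/dev/null | head -100 | hadoop fs -put - /home/billk/sample.txt

Or, per Scott's comment, redirect to keep the sample on the local filesystem instead:

hadoop fs -cat /user/billk/bigfile.txt 2>/dev/null | head -100 > sample.txt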