I want to download a large file in an automated script with 'wget', but the progress output generated by 'wget' is too verbose. For example:

wget --progress=dot:mega 'http://mysite/my_large_file'

Because my file is over 1.5 GB and the download speed is very fast (>9MB/s), the output is still too verbose even with the progress style set to 'mega':

     0K ........ ........ ........ ........ ........ ........  0% 2.03M 13m16s
  3072K ........ ........ ........ ........ ........ ........  0% 3.85M 10m7s
  6144K ........ ........ ........ ........ ........ ........  0% 3.85M 9m3s
  9216K ........ ........ ........ ........ ........ ........  0% 3.89M 8m30s

But I don't want to turn off the progress output completely, because it lets me know whether there is any issue. For now I use 'sed' to remove the dots:

wget --progress=dot:mega 'http://mysite/my_large_file' 2>&1 | sed -r 's/^ *([0-9]+K)[. ]*([0-9]+%) +([0-9.]+[A-Z]) +(.*)$/<\1,\2,\3\/s,remain:\4>, /g'

The output looks much better:

<0K,0%,2.45M/s,remain:11m0s>,
<3072K,0%,9.13M/s,remain:6m58s>,
<6144K,0%,9.35M/s,remain:5m35s>,
<9216K,0%,9.37M/s,remain:4m54s>,
<12288K,0%,9.52M/s,remain:4m28s>,
<15360K,1%,9.42M/s,remain:4m11s>,

Now I also want to remove the newline character at the end of each line, so that my automation framework won't discard anything. I tried 'tr' and 'awk', but neither of them outputs instantly. That is, when I use 'sed', it prints the lines while the download is ongoing, but when I use 'tr' or 'awk', I wait for a long time and nothing is printed. I guess they print the whole text only when the download has finished, which is useless.

So I wonder if there is a way to remove the newline characters while still outputting the stream instantly.

By the way, is there a way to make the 'wget' progress output less verbose, but not 'no verbose'? For example, print one line every 10MB or 20MB or, my favorite, print the progress every 10 seconds or so.

As suggested in the comments, here is my desired output:

<0K,0%,2.45M/s,remain:11m0s>, <3072K,0%,9.13M/s,remain:6m58s>, <6144K,0%,9.35M/s,remain:5m35s>, <9216K,0%,9.37M/s,remain:4m54s>, <12288K,0%,9.52M/s,remain:4m28s>, <15360K,1%,9.42M/s,remain:4m11s>,

All of the output is on one line.

Vespene Gas
  • Please add your desired output for that sample input to your question. – Cyrus Oct 19 '18 at 03:15
  • @Cyrus I added it at the end of my question. – Vespene Gas Oct 19 '18 at 03:39
  • You can modify what wget shows while downloading with options **`-nv`** (or equivalent `--no-verbose`). **`--progress`**, **`--show-progress`**. But not directly what you want from options. – Nic3500 Oct 19 '18 at 03:41
  • For modifying output "on the go" you could try `stdbuf`. Look at this link: https://stackoverflow.com/questions/7161821/how-to-grep-a-continuous-stream, the response with 51 upvotes from XzKto. It *might* work for your wget. – Nic3500 Oct 19 '18 at 03:49
  • @Nic3500 As I said in my question, turning off verbose output is not what I want. Without it, if the download stalls midway, the screen shows nothing, just as if it were still downloading. I might wait 1 or 2 hours before noticing that the job is still running and realizing that something has gone wrong. With verbose output, if nothing is printed for just 10 seconds, I know that something is wrong. But the default verbose behavior prints too much; even with the style set to 'mega' it still prints too much. So I want less verbose, but not 'no verbose'. – Vespene Gas Oct 19 '18 at 03:51
  • And for sed, https://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed shows how to remove \n. Since your `sed` already produces what you want, except that it is not on one line, this might do it. – Nic3500 Oct 19 '18 at 03:52
  • @Nic3500 The suggestion of adding 'stdbuf' doesn't work, because that post is about grep, which decides whether to print a line or not rather than modifying the input line. Regarding the sed suggestion, I tried the command but it doesn't output the result "on the go". I also tried the perl and bash approaches suggested in the post, but none of them works. All of them seem to output the results only when the end of the stream is reached. – Vespene Gas Oct 19 '18 at 04:06
  • Ok sorry then. That is why I did not put them as an answer. I will keep an eye out for an answer on this one! – Nic3500 Oct 19 '18 at 04:23
  • One thing you can attempt is using `wget --progress=bar:force:noscroll`. This will force Wget to use its interactive display progress bar but in the output file. Generally, this is not a great idea, but it seems like it might just be the perfect fit for you – darnir Oct 20 '18 at 09:51

2 Answers

In addition to your sed command, you just need to pipe the output through tr, which gives:

wget --progress=dot:mega 'http://mysite/my_large_file' 2>&1 | sed -r 's/^ *([0-9]+K)[. ]*([0-9]+%) +([0-9.]+[A-Z]) +(.*)$/<\1,\2,\3\/s,remain:\4>, /g' |tr -d '\n'
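
If the joined output only shows up once the download has finished, output buffering in the pipeline may be the cause: many tools buffer their output in blocks when stdout is not a terminal. The following is a minimal sketch of one way to work around that, assuming GNU sed (for `-u`) and GNU coreutils (for `stdbuf`). The function name `join_progress_unbuffered` is hypothetical and only used for illustration.

```shell
# Hypothetical helper: reformat wget dot-style progress lines and join them
# on one line, asking both sed and tr not to hold output back.
# Assumes GNU sed (-u) and GNU coreutils (stdbuf).
join_progress_unbuffered() {
  # -u: GNU sed flushes its output after every line.
  sed -u -r 's/^ *([0-9]+K)[. ]*([0-9]+%) +([0-9.]+[A-Z]) +(.*)$/<\1,\2,\3\/s,remain:\4>, /' \
    | stdbuf -o0 tr -d '\n'   # -o0: run tr with an unbuffered stdout
}

# Intended usage (not executed here):
#   wget --progress=dot:mega 'http://mysite/my_large_file' 2>&1 | join_progress_unbuffered
```

Whether `stdbuf` has any effect depends on how the target program writes its output, so this is worth checking on the machine where the script runs.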
Bsquare ℬℬ
  • I guess it is OK for you? If so, on Stack Overflow you can give an [up-vote](https://stackoverflow.com/help/privileges/vote-up) to helpful answers to thank their authors, and select one of the answers as the [correct answer](https://stackoverflow.com/help/someone-answers). – Bsquare ℬℬ Nov 06 '18 at 13:33
I believe you have to tackle multiple problems:

  1. The buffering of the pipe. See "Force line-buffering of stdout when piping to tee".
  2. sed always prints the pattern space followed by a <newline> character (POSIX sed).

The trick here is to unbuffer the output of wget and use awk to process each line, using printf to write to /dev/stdout with a potential flush.

This would be something like:

$ stdbuf -oL -eL wget --progress=dot:mega 'http://mysite/my_large_file' 2>&1 \
  | awk '{printf c"<%s,%s,%s/s,remain:%s>",$1,$(NF-2),$(NF-1),$NF; c=", "}END{print ""}'

If the output of awk is too slow to appear, you might consider adding an extra flush. Note that this is a GNU awk feature:

$ stdbuf -oL -eL wget --progress=dot:mega 'http://mysite/my_large_file' 2>&1 \
  | awk '{printf c"<%s,%s,%s/s,remain:%s>",$1,$(NF-2),$(NF-1),$NF; c=", "; fflush()}END{print ""}'
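
A slightly more defensive variant of the awk step can be sketched as follows. It only reformats lines that look like progress lines, so the connection messages that wget also prints are not mangled. The function name `compact_progress` is hypothetical, the sample input is copied from the question instead of coming from a real download, and the sketch assumes an awk that accepts `fflush()` without arguments (gawk does).

```shell
# Hypothetical helper: collapse wget dot-style progress lines into one line.
# Lines that do not look like progress lines are silently skipped.
compact_progress() {
  awk '
    # A progress line has at least 4 fields, starts with a size such as
    # "3072K", and has a percentage as its third-to-last field.
    NF >= 4 && $1 ~ /^[0-9]+K$/ && $(NF-2) ~ /^[0-9]+%$/ {
      printf "%s<%s,%s,%s/s,remain:%s>", sep, $1, $(NF-2), $(NF-1), $NF
      sep = ", "
      fflush()   # push each record out immediately instead of at end of input
    }
    END { print "" }   # terminate the single output line
  '
}

# Simulated input taken from the question, in place of a real download.
printf '%s\n' \
  '     0K ........ ........ ........ ........ ........ ........  0% 2.03M 13m16s' \
  'Connecting to mysite... connected.' \
  '  3072K ........ ........ ........ ........ ........ ........  0% 3.85M 10m7s' \
  | compact_progress
```

With a real download, the function would take the place of the awk command in the pipeline, for example `wget ... 2>&1 | compact_progress`.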

I am not sure whether you need to line-buffer both /dev/stderr and /dev/stdout because of the redirection, but it does no harm to do both.
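
The question also asks for a progress line only every 10 seconds or so. I do not know of a wget option for that, but a rough sketch of a filter is shown here, assuming a `date` that supports `+%s` (GNU and BSD `date` do). The function name `throttle_lines` and its argument are hypothetical.

```shell
# Hypothetical helper: pass through at most one input line per interval.
# Usage: some_command | throttle_lines SECONDS   (default: 10)
throttle_lines() {
  interval="${1:-10}"
  last=0                              # epoch time of the last printed line
  while IFS= read -r line; do
    now=$(date +%s)                   # current time in seconds since the epoch
    if [ $((now - last)) -ge "$interval" ]; then
      printf '%s\n' "$line"           # the shell writes this immediately
      last=$now
    fi
  done
}

# Intended usage (not executed here):
#   wget --progress=dot:mega 'http://mysite/my_large_file' 2>&1 | throttle_lines 10
```

Note that this drops every line that arrives inside the interval, so the final progress line of the download may be one of the lines that is not shown.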

kvantour