
I am running a program that prints a progress bar. I launched it like this:

python train.py |& tee train.log

The train.log looks like the following.

This is line 1

Training ...

This is line 2

...
[000] valid: 100%|█████████████████████████████████████████████████████████████▉| 2630/2631 [15:24<00:00,  2.98 track/s]
[000] valid: 100%|██████████████████████████████████████████████████████████████| 2631/2631 [15:25<00:00,  3.02 track/s]                                                                                                              
Epoch 000: train=0.11940351 valid=0.10640465 best=0.1064 duration=0.79 days

This is line 3 ...

[001] valid: 100%|█████████████████████████████████████████████████████████████▉| 2629/2631 [15:11<00:00,  2.90 
[001] valid: 100%|█████████████████████████████████████████████████████████████▉| 2630/2631 [15:11<00:00,  2.89 
[001] valid: 100%|██████████████████████████████████████████████████████████████| 2631/2631 [15:12<00:00,  2.88                                                                                                   
Epoch 001: train=0.10971066 valid=0.09931737 best=0.0993 duration=0.79 days

On the terminal, each update of the progress bar overwrites the previous one, but the log file keeps every update, so it contains a lot of repetition. When I run wc -l train.log, it reports only 3 lines. However, when I open this 5 MB text file in a text editor, it shows about 20000 lines.

My objective is to only get these details:

Epoch 000: train=0.11940351 valid=0.10640465 best=0.1064 duration=0.79 days    
Epoch 001: train=0.10971066 valid=0.09931737 best=0.0993 duration=0.79 days

My questions are:

  1. How do I, without stopping my current training run, extract the details I want from the supposedly "3" lines of train.log? Keep in mind that the training will continue for 10 more epochs, so I don't want to open the whole pile of progress-bar junk in an editor.

  2. In the future, how should I store my log file (instead of calling python train.py |& tee train.log) such that while I can see the progress bar in the terminal, I only keep the important information in a text file?

Edit 1 : Here's a link to the file train.log

leonardltk1

2 Answers

1

The progress bars are probably written to stderr, which you send to tee together with stdout by using |&.

To write only stdout to the file, use the normal pipe | instead.
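
A small sketch of the difference, under the assumption that the bar really goes to stderr. `fake_train` is a made-up stand-in for `python train.py`, and `2>&1 |` is the portable spelling of bash's `|&`:

```shell
# Hypothetical stand-in for train.py: the bar goes to stderr, the summary to stdout.
fake_train() {
  printf '[000] valid: 1/2\r[000] valid: 2/2\r' >&2
  printf 'Epoch 000: train=0.1 valid=0.1\n'
}

workdir=$(mktemp -d)

# Plain pipe: only stdout reaches tee; the bar still shows up on the terminal.
fake_train | tee "$workdir/stdout_only.log"

# 2>&1 | (what |& does in bash): stderr is merged in, so the bar lands in the file too.
fake_train 2>&1 | tee "$workdir/both.log"

# Count carriage returns in each file to see which one captured the bar.
tr -cd '\r' < "$workdir/stdout_only.log" | wc -c
tr -cd '\r' < "$workdir/both.log" | wc -c
```

If the program writes the bar to stdout instead, the plain pipe will not help and the log has to be filtered afterwards.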


The progress bar was generated by writing one line and then a carriage return character (\r) but no newline character (\n). To fix that and to be able to process the file further, you can use for example sed 's/\r/\n/g'.

The following works with the file linked in the question:

$ sed 's/\r/\n/g' train.log | grep Epoch
Epoch 000: train=0.11940351 valid=0.10640465 best=0.1064 duration=0.79 days
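
Since `\r` and `\n` inside a sed expression are not understood by every sed implementation, a possibly more portable variant uses `tr`. The sketch below only reads the log, so it should be safe to run while training is still writing to it. The sample content is made up to mimic the shape of the log in the question:

```shell
workdir=$(mktemp -d)
log="$workdir/train.log"

# Made-up sample: progress updates separated by \r, real lines ended by \n.
printf 'This is line 1\n[000] valid: 1/2\r[000] valid: 2/2\rEpoch 000: train=0.1 valid=0.1\n[001] valid: 1/2\r[001] valid: 2/2\rEpoch 001: train=0.09 valid=0.09\n' > "$log"

# wc -l counts only \n characters, which is why it reports so few lines.
wc -l < "$log"

# Turn every \r into \n on the fly (the file itself is untouched),
# then keep only the lines that start with "Epoch".
epochs=$(tr '\r' '\n' < "$log" | grep '^Epoch')
printf '%s\n' "$epochs"
```

Anchoring the pattern with `^Epoch` avoids matching a progress-bar line that happens to contain the word somewhere else.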
mkrieger1
  • I see; in that case, you have answered question 2. However, my current concern is question 1, as I need to access the log file and extract the summary at the end of each progress bar. I tried ```cat train.log > tmp```, but nothing was shown in the terminal, which means the progress bars are not stored as stderr in the log file, right? – leonardltk1 Mar 23 '20 at 09:10
  • Of course `cat train.log > tmp` does not produce output on the terminal, as all output is redirected to a file `tmp`. You have effectively created a copy of `train.log`. – mkrieger1 Mar 23 '20 at 10:06
  • To filter the contents of a file, you can use `grep`, for example `grep Epoch train.log`. – mkrieger1 Mar 23 '20 at 10:08
  • Yes, that is the problem right now. As you can see, the word "duration" does not exist in the progress bar in line 2 and line 3. Yet even when I run ```grep "duration" train.log > tmp ```, the progress bars still show up in the `tmp` file. I also tried ```tail -n1 train.log > tmp ```. The result is the same: the `tmp` file contains all the progress bars along with the final line, not just the final line. Let me see if I can give you a link to download the file, so you can help me from there. – leonardltk1 Mar 23 '20 at 13:28
  • I have edited my question and included the link. If you run ```wc -l train.log```, you should see only ```3 train.log```, which means the file is recognised as having only 3 lines. But when you double-click it and open it in your editor, you can see that there are 12734 lines. Grepping doesn't work, and ```sed "/]/d" train.log``` doesn't work either. – leonardltk1 Mar 23 '20 at 13:44
  • Hi, I have added a new file in **Edit 2**. It seems that this progress bar is written in a different way, not using `\r`, hence `sed` doesn't work for it. Do you know any workaround for this? – leonardltk1 Apr 01 '20 at 05:00
  • Seems like a new problem. Please ask an entirely new question. – mkrieger1 Apr 01 '20 at 09:26
  • Hi, I have asked a new question at [link](https://stackoverflow.com/questions/60981316/reading-a-training-progress-log-file-but-binaries-are-written-in-it), thanks! – leonardltk1 Apr 01 '20 at 22:28
  • But why did you write the progress bar to the file again? – mkrieger1 Apr 02 '20 at 10:27
0

Ok, I solved it already.

According to this question,

You make a progress bar by doing echo -ne "your text \r" > log.file.

Because some of the editors I used (Notepad, Sublime Text 3) recognise \r as a line break, you see the updates as separate lines, but in actual fact they are stored in a single line.

So to reverse engineer it, you can turn them into actual line breaks with sed -i "s,\r,\n,g" train.log, and then grep accordingly.
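
One caveat, as far as I understand it: GNU sed's -i writes a new file and renames it over the original, so a tee that is still running would keep writing to the old file. A sketch that leaves train.log alone and writes a cleaned copy instead is below. It keeps only the text after the last \r on each line, which roughly approximates what the terminal ends up showing, and it assumes the summary is the last \r-separated piece of its line. The sample content and the name clean.log are made up:

```shell
workdir=$(mktemp -d)
log="$workdir/train.log"

# Made-up sample: two bar updates and the summary share one \n-terminated line.
printf 'This is line 1\n[000] valid: 1/2\r[000] valid: 2/2\rEpoch 000: train=0.1 valid=0.1\n' > "$log"

# Split each line on \r and print only the last piece; the original log is not modified.
awk '{ n = split($0, parts, "\r"); print parts[n] }' "$log" > "$workdir/clean.log"

# The cleaned copy can now be grepped like any ordinary text file.
grep '^Epoch' "$workdir/clean.log"
```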

Anyhoo, thanks @mkrieger1 for helping me out anyway !

leonardltk1