0

Context

This question is not related to any particular programming language, but how stdout works when we write to a terminal vs when we write to a file. Anyways, to demonstrate, I'll have to pick a language, and I choose Python for the problem part.

I've stolen code below from this answer:

Save this code as progress.py:

def progressBar(iterable, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):
    total = len(iterable)
    # Progress Bar Printing Function
    def printProgressBar (iteration):
        percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
        filledLength = int(length * iteration // total)
        bar = fill * filledLength + '-' * (length - filledLength)
        print(f'\r{prefix} |{bar}| {percent}% {suffix}', end = printEnd)
    # Initial Call
    printProgressBar(0)
    # Update Progress Bar
    for i, item in enumerate(iterable):
        yield item
        printProgressBar(i + 1)
    # Print New Line on Complete
    print()


import time

# A List of Items
items = list(range(0, 57))

# A Nicer, Single-Call Usage
for item in progressBar(items, prefix = 'Progress:', suffix = 'Complete', length = 50):
    # Do stuff...
    time.sleep(0.1)

When you run this program using python3 progress.py, you'd see a progress bar going from left to right.

I have attached a screenshot:

Progress bar going from left to right on a single line

If you are following this on your own, you'd notice that the progress updates on the same line, i.e. it does not writes a new line for each step.

Now try to route the output of the script to file. Do a python3 progress.py > stdout.log.

If you cat the stdout.log at this point, terminal will interpret it correctly and show the last output, which is 100.0% complete output.

Now if open the output file, you'll find something else. Here I've pasted the content:

^MProgress: |--------------------------------------------------| 0.0% Complete^MProgress: |--------------------------------------------------| 1.8% Complete^MProgress: |█-------------------------------------------------| 3.5% Complete^MProgress: |██------------------------------------------------| 5.3% Complete^MProgress: |███-----------------------------------------------| 7.0% Complete^MProgress: |████----------------------------------------------| 8.8% Complete^MProgress: |█████---------------------------------------------| 10.5% Complete^MProgress: |██████--------------------------------------------| 12.3% Complete^MProgress: |███████-------------------------------------------| 14.0% Complete^MProgress: |███████-------------------------------------------| 15.8% Complete^MProgress: |████████------------------------------------------| 17.5% Complete^MProgress: |█████████-----------------------------------------| 19.3% Complete^MProgress: |██████████----------------------------------------| 21.1% Complete^MProgress: |███████████---------------------------------------| 22.8% Complete^MProgress: |████████████--------------------------------------| 24.6% Complete^MProgress: |█████████████-------------------------------------| 26.3% Complete^MProgress: |██████████████------------------------------------| 28.1% Complete^MProgress: |██████████████------------------------------------| 29.8% Complete^MProgress: |███████████████-----------------------------------| 31.6% Complete^MProgress: |████████████████----------------------------------| 33.3% Complete^MP

The actual problem

I am using Docker SDK and Go programming language to get output from a container and post it to Gist. The output of the docker logs involves one such progress bar.

Here is link to one such log I post to Gist: https://gist.github.com/avimanyu786/040243ee1f9a260677080a69ffb88d59

I understand that on terminal, the terminal interprets the control character and re-writes the line. When we write it to a file, it shows the whole thing, as we seen in the gist.

My solution in theory

If we take example of our gist output, we see the line 10 actually takes more than one visual line (logically still 1 line).

We also know that each of those visual lines ends with a control character, which on Gist is rendered as square blocks.

Before sending the output to the gist, I want to:

  1. Receive logs as bytes.Buffer (I can convert it to bytes or string if needed).
  2. Iterate over all the lines.
  3. If there is any control character in the line, delete from start of the line the last control character on that logical line.

What this is going to do is to only show the last update of that line.

I'm not sure how to do this. Will regex work here? I have not dealt with control characters before. How do I delete from start of the line to last control character?

Santosh Kumar
  • 26,475
  • 20
  • 67
  • 118
  • 1
    What is the problem? Just implement it like you described it. You seem to have a good understanding of everything required to solve this problem. – Thomas Weller Nov 09 '22 at 13:41
  • 1
    I don't really see a question here, other than "Will regex work here?" Have you tried it? You could answer this yourself faster by trying than by writing your question. – Jonathan Hall Nov 09 '22 at 13:42
  • I have not done this before. What does control characters looks like? What will I look for in the bytes buffer? On GitHub it shows as square blocks. How do delete from start of a line to last occurrence of an unknown character? – Santosh Kumar Nov 09 '22 at 13:47
  • Control characters basically work like any other characters: See here https://stackoverflow.com/questions/4324790/removing-control-characters-from-a-string-in-python You probably want `\r` – Thomas Weller Nov 09 '22 at 15:59

1 Answers1

0

I found an answer to my question.

So docker spits the log output in io.ReadCloser, that output can be written to a bytes.Buffer:

var stdout bytes.Buffer
var stderr bytes.Buffer

containerLog := GetLogs(containerID)
stdcopy.StdCopy(&stdout, &stderr, containerLog)

Here is code for GetLogs anyway:

// GetLogs return logs from the container io.ReadCloser. It's the caller duty
// duty to do a stdcopy.StdCopy. Any other method might render unknown
// unicode character as log output has both stdout and stderr. That starting
// has info if that line is stderr or stdout.
func GetLogs(contName string) (logOutput io.ReadCloser) {
    options := types.ContainerLogsOptions{ShowStdout: true, ShowStderr: true}

    out, err := dc.ContainerLogs(ctx, contName, options)
    if err != nil {
        panic(err)
    }

    return out
}

Before sending it to GitHub's API which accepts string, we can get rid of all the content before last \r in a line:

// cleanFlushInfo takes in bytes.Buffer from docker logs output and for each line
// if it has a \r in the lines, takes the last one and compose another string
// out of that.
func cleanFlushInfo(bytesBuffer *bytes.Buffer) string {
    scanner := bufio.NewScanner(bytesBuffer)
    finalString := ""

    for scanner.Scan() {
        line := scanner.Text()
        chunks := strings.Split(line, "\r")
        lastChunk := chunks[len(chunks)-1] // fetch the last update of the line
        finalString += lastChunk + "\n"
    }

    return finalString
}

Why is it not efficient? As the time passes, the logs will grow longer. The work that program has to do to remove unwanted information will grow as well.

What can be done to overcome problem is the only fetch N minutes of output from the container. Have them listed on Gist, either by listing many time-chunk based files, or overwriting the files (gist will still retain older information).

Santosh Kumar
  • 26,475
  • 20
  • 67
  • 118