How to use regex and awk to detect and extract a text table of variable length and width?

Question

In running some scripts that update WordPress, the output of the scripts is logged to a file. Here is a relevant portion of the log file:

Downloading update from https://downloads.wordpress.org/plugin/adrotate.5.8.15.zip...
Unpacking the update...
Installing the latest version...
Removing the old version of the plugin...
Plugin updated successfully.
Downloading update from https://downloads.wordpress.org/plugin/cookie-notice.2.0.0.zip...
Unpacking the update...
Installing the latest version...
Removing the old version of the plugin...
Plugin updated successfully.
Downloading update from https://downloads.wordpress.org/plugin/google-site-kit.1.25.0.zip...
Unpacking the update...
Installing the latest version...
Removing the old version of the plugin...
Plugin updated successfully.
Disabling Maintenance mode...
+-----------------+-------------+-------------+---------+
| name            | old_version | new_version | status  |
+-----------------+-------------+-------------+---------+
| adrotate        | 5.8.14      | 5.8.15      | Updated |
| cookie-notice   | 1.3.2       | 2.0.0       | Updated |
| google-site-kit | 1.24.0      | 1.25.0      | Updated |
+-----------------+-------------+-------------+---------+
[32;1mSuccess:[0m Updated 3 of 3 plugins.
[32;1mSuccess:[0m Theme already updated.

What I want to do next is open, read, and extract a portion of that log file to write it to a separate file as-is. The critical piece I need is this table from the output above:

+-----------------+-------------+-------------+---------+
| name            | old_version | new_version | status  |
+-----------------+-------------+-------------+---------+
| adrotate        | 5.8.14      | 5.8.15      | Updated |
| cookie-notice   | 1.3.2       | 2.0.0       | Updated |
| google-site-kit | 1.24.0      | 1.25.0      | Updated |
+-----------------+-------------+-------------+---------+

So, what I'm doing is using

awk '/Disabling Maintenance mode...$/,/[32;1mSuccess:$/' logfile.txt

to attempt to grab that table. Unfortunately, with this awk command, I seem to also get the Disabling Maintenance mode... and [32;1mSuccess: parts along with it. And those strings aren't reliably consistent enough to use them as proper start/end markers for awk. The most accurate thing I can think of is the correct regex to grab just that table and nothing more.

The problem with the text-formatted table is that the length and width of it can vary depending on what the script is updating. The "name" column could have an item in it that's 50 characters long, for example, which makes the table wider. It could also have, like, 20 "rows". So I never know how many hyphens or pipe characters to count in regex or in some kind of loop.

I've tried various tutorials and also regex101.com to devise a pattern that will help me find this variable length/width pattern. But I'm making no progress. I'm not sure I know how to frame the problem correctly within regex syntax. All tutorials I'm reading are using "abc" and "xxx" as examples and this is so much more complex.

Can anyone help me figure out how to do this?

Could you please add clearer samples, without the `+----------------` lines, for both the input and the expected output? Also, please add your efforts in the form of code to your question. Thank you. — RavinderSingh13, Feb 03 '21 at 05:01
I'm not sure what you're asking @RavinderSingh13. Can you clarify? — user3169905, Feb 03 '21 at 05:02
First, your samples are not clear, so please remove the `+----------------` lines if they are NOT really present in your actual file (if they are present, please confirm that). Then state your expected output more clearly in your question. Finally, and importantly, add your efforts to your question. Thank you. — RavinderSingh13, Feb 03 '21 at 05:03
Does this answer your question? [How can I align the columns of tables in Bash?](https://stackoverflow.com/questions/12768907/how-can-i-align-the-columns-of-tables-in-bash) — Akshay Hegde, Feb 03 '21 at 05:10
I've updated the question to clarify what I've tried and what I'm trying to do. @AkshayHegde, thanks for the link but that's for if I wanted to make a table from other data. The log file already has the formatted table in it. I just want to grab it with awk/regex and output it to another file. — user3169905, Feb 03 '21 at 05:32

score 3 · Answer 1 · answered Feb 03 '21 at 06:00

With the samples you have shown, please try the following. Written and tested in GNU awk.

awk '
/^\[32;1mSuccess:/      { found=""      }
/^Disabling Maintenance/{ found=1; next }
found
' Input_file

Explanation: a detailed explanation of the above.

awk '                                       ##Start the awk program.
/^\[32;1mSuccess:/      { found=""      }   ##If the line starts with [32;1mSuccess:, unset found.
/^Disabling Maintenance/{ found=1; next }   ##If the line starts with Disabling Maintenance, set found to 1 and skip to the next line, so the marker itself is not printed.
found                                       ##If found is set (not null), print the line.
' Input_file                                ##Name of the input file.
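To try the flag logic end to end, here is a minimal sketch that builds a small sample log and writes the extracted table to a separate file. The file names sample.log and table.txt and the sample rows are assumptions for illustration, not taken from the question. It also assumes the marker lines look exactly like the ones shown in the question.

```shell
# Hypothetical sample log; sample.log and its contents are made up for this sketch.
cat > sample.log <<'EOF'
Plugin updated successfully.
Disabling Maintenance mode...
+------+-------------+
| name | new_version |
+------+-------------+
| demo | 1.0.0       |
+------+-------------+
[32;1mSuccess:[0m Updated 1 of 1 plugins.
EOF

# Same flag idea as above, with the result redirected to a separate file.
awk '
/^\[32;1mSuccess:/       { found = 0 }        # end marker: clear the flag before the print rule runs
/^Disabling Maintenance/ { found = 1; next }  # start marker: set the flag, skip the marker line itself
found                                         # print every line seen while the flag is set
' sample.log > table.txt

cat table.txt
```

The order of the rules matters: the end-marker rule clears the flag before the bare `found` rule is evaluated, and `next` on the start marker keeps that line out of the output, so neither marker ends up in table.txt.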

score 3 · Answer 2 · Accepted Answer · Mischa · 2021-02-09T21:16:22.123

Maybe this is too simple?

awk 'bar == 3 {exit}; /--/ {bar++} bar ' logfile.txt

If you don't want the bars in the output:

awk 'bar == 3 {exit}; /--/ {bar++; next} bar' logfile.txt
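One caveat: /--/ also matches any earlier log line that happens to contain two consecutive hyphens, which would start the count too soon. A possible tightening, assuming the border lines always consist only of + and - characters, is to count only those lines. The sample log, the example.org URL, and the file names below are hypothetical.

```shell
# Hypothetical sample: the URL and the plugin name both contain "--" on purpose.
cat > sample.log <<'EOF'
Downloading update from https://example.org/plugin/my--plugin.1.0.0.zip...
Disabling Maintenance mode...
+------------+-------------+-------------+---------+
| name       | old_version | new_version | status  |
+------------+-------------+-------------+---------+
| my--plugin | 0.9.0       | 1.0.0       | Updated |
+------------+-------------+-------------+---------+
Success: Updated 1 of 1 plugins.
EOF

awk '
border == 3      { exit }      # the third border was already printed: stop reading
/^[+][-+]+[+]$/  { border++ }  # count only lines made up entirely of + and -
border                         # print everything from the first border onward
' sample.log > table.txt

cat table.txt
```

Because the pattern is anchored at both ends and allows any number of + and - characters in between, it does not depend on how wide the table is or how many rows it has.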

edited Feb 09 '21 at 21:16

answered Feb 03 '21 at 06:25


score 2 · Answer 3 · answered Feb 03 '21 at 08:39

I would use GNU AWK in the following way. Let the content of file.txt be

Downloading update from https://downloads.wordpress.org/plugin/adrotate.5.8.15.zip...
Unpacking the update...
Installing the latest version...
Removing the old version of the plugin...
Plugin updated successfully.
Downloading update from https://downloads.wordpress.org/plugin/cookie-notice.2.0.0.zip...
Unpacking the update...
Installing the latest version...
Removing the old version of the plugin...
Plugin updated successfully.
Downloading update from https://downloads.wordpress.org/plugin/google-site-kit.1.25.0.zip...
Unpacking the update...
Installing the latest version...
Removing the old version of the plugin...
Plugin updated successfully.
Disabling Maintenance mode...
+-----------------+-------------+-------------+---------+
| name            | old_version | new_version | status  |
+-----------------+-------------+-------------+---------+
| adrotate        | 5.8.14      | 5.8.15      | Updated |
| cookie-notice   | 1.3.2       | 2.0.0       | Updated |
| google-site-kit | 1.24.0      | 1.25.0      | Updated |
+-----------------+-------------+-------------+---------+
[32;1mSuccess:[0m Updated 3 of 3 plugins.
[32;1mSuccess:[0m Theme already updated.

then

awk '/^[+|].*[+|]$/' file.txt

output

+-----------------+-------------+-------------+---------+
| name            | old_version | new_version | status  |
+-----------------+-------------+-------------+---------+
| adrotate        | 5.8.14      | 5.8.15      | Updated |
| cookie-notice   | 1.3.2       | 2.0.0       | Updated |
| google-site-kit | 1.24.0      | 1.25.0      | Updated |
+-----------------+-------------+-------------+---------+

Explanation: print only lines which begin with + or | and end with + or |. Note that this might give false positives if you have any non-table lines starting with + or | and ending with + or |, so I suggest you run further tests with your input data if you wish to use my solution.
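Because the pattern is anchored with $, it prints nothing if the table lines end in something invisible, such as trailing spaces or a carriage return from Windows-style line endings. Whether that applies to a given log is an assumption that would need checking. A minimal sketch that strips such trailing characters before matching, using a hypothetical CRLF sample file:

```shell
# Build a hypothetical sample with CRLF line endings (sample_crlf.log is a made-up name).
printf '%s\r\n' \
  'Disabling Maintenance mode...' \
  '+------+---------+' \
  '| name | status  |' \
  '+------+---------+' \
  '| demo | Updated |' \
  '+------+---------+' \
  'Success: Updated 1 of 1 plugins.' > sample_crlf.log

awk '
{ sub(/[ \t\r]+$/, "") }   # strip trailing spaces, tabs, and carriage returns
/^[+|].*[+|]$/             # then print lines that start and end with + or |
' sample_crlf.log > table.txt

cat table.txt
```

The lines written to table.txt no longer carry the trailing carriage return, so the output is not byte-for-byte identical to the input in that case.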

Thanks. This looks great and makes sense according to the explanation you've given. However, what's weird is that when I run the awk command nothing happens. In contrast, for the answer from Mischa below, his example works. I wonder if it's a syntax thing? — user3169905, Feb 04 '21 at 15:37
