Is there a linux command that can cut and pick columns that match string patterns?

Question

I need to analyze logs so that my end user can see them in a formatted way, as shown below. The logs come from various applications, so the key variables appear at different positions rather than in fixed columns.

"thread":"t1","key1":"value1","key2":"value2",......"key15":"value15"

I have a way to split and cut this to analyze only particular keys, using the following:

cat file.txt | grep 'value1' | cut -d',' -f2,7,8-

This is the command I have so far. The requirement is to grep all log lines in which 'key1' is 'value1'. Since value1 will most likely be unique, I grep for it directly; if required, I can grep for the key and value string together. The main problem is the part after cut. I want to pick only key2, key7 and key8 from these lines, but they might not appear in the same column numbers or in this order: key2 might be at column 3 or 4, or even after key7/key8. So I want to pick fields based on the key name and get exactly

"key2":"value2", "key7":"value7", "key8":"value8"

The end user is not particularly picky about the order in which they appear; they only need these keys from each line to be displayed. Can someone help me? I tried piping to awk / grep again, but they still match against the entire line, not against the individual columns.

My input is

{"@timestamp":"2021-08-05T06:38:48.084Z","level":"INFO","thread":"main","logger":"className1","message":"Message 1"}
{"@timestamp":"2021-08-05T06:38:48.092Z","level":"DEBUG","thread":"main","logger":"className2","message":"Message 2"}
{"@timestamp":"2021-08-05T06:38:48.092Z","level":"DEBUG","thread":"thead1","logger":"className2","message":"Message 2"}

I basically want to find only the "thread":"main" lines and, for each matching line, print only the keys and values of "logger" and "message", since the other keys and values are irrelevant to me. There are more than 15 or 16 keys in my file and the key positions can be swapped: in some log files "message" could appear first and "logger" second. Of course, these keys are just an example; the real keys I am trying to find are not only "logger" and "message".

There are log analysis tools, but this is a pretty old system and the logs are not real-time ones: I am analyzing and displaying files that are years old.

What you are describing sounds like Awk, but your exposition isn't really very easy to follow. Could you please [edit] with a few lines of examples, and expected output? Take care to include examples of corner cases, like where output should _not_ be printed for a particular input. — tripleee, Jul 28 '21 at 04:58
@demi365: You can ask `awk` to match a certain column only, but you need to tell it the character which separates the column. Alternatively, you can use `cut` to first select the respective column, and then match the result with `grep`, but in this case, the information of the rest of the line is gone. In your case however, both approaches would fail, if either one of the keys or one of the values would also contain the field separator. — user1934428, Jul 28 '21 at 05:13

Renaud Pacalet · Accepted Answer · 2021-07-28T08:47:21.620

Not sure I really understand your specification but the following awk script could be a starting point:

$ cat foo.awk
BEGIN {
  k["\"key1\""] = 1; k["\"key7\""] = 1; k["\"key8\""] = 1;
}
/"key1":"value1"/ {
  s = "";
  for(i = 1; i <= NF; i+=2)
    if($i in k)
      s = s (s ? "," : "") $i ":" $(i+1);
  print s;
}
$ awk -F',|:' -f foo.awk foo.txt
"key1":"value1","key7":"value7","key8":"value8"

Explanation:

awk is called with the -F',|:' option so that the field separator in each record is either a comma or a colon.
In the BEGIN section we declare an associative array (k) of the selected keys, including the surrounding double quotes.
The rest of the awk script applies to each record containing "key1":"value1".
- Variable s is used to prepare the output string; it is initialized to "".
- For each odd field (the keys) we check if it is in k. If it is, we concatenate to s:
  - a comma if s is not empty,
  - the key field,
  - a colon,
  - the following even field (the value).
- We print s.
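
If the selected keys and the filter need to change from one run to the next, the same idea can take them from the command line. The following is only a sketch under a few assumptions: the function name pick_keys, the awk variables want and keys, and the file sample.log are made up for illustration, and the sample lines merely imitate the "key":"value" format from the question. Here the record is split on commas only, and each field is then cut at its first colon, so a colon inside a value does not shift the key positions.

```shell
# Hypothetical sample data imitating the format shown in the question.
cat > sample.log <<'EOF'
"thread":"t1","key1":"value1","key2":"value2","key7":"value7","key8":"value8"
"thread":"t2","key8":"other8","key1":"value1","key7":"other7","key2":"other2"
"thread":"t3","key1":"nomatch","key2":"x","key7":"y","key8":"z"
EOF

# pick_keys FILE PAIR KEYS   (hypothetical helper, not a standard command)
#   PAIR: the "key":"value" text a line must contain to be selected
#   KEYS: comma-separated list of key names to print, without quotes
pick_keys() {
  awk -F',' -v want="$2" -v keys="$3" '
  BEGIN {
    n = split(keys, a, ",")
    for (j = 1; j <= n; j++)
      k["\"" a[j] "\""] = 1            # store each key with its double quotes
  }
  index($0, want) {                    # plain substring test, no regex
    s = ""
    for (i = 1; i <= NF; i++) {
      p = index($i, ":")               # first colon ends the key
      if (p && (substr($i, 1, p - 1) in k))
        s = s (s ? "," : "") $i        # keep the whole "key":"value" field
    }
    print s
  }' "$1"
}

pick_keys sample.log '"key1":"value1"' 'key2,key7,key8'
```

With the sample above, the first two lines are selected and their key2, key7 and key8 pairs are printed in the order in which they occur on each line; the third line is skipped because its key1 has a different value. A comma inside a value would still break the splitting, as pointed out in the comments on the question.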

Thanks a lot mate, This more or less sums up my requirement by changing the inputs a bit and adding more params from command line, I was able to achieve what I want from your solution — demi365, Jul 28 '21 at 09:23
