I have a file in Linux like the one below.

20230512 12:42:06 INFO: job ID 
20230512 12:42:06 INFO: workflowID
20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}
XXXX
YYY
ZZZ

From this file I want to find the line that contains the string jobID.

I ran the following:

grep 'jobID' file_name

I get the following result:

20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}

From this line I want to extract the values of jobID and workflowID and store them in variables:

jobID='123_abc'
workflowID='bbc_999'

How can I do that?

nmr
  • @GillesQuénot That gets you the whole line, not just specific fields from it. – Barmar May 12 '23 at 20:02
  • You can use `grep -o pattern` to output just the part of the line that matches the pattern, and assign that to a variable. – Barmar May 12 '23 at 20:04
  • @Barmar It is only giving me values like `jobID` or `workflowId`. I want the value after `:` from that line as variable – nmr May 12 '23 at 20:06
  • Then you didn't use the correct pattern, it has to match the value after `:`. E.g. `'jobID': '[^']*'` – Barmar May 12 '23 at 20:07
  • What is generating this weird output? Can you change the behavior to have a proper JSON? – Gilles Quénot May 12 '23 at 20:08
  • If you're using GNU grep you can use the `-P` option and then use a lookbehind, so that `jobID` won't be included in the result. – Barmar May 12 '23 at 20:08
  • @Barmar I did like `abc = grep -o 'jobID' file_name` when I `echo $abc` the output is `jobID` – nmr May 12 '23 at 20:08
  • Since your pattern is only `jobID`, of course that's all that it returns. The pattern has to include the part of the line you want. `grep` can't guess that you want something after it. – Barmar May 12 '23 at 20:09
  • `sed -En '/jobID/ s/.{24}//p' file | tr "'" "\"" | jq -r '.jobsData | .[].jobID'`? – Cyrus May 12 '23 at 20:09
  • It sounds like you may need to read a tutorial on regular expressions if this is confusing you. – Barmar May 12 '23 at 20:10
  • Not very reliable Cyrus IMHO – Gilles Quénot May 12 '23 at 20:40
  • @GillesQuénot: This might be more reliable with GNU awk: `awk -v FPAT="'[^']+'" '/jobID/{$1=$1; print $3}' file` – Cyrus May 13 '23 at 00:15
  • I agree, will be far better and smart. – Gilles Quénot May 13 '23 at 00:30
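
Barmar's suggestion of widening the `grep -o` pattern so that it covers the value can be sketched as follows. This is a minimal example, assuming the log format shown in the question; the temporary file is a hypothetical stand-in for file_name, and `cut` is used here to drop the key and quotes instead of the `-P` lookbehind mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical sample file reproducing the log from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
20230512 12:42:06 INFO: job ID
20230512 12:42:06 INFO: workflowID
20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}
XXXX
YYY
ZZZ
EOF

# grep -o prints only the matched text, e.g. 'jobID': '123_abc'.
# cut then splits that text on the single quote; field 4 is the value.
jobID=$(grep -o "'jobID': '[^']*'" "$log" | cut -d"'" -f4)
workflowID=$(grep -o "'workflowID': '[^']*'" "$log" | cut -d"'" -f4)

echo "jobID=$jobID"
echo "workflowID=$workflowID"
rm -f "$log"
```

If the log contains several matching lines, each variable ends up holding one value per line.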

5 Answers


With GNU grep in PCRE mode:

# grep -o prints each match on its own line, so read one line per variable
{ read -r jobid; read -r workflowid; } < <(grep -oP '(?:jobID|workflowID)\047: \047\K[^\047]+' file)
echo "$jobid"
echo "$workflowid"

The regular expression matches as follows:

Node          Explanation
(?:           group, but do not capture:
  jobID         'jobID'
  |             OR
  workflowID    'workflowID'
)             end of grouping
\047          single quote in octal
:             a colon followed by a space, ': '
\047          single quote in octal
\K            resets the start of the match (what is Kept); a shorter alternative to a look-behind assertion (see "look arounds" and "Support of \K in regex")
[^\047]+      any character except ' (1 or more times, matching as many as possible)

>(command ...) or <(...) is replaced by a temporary filename. Writing or reading that file causes bytes to get piped to the command inside. Often used in combination with file redirection: cmd1 2> >(cmd2).

See http://mywiki.wooledge.org/ProcessSubstitution and
http://mywiki.wooledge.org/BashFAQ/024
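
The reason for reading from `< <(...)` rather than from a pipe is the subshell problem described in BashFAQ/024. The following is a small sketch of that difference, assuming bash with its default settings (the `lastpipe` option off); the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Pipeline: the while loop runs in a subshell, so its changes to
# count are lost when the pipeline ends.
count=0
printf '%s\n' one two three | while read -r line; do count=$((count + 1)); done
pipe_count=$count

# Process substitution: the loop runs in the current shell,
# so the final value of count is still visible afterwards.
count=0
while read -r line; do count=$((count + 1)); done < <(printf '%s\n' one two three)
procsub_count=$count

echo "after pipeline: $pipe_count"
echo "after process substitution: $procsub_count"
```

Under those assumptions the first count stays at 0 and the second reaches 3, which is why `grep ... | read jobid` would leave the variable empty in the calling shell.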

Gilles Quénot

If awk is an option, you may be able to use command substitution to extract the values you need:

jobID=$(awk '/jobID/{gsub(/,|\047/,""); print $6}' src.file)
workflowID=$(awk '/jobID/{gsub(/,|\047/,""); print $12}' src.file)

Output:

echo "$jobID"
123_abc
echo "$workflowID"
bbc_999
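
The two awk calls above scan the file twice. A possible single-pass variant is sketched below, assuming the same field positions (6 and 12) and values that contain no spaces; the temporary file is a hypothetical stand-in for src.file.

```shell
#!/usr/bin/env bash
# Hypothetical sample file reproducing the log from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
20230512 12:42:06 INFO: job ID
20230512 12:42:06 INFO: workflowID
20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}
XXXX
EOF

# One awk pass: strip commas and single quotes, print both fields
# separated by a space, and stop at the first matching line.
pair=$(awk '/jobID/ { gsub(/,|\047/, ""); print $6, $12; exit }' "$log")

# Split the pair on the first space with parameter expansion.
jobID=${pair%% *}
workflowID=${pair#* }

echo "$jobID"
echo "$workflowID"
rm -f "$log"
```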
j_b

$ jobID=$(awk -F'INFO:' '/jobsData/{gsub(/\047/,"\""); print $NF}' file|jq -r '.jobsData[0].jobID')
$ workflowID=$(awk -F'INFO:' '/jobsData/{gsub(/\047/,"\""); print $NF}' file|jq -r '.workflowID')

$ echo "$jobID" 
123_abc

$ echo "$workflowID"
bbc_999

Using a bash associative array:

$ declare -A array="($(awk -F'INFO:' '/jobsData/{gsub(/\047/,"\""); print $NF}' file|jq -r '. | "[workflowID]=\(.workflowID) [jobID]=\(.jobsData[0].jobID)"'))"
$ declare -p array 
declare -A array=([jobID]="123_abc" [workflowID]="bbc_999" )

$ echo "${array[workflowID]}"
bbc_999

$ echo "${array[jobID]}"
123_abc
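
Before piping to jq it can help to look at the converted payload on its own. The sketch below shows only the quote-conversion step, using sed and tr in place of the awk call above; it assumes no value contains an apostrophe and that the payload holds nothing but strings and empty containers, since other Python-style literals such as True or None would not be valid JSON. The temporary file is a hypothetical stand-in for the log.

```shell
#!/usr/bin/env bash
# Hypothetical sample file reproducing the log from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
20230512 12:42:06 INFO: job ID
20230512 12:42:06 INFO: workflowID
20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}
XXXX
EOF

# Keep only what follows "INFO: " on lines whose payload starts with "{",
# then turn every single quote into a double quote.
payload=$(sed -n 's/^.*INFO: \({.*\)$/\1/p' "$log" | tr "'" '"')

echo "$payload"
rm -f "$log"
```

The resulting string is what jq receives in the commands above.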
ufopilot

Using GNU awk and bash associative arrays:

$ declare -A arr="( $(awk -v RS="'[^']+'[ :]+'[^']*'" -F"'" '{$0=RT} NF==5{print "["$2"]=\047"$4"\047"}' file) )"

you get an array of all the 'name': 'value' pairs from the input:

$ declare -p arr
declare -A arr=([workflowID]="bbc_999" [jobStatus]="RUNNING" [jobID]="123_abc" [workflowState]="RUNNING" )

You can then set scalar variables from the array if you like:

$ jobID="${arr[jobID]}"
$ echo "$jobID"
123_abc

but you don't have to.
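
Where GNU awk is not available, a similar array could be filled with bash's own regex matching. This is only a sketch, not the method above: it assumes bash 4 or later, keys and values without embedded single quotes, and a hypothetical temporary file in place of the real log.

```shell
#!/usr/bin/env bash
# Hypothetical sample file reproducing the log from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
20230512 12:42:06 INFO: job ID
20230512 12:42:06 INFO: workflowID
20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}
XXXX
EOF

declare -A arr=()
# Matches one 'name': 'value' pair; group 1 is the name, group 2 the value.
re="'([^']+)': '([^']*)'"

while IFS= read -r line; do
  # Consume the line pair by pair: store the match, then drop
  # everything up to and including it before matching again.
  while [[ $line =~ $re ]]; do
    arr[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
    line=${line#*"${BASH_REMATCH[0]}"}
  done
done < "$log"

declare -p arr
rm -f "$log"
```

Keys whose value is not a quoted string, such as jobsData and timeoutData, are skipped because they do not match the pattern.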

Ed Morton

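Using grep and cut you can get the values as shown below. Including the closing quote and colon in the pattern keeps the bare word workflowID on the second log line from matching, tr strips the spaces and quotes around the value, and sort -u removes duplicates from the output.

job_id=$(grep -o "jobID':.*" file_name | cut -f2- -d: | cut -d ',' -f 1 | tr -d " '" | sort -u)
echo "$job_id"

workflow_id=$(grep -o "workflowID':.*" file_name | cut -f2- -d: | cut -d ',' -f 1 | tr -d " '" | sort -u)
echo "$workflow_id"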
Gilles Quénot
User12345