Grep a string from file and create variables from output in bash

Question

I have a file in Linux like the one below.

20230512 12:42:06 INFO: job ID 
20230512 12:42:06 INFO: workflowID
20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}
XXXX
YYY
ZZZ

From this file I want to find the line that contains the string jobID.

I ran the following:

grep 'jobID' file_name

I get the following result:

20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}

From this line I want to extract the values of jobID and workflowID and store them in variables:

jobID='123_abc'
workflowID='bbc_999'

How can I do that?

@GillesQuénot That gets you the whole line, not just specific fields from it. — Barmar, May 12 '23 at 20:02
You can use `grep -o pattern` to output just the part of the line that matches the pattern, and assign that to a variable. — Barmar, May 12 '23 at 20:04
@Barmar It is only giving me values like `jobID` or `workflowId`. I want the value after `:` from that line as variable — nmr, May 12 '23 at 20:06
Then you didn't use the correct pattern, it has to match the value after `:`. E.g. `'jobID': '[^']*'` — Barmar, May 12 '23 at 20:07
What is generating this weird output? Can you change the behavior to have a proper JSON? — Gilles Quénot, May 12 '23 at 20:08
If you're using GNU grep you can use the `-P` option and then use a lookbehind, so that `jobID` won't be included in the result. — Barmar, May 12 '23 at 20:08
@Barmar I did like `abc = grep -o 'jobID' file_name` when I `echo $abc` the output is `jobID` — nmr, May 12 '23 at 20:08
Since your pattern is only `jobID`, of course that's all that it returns. The pattern has to include the part of the line you want. `grep` can't guess that you want something after it. — Barmar, May 12 '23 at 20:09
`sed -En '/jobID/ s/.{24}//p' file | tr "'" "\"" | jq -r '.jobsData | .[].jobID'`? — Cyrus, May 12 '23 at 20:09
It sounds like you may need to read a tutorial on regular expressions if this is confusing you. — Barmar, May 12 '23 at 20:10
@GillesQuénot: This might be more reliable with GNU awk: `awk -v FPAT="'[^']+'" '/jobID/{$1=$1; print $3}' file` — Cyrus, May 13 '23 at 00:15
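
Barmar's suggestions in the comments (a pattern that covers the value, `-o`, and a `-P` lookbehind) can be put together roughly as follows. This is only a sketch: it assumes GNU grep, and the `sample` temporary file is an illustrative stand-in for the real log.

```bash
#!/usr/bin/env bash
# Recreate the sample log from the question in a temporary file (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
20230512 12:42:06 INFO: job ID
20230512 12:42:06 INFO: workflowID
20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}
EOF

# -o prints only the matched text. The lookbehind (?<=...) requires the quoted
# key and the opening quote to precede the match without being part of it.
jobID=$(grep -oP "(?<='jobID': ')[^']*" "$sample")
workflowID=$(grep -oP "(?<='workflowID': ')[^']*" "$sample")

echo "jobID=$jobID"
echo "workflowID=$workflowID"
rm -f "$sample"
```

Because the pattern includes the quoted key, the plain `INFO: workflowID` line does not match.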

Gilles Quénot · Answer 1 · 2023-05-12T22:04:21.273

With GNU grep in PCRE mode:

# grep -o prints one match per line, so read each line into its own variable
{ read -r jobid; read -r workflowid; } < <(grep -oP '(?:jobID|workflowID)\047: \047\K[^\047]+' file)
echo "$jobid"
echo "$workflowid"

The regular expression matches as follows:

Node	Explanation
`(?:`	group, but do not capture:
`jobID`	'jobID'
`|`	OR
`workflowID`	'workflowID'
`)`	end of grouping
`\047`	single quote in octal
`: `	': ' (a colon followed by a space)
`\047`	single quote in octal
`\K`	resets the start of the match (what is `K`ept); a shorter alternative to a look-behind assertion. See "look arounds" and "Support of \K in regex"
`[^\047]+`	any character except: ' (1 or more times (matching the most amount possible))
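
To see what this expression actually emits, the sketch below runs it against the sample line and collects the matches in an array. It assumes GNU grep and bash 4 or later (for `mapfile`); the `values` array name is illustrative.

```bash
#!/usr/bin/env bash
line="20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}"

# grep -o prints each match on its own line, in the order the matches appear
# in the input; mapfile -t stores those lines as array elements.
mapfile -t values < <(grep -oP '(?:jobID|workflowID)\047: \047\K[^\047]+' <<< "$line")

# This relies on jobID appearing before workflowID in the log line.
jobid=${values[0]}
workflowid=${values[1]}
printf 'jobid=%s\nworkflowid=%s\n' "$jobid" "$workflowid"
```

If the key order in the log can change, matching each key separately is the safer choice.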

Process substitution: `>(command ...)` or `<(...)` is replaced by a temporary filename. Writing to or reading from that file causes bytes to be piped to or from the command inside. It is often used in combination with file redirection: `cmd1 2> >(cmd2)`.

See http://mywiki.wooledge.org/ProcessSubstitution and
http://mywiki.wooledge.org/BashFAQ/024
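
The reason for `< <(...)` rather than a plain pipe is the subshell issue described in BashFAQ/024. A minimal sketch of the difference, assuming bash with default options (`lastpipe` not enabled); the variable names are illustrative:

```bash
#!/usr/bin/env bash
# Pipeline: read runs in a subshell, so the assignment is lost when it exits.
unset piped
printf '123_abc\n' | read -r piped
echo "piped=[${piped:-}]"

# Process substitution: read runs in the current shell, so the value survives.
read -r kept < <(printf '123_abc\n')
echo "kept=[$kept]"
```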

score 2 · Answer 2 · answered May 12 '23 at 20:42

If awk is an option, you may be able to use command substitution to extract the values you need:

jobID=$(awk '/jobID/{gsub(/,|\047/,""); print $6}' src.file)
workflowID=$(awk '/jobID/{gsub(/,|\047/,""); print $12}' src.file)

Output:

echo "$jobID"
123_abc
echo "$workflowID"
bbc_999
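
The field numbers `$6` and `$12` come from awk's default whitespace splitting after the commas and quotes are removed. A quick way to check them, sketched here against the sample line only, is to print every field with its index:

```bash
#!/usr/bin/env bash
line="20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}"

# Modifying $0 with gsub makes awk re-split the record, so the loop sees the
# cleaned-up fields. Each output line has the form "index: field".
fields=$(printf '%s\n' "$line" |
  awk '{gsub(/,|\047/,""); for (i = 1; i <= NF; i++) print i ": " $i}')

printf '%s\n' "$fields"
```

The positions depend on the exact layout of the line, so a second entry in `jobsData` or a value containing a space would shift them.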

ufopilot · Answer 3 · 2023-05-13T10:11:47.933

$ jobID=$(awk -F'INFO:' '/jobsData/{gsub(/\047/,"\""); print $NF}' file|jq -r '.jobsData[0].jobID')
$ workflowID=$(awk -F'INFO:' '/jobsData/{gsub(/\047/,"\""); print $NF}' file|jq -r '.workflowID')

$ echo "$jobID" 
123_abc

$ echo "$workflowID"
bbc_999

Using bash array

$ declare -A array="($(awk -F'INFO:' '/jobsData/{gsub(/\047/,"\""); print $NF}' file|jq -r '. | "[workflowID]=\(.workflowID) [jobID]=\(.jobsData[0].jobID)"'))"
$ declare -p array 
declare -A array=([jobID]="123_abc" [workflowID]="bbc_999" )

$ echo "${array[workflowID]}"
bbc_999

$ echo "${array[jobID]}"
123_abc

Ed Morton · Answer 4 · 2023-05-13T00:44:34.787

Using GNU awk and bash associative arrays:

$ declare -A arr="( $(awk -v RS="'[^']+'[ :]+'[^']*'" -F"'" '{$0=RT} NF==5{print "["$2"]=\047"$4"\047"}' file) )"

you get an array of all the 'name': 'value' pairs from the input:

$ declare -p arr
declare -A arr=([workflowID]="bbc_999" [jobStatus]="RUNNING" [jobID]="123_abc" [workflowState]="RUNNING" )

You can then set scalar variables from the array if you like:

$ jobID="${arr[jobID]}"
$ echo "$jobID"
123_abc

but you don't have to.
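
`RT` is specific to GNU awk. Where gawk is not available, a similar array can be built with `grep -o` and a `while read` loop. This is a rough sketch, not the method above: it assumes bash 4 or later and a grep that supports `-o`, and it only handles quoted string values.

```bash
#!/usr/bin/env bash
line="20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}"

declare -A arr

# Each match looks like 'name': 'value'. Splitting it on the single quote
# yields: an empty field, the name, ": ", and the value.
while IFS="'" read -r skip name skip value skip; do
  arr[$name]=$value
done < <(grep -oE "'[^']+': '[^']*'" <<< "$line")

echo "jobID=${arr[jobID]}"
echo "workflowID=${arr[workflowID]}"
```

The loop reads from a process substitution rather than a pipe so that `arr` is filled in the current shell.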

score 1 · Accepted Answer · edited May 12 '23 at 22:06

Using grep, you can get the results as shown below. `sort -u` is used to remove duplicates from the output.

job_id=$(grep -o 'jobID.*' file_name | cut -f2- -d: | cut -d ',' -f 1 | sort -u)
echo "$job_id"

workflow_id=$(grep -o 'workflowID.*' file_name | cut -f2- -d: | cut -d ',' -f 1 | sort -u)
echo "$workflow_id"
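
With the sample file from the question, this `cut` approach leaves a leading space and the single quotes around each value, and the pattern `workflowID.*` also matches the plain `INFO: workflowID` line. A possible variant that avoids both is sketched below. It assumes the values contain no spaces, commas, or quotes; the `sample` temporary file is illustrative.

```bash
#!/usr/bin/env bash
# Recreate the sample log from the question in a temporary file (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
20230512 12:42:06 INFO: job ID
20230512 12:42:06 INFO: workflowID
20230512 12:42:06 INFO: {'jobsData': [{'jobID': '123_abc', 'jobStatus': 'RUNNING'}], 'timeoutData': {}, 'workflowID': 'bbc_999', 'workflowState': 'RUNNING'}
EOF

# Matching the quoted key skips the plain "INFO: workflowID" line;
# tr -d removes the spaces and single quotes left around the value.
job_id=$(grep -o "'jobID'.*" "$sample" | cut -d: -f2- | cut -d, -f1 | tr -d " '" | sort -u)
workflow_id=$(grep -o "'workflowID'.*" "$sample" | cut -d: -f2- | cut -d, -f1 | tr -d " '" | sort -u)

echo "$job_id"
echo "$workflow_id"
rm -f "$sample"
```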

answered May 12 '23 at 20:51 by User12345 · edited May 12 '23 at 22:06 by Gilles Quénot
