Tokenize a string with awk in bash script

Question

I want to read a file line by line, tokenize every line, and do some processing on the tokens. The file and the script look like this:

[root@localhost:~] cat vms.txt
mahmood May 24
[root@localhost:~] cat power_offs2.sh
#!/bin/bash
INPUT=/vms.txt
while IFS= read -r line; do
    echo "$line" | awk '{split($0,a,"|"); print a[1],a[2],a[3]}'
    NAME=a[1]
    NEW_MONTH=a[2]
    NEW_DAY=a[3]
    echo $NAME "-" $NEW_MONTH "-" $NEW_DAY
done < $INPUT

The output, however, is:

[root@localhost:~] sh /power_offs2.sh
mahmood May 24
a[1] - a[2] - a[3]

It seems that the array is defined only within the awk scope. How can I fix that?

Can you clarify the expected output? The input file is space-delimited, but the `awk` command splits on a vertical bar ('|'). Which one do you want? – dash-o, May 23 '20 at 17:54
My fault... There are spaces between words and '|' is wrong. Even if I change that to `$0,a," "`, I don't see the words in the output. — mahmood, May 23 '20 at 18:43

dash-o · Answer 1 · 2020-05-24T04:06:44.827

2

From the input, it looks like the text is space-delimited. You can use the bash read builtin for that:

while read -r name month dd ; do
    printf '%s-%s-%s\n' "$name" "$month" "$dd"
done < "$INPUT"

Output:

mahmood-May-24
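
One detail that may matter if vms.txt ever holds lines with more than three words: read assigns whatever is left of the line to the last variable. The sketch below illustrates this; the function name format_lines and the second sample line are made up for the example.

```shell
# Hypothetical helper: print each stdin line as name-month-day.
format_lines() {
    while read -r name month dd; do
        # dd receives the rest of the line if there are more than three fields
        printf '%s-%s-%s\n' "$name" "$month" "$dd"
    done
}

printf 'mahmood May 24\nalice Jun 3 extra\n' | format_lines
# mahmood-May-24
# alice-Jun-3 extra
```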

edited May 24 '20 at 04:06

answered May 23 '20 at 17:57


You missed the input file variable. No? – mahmood May 23 '20 at 18:47
@mahmood, Updated answer – dash-o May 24 '20 at 04:07

score 1 · Accepted Answer · answered May 23 '20 at 17:55

1

quick fix:

NAME=$(awk '{print $1}' vms.txt)
NEW_MONTH=$(awk '{print $2}' vms.txt)
NEW_DAY=$(awk '{print $3}' vms.txt)
echo $NAME "-" $NEW_MONTH "-" $NEW_DAY
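
The three commands above each read the whole file, so they suit a vms.txt with a single line. For a file with several lines, one possible variation (a sketch, not part of the original answer) runs awk once per line and captures its output with read. It assumes bash, and the function name parse_line is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one line with awk and store the fields
# in variables of the calling shell.
parse_line() {
    # Process substitution keeps read in the current shell, so the
    # variables are still set after the command finishes.
    read -r NAME NEW_MONTH NEW_DAY < <(awk '{print $1, $2, $3}' <<< "$1")
}

parse_line 'mahmood May 24'
echo "$NAME - $NEW_MONTH - $NEW_DAY"   # mahmood - May - 24
```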

answered May 23 '20 at 17:55

Kent


OK, I fixed it with ``NAME=`echo $line | awk '{print $1}'` `` and so on in the loop. – mahmood May 23 '20 at 19:12

Freddy · Answer 3 · 2020-05-23T22:36:53.920

With awk:

$ awk -v OFS="-" '{$1=$1}1' vms.txt
mahmood-May-24

-v OFS="-" sets the output field separator to - (default is a space character)
$1=$1 rebuilds the record using the modified output field separator OFS. This behavior is described in the GNU Awk manual.
1 is a pattern without an action and evaluates to true. This is a common shorthand to trigger the default action, which prints the current record (same as {print} or {print $0}; see the question "Why does “1” in awk print the current line?").
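
To see why the $1=$1 assignment is needed, the following sketch runs the same command with and without it. The variable names unchanged and rebuilt are only for illustration.

```shell
line='mahmood May 24'

# Without modifying a field, $0 is printed as read and OFS is not applied.
unchanged=$(printf '%s\n' "$line" | awk -v OFS="-" '1')

# Assigning $1 to itself makes awk rebuild $0 with OFS between the fields.
rebuilt=$(printf '%s\n' "$line" | awk -v OFS="-" '{$1=$1}1')

printf '%s\n%s\n' "$unchanged" "$rebuilt"
# mahmood May 24
# mahmood-May-24
```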

markp-fuso · Answer 4 · 2020-05-29T12:25:37.823

As the OP has determined, the array (a[]) exists only within the awk call, so it cannot be referenced by the calling process.
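
A short sketch of that scope boundary, using the sample line from the question: a variable assigned inside awk vanishes when awk exits, whereas command substitution copies awk's output back into the shell. The variable name before is only for illustration.

```shell
line='mahmood May 24'
unset NAME

# awk runs as a separate process; this only sets an awk variable.
printf '%s\n' "$line" | awk '{NAME=$1}'
before="${NAME:-<unset>}"
echo "$before"             # <unset>

# Command substitution captures what awk prints.
NAME=$(printf '%s\n' "$line" | awk '{print $1}')
echo "$NAME"               # mahmood
```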

Assumptions:

input data is delimited by white space
we're only dealing with a single line of input
the parsed values need to be accessible by the parent script

We can eliminate the need for awk, as well as any subprocess calls (e.g., echo | awk, $(awk ...)), and use a simple read command:

$ cat vms.txt
mahmood May 24

$ read -r NAME NEW_MONTH NEW_DAY < vms.txt

$ echo $NAME "-" $NEW_MONTH "-" $NEW_DAY
mahmood - May - 24

Based on a comment by the OP about using $line, I'll remove the assumption that we're only dealing with a single line of interest in vms.txt ...

Using the same method as above but replacing vms.txt as input with $line as input (in the form of a 'here' string):

$ line='mahmood May 24'

$ read -r NAME NEW_MONTH NEW_DAY <<< "${line}"

$ echo $NAME "-" $NEW_MONTH "-" $NEW_DAY
mahmood - May - 24

Again, this should eliminate subprocess calls and be a bit faster when dealing with a large volume of $lines.
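
Put back into a loop like the one in the question, the here-string approach might look like the sketch below. It assumes bash; the temporary file stands in for vms.txt, and the second line and the output variable are invented for the example.

```shell
#!/usr/bin/env bash
# Temporary stand-in for vms.txt with two sample lines (second one made up).
INPUT=$(mktemp)
printf 'mahmood May 24\nalice Jun 3\n' > "$INPUT"

output=""
while IFS= read -r line; do
    # Split the current line into three variables without calling awk.
    read -r NAME NEW_MONTH NEW_DAY <<< "$line"
    output+="$NAME - $NEW_MONTH - $NEW_DAY"$'\n'
done < "$INPUT"

printf '%s' "$output"
# mahmood - May - 24
# alice - Jun - 3
rm -f "$INPUT"
```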

Léa Gris · Answer 5 · 2020-05-23T18:55:26.530

0

Something like tr ' ' '-' <vms.txt

Bash-only if you want to tokenise into an array:

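#!/usr/bin/env bash

# Split the first line of vms.txt on whitespace into an array
read -r -a array <vms.txt
# Join the elements with '-' by temporarily changing IFS
_OIFS="$IFS"
IFS='-'
echo "${array[*]}"
IFS="$_OIFS"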

See: How can I join elements of an array in Bash?
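
A common way to avoid saving and restoring IFS by hand is to make it local to a small function. The sketch below assumes bash; the name join_by is a hypothetical helper, not something defined in the answer above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join all remaining arguments with the first one.
join_by() {
    local IFS="$1"   # "$*" joins with the first character of IFS
    shift
    printf '%s\n' "$*"
}

# Split a sample line into an array, then join it with '-'.
read -r -a array <<< 'mahmood May 24'
join_by '-' "${array[@]}"    # mahmood-May-24
```

Because IFS is declared local, the caller's IFS is left untouched once the function returns.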

edited May 23 '20 at 18:55

answered May 23 '20 at 18:47

