If I have a string variable whose value is "john is 17 years old",
how do I tokenize it using spaces as the delimiter? Would I use awk?

5 Answers
$ string="john is 17 years old"
$ tokens=( $string )
$ echo ${tokens[*]}
For other delimiters, such as ';':
$ string="john;is;17;years;old"
$ OLDIFS="$IFS"
$ IFS=';' tokens=( $string )
$ echo ${tokens[*]}
$ IFS="$OLDIFS" # restore IFS
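A variant that avoids saving and restoring IFS by hand is to put the assignment in front of a single `read` command, so it applies only to that command. This is a minimal sketch assuming bash (`read -a` and here-strings are not POSIX sh); the variable name `ifs_before` is only there for illustration.

```shell
#!/usr/bin/env bash
string="john;is;17;years;old"
ifs_before=$IFS                  # kept only to show that IFS is untouched afterwards

# IFS=';' is in effect for this one read command only.
# -r: do not treat backslashes specially, -a: store the fields in an array.
IFS=';' read -ra tokens <<< "$string"

printf '%s\n' "${tokens[@]}"     # one token per line
echo "third token: ${tokens[2]}" # arrays are zero-indexed
```

Unlike `tokens=( $string )`, `read` does not perform filename expansion on the fields, so a token such as `*` stays literal. Note that `read` consumes only the first line of its input.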

-
Very nice, feels much more like an array. – Adam Eberlin Dec 21 '13 at 21:35
-
echo ${tokens[*]} doesn't work for me I get 'bash: ${tokens[*}: bad substitution ' error. – JPM Mar 11 '20 at 16:04
-
you are missing the `*`: ```$ tokens=( a ); $ echo ${tokens[]}; -bash: ${tokens[]}: bad substitution $ echo ${tokens[*]}; a``` – Diego Torres Milano Mar 11 '20 at 21:15
-
Changing `IFS` and then building the array this way makes the `IFS` assignment "permanent", not just for the duration of the array building. See https://stackoverflow.com/questions/62855752/bash-ifs-stuck-after-temporarily-changing-it-for-array-building – morgwai Jan 27 '22 at 18:20
-
Your code changes `IFS`. I've spent 1h figuring out why my script fails. You need to add `IFS=$' \t\n' # set IFS to the default, works with zsh, ksh, bash.`. More [info](https://unix.stackexchange.com/a/220658/334715). – pmor Oct 28 '22 at 17:31
-
FYI: From [here](https://unix.stackexchange.com/a/459603/334715): "The basic `old_IFS="${IFS}"; command; IFS="${old_IFS}"` approach that touches the global IFS will work as expected for the simplest of scripts. However, as soon as you add any complexity, it can easily break apart and cause subtle issues". – pmor Nov 02 '22 at 12:54
-
I finally used [this](https://stackoverflow.com/a/918931/1778275) approach: `IFS=';' read -ra tokens <<< "$string"`. As I understand, here the IFS has value `;` only within the duration of the read command. Is that correct? – pmor Nov 02 '22 at 12:55
Use the shell's automatic tokenization of unquoted variables:
$ string="john is 17 years old"
$ for word in $string; do echo "$word"; done
john
is
17
years
old
If you want to change the delimiter you can set the $IFS variable, which stands for internal field separator. The default value of $IFS is " \t\n" (space, tab, newline).
$ string="john_is_17_years_old"
$ (IFS='_'; for word in $string; do echo "$word"; done)
john
is
17
years
old
(Note that in this second example I added parentheses around the second line. This creates a sub-shell so that the change to $IFS doesn't persist. You generally don't want to permanently change $IFS, as it can wreak havoc on unsuspecting shell commands.)
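If a sub-shell is inconvenient (for example because the loop needs to set variables that the rest of the script reads), another way to confine the change is to declare IFS local inside a function. The following is a sketch assuming bash or another shell that supports `local`; the function name `split_words` is made up for this example.

```shell
#!/usr/bin/env bash
# split_words DELIM STRING: print each DELIM-separated field on its own line.
split_words() {
    local IFS=$1     # local: the caller's IFS is restored when the function returns
    local word
    for word in $2; do   # $2 is unquoted on purpose so that it is split on IFS
        printf '%s\n' "$word"
    done
}

split_words '_' "john_is_17_years_old"
```

As with any unquoted expansion, fields containing glob characters such as `*` are also subject to filename expansion.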

-
for your examples, how would you re-use the third token (17) for example? use the for loop and count tokens? – kurumi Mar 22 '11 at 07:31
-
@Allen, then I can do this `IFS="_";set -- $string; echo $2.` or directly set it to an array like what `dtmilano` did. There is no need to use a for loop, is there? – kurumi Mar 24 '11 at 05:40
$ string="john is 17 years old"
$ set -- $string
$ echo $1
john
$ echo $2
is
$ echo $3
17
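One thing to keep in mind is that `set --` replaces the script's own positional parameters. A possible way around that, sketched here under the assumption that bash is available, is to do the splitting inside a function, where `set --` only affects the function's parameters. The name `nth_word` is hypothetical.

```shell
#!/usr/bin/env bash
# nth_word N STRING: print the N-th whitespace-separated word of STRING (1-based).
nth_word() {
    local n=$1 text=$2
    set -- $text      # unquoted on purpose; replaces only this function's $1, $2, ...
    echo "${!n}"      # bash indirect expansion: the positional parameter numbered $n
}

nth_word 3 "john is 17 years old"
```

The caller's `$1`, `$2`, and so on are unchanged after the call.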

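You can try something like this:
#!/bin/bash
# Print each space-separated word of a file together with a running index.
n=0
a=/home/file.txt
for i in $(tr ' ' '\n' < "$a"); do
    str="${str},${i}"    # accumulate a comma-separated list of the words
    n=$((n + 1))
    var="var${n}"
    echo "$var is ... ${i}"
done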
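The same `tr` idea can also fill an array directly instead of looping. This is a sketch that assumes bash 4 or later (for `mapfile`) and works on a string rather than on the file used above.

```shell
#!/usr/bin/env bash
string="john is  17 years old"   # note the double space before 17

# tr turns each space into a newline; -s squeezes runs of newlines into one,
# so repeated spaces do not produce empty tokens.
# mapfile -t reads the lines into an array and strips the trailing newlines.
mapfile -t tokens < <(tr -s ' ' '\n' <<< "$string")

echo "number of tokens: ${#tokens[@]}"
echo "third token: ${tokens[2]}"
```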

-
The use of `tr` makes this the best solution. Your example code could be much simpler: `echo john is 17 years old | tr ' ' '\n'` – Titou May 11 '17 at 08:49
With an extended regex in sed (note that `\W` and `\n` in the replacement are GNU extensions, not part of POSIX):
$ str='a b c d'
$ echo "$str" | sed -E 's/\W+/\n/g' | hexdump -C
00000000 61 0a 62 0a 63 0a 64 0a |a.b.c.d.|
00000008
This is like Python's re.split(r'\W+', str).
\W matches a non-word character, including space, tab, newline, and carriage return (like the bash for tokenizer), but also symbols such as quotes, brackets, and signs, with the exception of the underscore _. So snake_case is one word, but kebab-case is two words.
Leading and trailing spaces will create an empty line.
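Where `\W` is not available, a roughly equivalent split can be sketched with `tr` and a character class, with a final `sed` to drop the empty line left by leading separators. This is an illustration, not a drop-in replacement: `[:alnum:]` depends on the locale, and the sample string is made up.

```shell
#!/usr/bin/env bash
str='  snake_case, kebab-case! '

# -c: complement the set, so every character that is NOT alphanumeric or "_"
#     is translated to a newline; -s: squeeze runs of newlines into one.
# sed '/^$/d' deletes any remaining empty line (here, the one caused by the
# leading spaces).
words=$(printf '%s\n' "$str" | tr -cs '[:alnum:]_' '\n' | sed '/^$/d')

printf '%s\n' "$words"
```

For this input the words are snake_case, kebab, and case, each on its own line.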
