54

If I have a string variable whose value is "john is 17 years old", how do I tokenize it using spaces as the delimiter? Would I use awk?

Jake Wilson

5 Answers

77
$ string="john is 17 years old"
$ tokens=( $string )
$ echo ${tokens[*]}

For other delimiters, like ';'

$ string="john;is;17;years;old"
$ OLDIFS="$IFS"
$ IFS=';' tokens=( $string )
$ echo ${tokens[*]}
$ IFS="$OLDIFS" # restore IFS
Diego Torres Milano
  • Very nice, feels much more like an array. – Adam Eberlin Dec 21 '13 at 21:35
  • echo ${tokens[*]} doesn't work for me I get 'bash: ${tokens[*}: bad substitution ' error. – JPM Mar 11 '20 at 16:04
  • you are missing the `*`: ```$ tokens=( a ); $ echo ${tokens[]}; -bash: ${tokens[]}: bad substitution $ echo ${tokens[*]}; a``` – Diego Torres Milano Mar 11 '20 at 21:15
  • Changing `IFS` and then building the array this way makes the `IFS` assignment "permanent", not just for the duration of the array building. See https://stackoverflow.com/questions/62855752/bash-ifs-stuck-after-temporarily-changing-it-for-array-building – morgwai Jan 27 '22 at 18:20
  • Your code changes `IFS`. I've spent 1h figuring out why my script fails. You need to add `IFS=$' \t\n' # set IFS to the default, works with zsh, ksh, bash.`. More [info](https://unix.stackexchange.com/a/220658/334715). – pmor Oct 28 '22 at 17:31
  • FYI: From [here](https://unix.stackexchange.com/a/459603/334715): "The basic `old_IFS="${IFS}"; command; IFS="${old_IFS}"` approach that touches the global IFS will work as expected for the simplest of scripts. However, as soon as you add any complexity, it can easily break apart and cause subtle issues". – pmor Nov 02 '22 at 12:54
  • I finally used [this](https://stackoverflow.com/a/918931/1778275) approach: `IFS=';' read -ra tokens <<< "$string"`. As I understand, here the IFS has value `;` only within the duration of the read command. Is that correct? – pmor Nov 02 '22 at 12:55
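
On the question in the last comment: yes, in bash an assignment written in front of a command such as `read` applies only to that one command, so the shell's own `IFS` is unchanged afterwards. A minimal sketch, assuming bash (the `-a` option of `read` and the `<<<` here-string are not POSIX); `ifs_before` is a name introduced here only to make the check visible:

```bash
#!/usr/bin/env bash
string="john;is;17;years;old"
ifs_before=$IFS                      # remember IFS so we can compare later

# IFS=';' is in effect only while this read command runs.
# -r keeps backslashes literal; -a stores the fields in the array "tokens".
IFS=';' read -ra tokens <<< "$string"

echo "${#tokens[@]}"                 # prints 5
echo "${tokens[2]}"                  # prints 17
printf '%s\n' "${tokens[@]}"         # one token per line

# The shell's IFS was never modified, so nothing needs restoring.
[ "$IFS" = "$ifs_before" ] && echo "IFS unchanged"
```

Note that `read` consumes only the first line of its input, so this form suits single-line strings.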
71

Use the shell's automatic tokenization of unquoted variables:

$ string="john is 17 years old"
$ for word in $string; do echo "$word"; done
john
is
17
years
old

If you want to change the delimiter you can set the $IFS variable, which stands for internal field separator. The default value of $IFS is " \t\n" (space, tab, newline).

$ string="john_is_17_years_old"
$ (IFS='_'; for word in $string; do echo "$word"; done)
john
is
17
years
old

(Note that in this second example I added parentheses around the second line. This creates a sub-shell so that the change to $IFS doesn't persist. You generally don't want to permanently change $IFS as it can wreak havoc on unsuspecting shell commands.)
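
The scoping described in this note can be checked directly. The following is a small sketch, not part of the original example; `ifs_before` and `count` are names introduced here for illustration. It also shows that variables assigned inside a sub-shell are lost with it, so a result has to be captured from the sub-shell's output:

```bash
string="john_is_17_years_old"
ifs_before=$IFS

# Inside the parentheses IFS is '_'; the change ends with the sub-shell.
(IFS='_'; for word in $string; do echo "$word"; done)

if [ "$IFS" = "$ifs_before" ]; then
    echo "IFS is unchanged in the parent shell"
fi

# Command substitution also runs in a sub-shell, so the IFS change and
# the "set --" stay inside it; only the echoed count comes back out.
count=$(IFS='_'; set -- $string; echo "$#")
echo "fields: $count"                # prints "fields: 5"
```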

John Kugelman
  • for your examples, how would you re-use the third token (17) for example? use the for loop and count tokens? – kurumi Mar 22 '11 at 07:31
  • @Allen, then I can do this: `IFS="_"; set -- $string; echo $2`, or directly set it to an array like `dtmilano` did. There is no need to use a for loop, is there? – kurumi Mar 24 '11 at 05:40
14
$ string="john is 17 years old"
$ set -- $string
$ echo $1
john
$ echo $2
is
$ echo $3
17
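
Two side effects are worth knowing: `set --` replaces the script's own positional parameters, and the unquoted `$string` also undergoes filename expansion, so a token such as `*` would turn into file names. One way to confine both is sketched below, under the assumption that running a sub-shell per lookup is acceptable; `nth_token` is a hypothetical helper name, not something from the answer:

```bash
# nth_token N STRING: print the N-th whitespace-separated token of STRING.
# The function body is a sub-shell ( ... ), so "set -f" and "set --"
# do not leak into the caller.
nth_token() (
    n=$1
    set -f                           # disable filename expansion
    set -- $2                        # split STRING on the default IFS
    [ "$n" -ge 1 ] && [ "$n" -le "$#" ] || exit 1
    shift "$((n - 1))"               # move the wanted token to $1
    printf '%s\n' "$1"
)

string="john is 17 years old"
age=$(nth_token 3 "$string")
echo "$age"                          # prints 17
```

The caller's `$1`, `$2`, ... are left as they were, and an out-of-range index makes the function return a non-zero status instead of printing an empty line.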
kurumi
2

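You can try something like this:

#!/bin/bash
# Split the contents of a file on spaces and number each token.
n=0
a=/home/file.txt
# tr turns every space into a newline; the unquoted $(...) is then word-split.
for i in $(tr ' ' '\n' < "$a"); do
   str="${str:+$str,}$i"   # comma-separated list of all tokens, no leading comma
   n=$((n + 1))
   var="var$n"
   echo "$var is ... $i"
done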
harshit
  • The use of `tr` makes this the best solution. Your example code could be much simpler: `echo john is 17 years old | tr ' ' '\n'` – Titou May 11 '17 at 08:49
1

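With GNU sed and extended regular expressions (`-E`); note that `\W` and `\n` in the replacement are GNU extensions, not part of POSIX: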

$ str='a b     c d'
$ echo "$str" | sed -E 's/\W+/\n/g' | hexdump -C
00000000  61 0a 62 0a 63 0a 64 0a                           |a.b.c.d.|
00000008

This is like Python's re.split(r'\W+', str).

\W matches any non-word character. That includes space, tab, and newline (the characters the bash for loop splits on by default) and carriage return, but also symbols such as quotes, brackets, and signs.

The exception is the underscore _, which counts as a word character, so snake_case is one word, but kebab-case is two words.

Leading or trailing whitespace will produce an empty line.
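
Because `\W` and `\n` in the replacement are GNU sed extensions, a similar split can be sketched with `tr`, which POSIX specifies. This is a swapped-in technique, not the method above: `-c` complements the set (everything that is not a letter, digit, or underscore), and `-s` squeezes each run of translated characters into a single newline. `split_words` is a hypothetical helper name:

```bash
# split_words STRING: print each run of word characters on its own line.
split_words() {
    # '[\n*]' pads the second set with newlines, as in the POSIX tr examples.
    printf '%s\n' "$1" | tr -cs '[:alnum:]_' '[\n*]'
}

split_words 'a b     c d'            # prints a, b, c, d on separate lines
split_words 'snake_case kebab-case'  # prints snake_case, kebab, case
```

As with the sed version, leading non-word characters produce an empty first line.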

milahu