339

I have a string containing many words with at least one space between each two. How can I split the string into individual words so I can loop through them?

The string is passed as an argument. E.g. ${2} == "cat cat file". How can I loop through it?

Also, how can I check if a string contains spaces?

Zaz
derrdji

11 Answers

438

I like the conversion to an array, to be able to access individual elements:

sentence="this is a story"
stringarray=($sentence)

Now you can access individual elements directly (indexing starts at 0):

echo "${stringarray[0]}"

or loop over the array:

for i in "${stringarray[@]}"
do
  :
  # do whatever on $i
done

Of course, looping through the string directly was answered before, but that answer has the disadvantage of not keeping the individual elements for later use:

for i in $sentence
do
  :
  # do whatever on $i
done

See also Bash Array Reference.
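One caveat with stringarray=($sentence): the unquoted expansion also undergoes pathname expansion, so a word such as * is replaced by matching file names. A minimal sketch of one way around that, assuming the default IFS and a hypothetical helper name split_words (not part of the original answer), turns globbing off only while the array is built:

```bash
#!/usr/bin/env bash
# Hypothetical helper: split $1 on whitespace into the global array "words"
# without letting the shell expand glob characters such as *.
split_words() {
  local restore_glob=0
  # $- contains "f" when globbing is already disabled by the caller.
  [[ $- == *f* ]] || restore_glob=1
  set -f                 # disable pathname expansion
  words=( $1 )           # unquoted on purpose: word splitting happens here
  (( restore_glob )) && set +f
  return 0
}

split_words "this is * a story"
printf '[%s]\n' "${words[@]}"
```

The helper restores the previous globbing state itself, so the rest of the script is unaffected.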

gsamaras
Highwind
  • 37
    Sadly not quite perfect, because of shell-globbing: `touch NOPE; var='* a *'; arr=($var); set | grep ^arr=` outputs `arr=([0]="NOPE" [1]="a" [2]="NOPE")` instead of the expected `arr=([0]="*" [1]="a" [2]="*")` – Tino May 13 '15 at 10:48
  • 3
    @Tino: if you do not want globbing to interfere then simply turn it off. The solution will then work fine with wildcards as well. It is the best approach in my opinion. – Alexandros Dec 17 '18 at 13:19
  • 6
    @Alexandros My approach is to only use patterns, which are secure by-default and working in every context perfectly. A requirement to change shell-globbing to get a secure solution is more than just a very dangerous path, it's already the dark side. So my advice is to never get accustomed to use pattern like this here, because sooner or later you will forget about some detail, and then somebody exploits your bug. You can find proof for such exploits in the press. Every. Single. Day. – Tino Dec 18 '18 at 15:36
  • Additionally, it doesn't seem to respect escaped spaces. "abc\ def" is not a single element. – ktb Aug 14 '22 at 22:52
348

Did you try just passing the string variable to a for loop? Bash, for one, will split on whitespace automatically.

sentence="This is   a sentence."
for word in $sentence
do
    echo "$word"
done

 

This
is
a
sentence.
mob
  • 1
    @MobRule - the only drawback of this is that you can not easily capture (at least I don't recall of a way) the output for further processing. See my "tr" solution below for something that sends stuff to STDOUT – DVK Sep 24 '09 at 20:04
  • 4
    You could just append it to a variable: `A=${A}${word})`. – Lucas Jones Sep 24 '09 at 20:11
  • 1
    set $text [this will put the words into $1,$2,$3...etc] – Rajeshkumar Apr 09 '14 at 02:40
  • 47
    Actually this trick is not only a wrong solution, it also is **extremely dangerous** due to shell globbing. `touch NOPE; var='* a *'; for a in $var; do echo "[$a]"; done` outputs `[NOPE] [a] [NOPE]` instead of the expected `[*] [a] [*]` (LFs replaced by SPC for readability). – Tino May 13 '15 at 09:55
  • @mob what should i do if i want to split the string based on some specific string? example **".xlsx"** separator . –  Aug 14 '18 at 06:23
  • Good to know Bash does this with variables. I was confused at first, because I thought this worked as well. But I realized I was attempting to use a hardcoded string as the argument to `for`. Apparently Bash only does this splitting when operating on variables. – sherrellbc Apr 30 '20 at 19:21
152

Probably the easiest and most secure way in BASH 3 and above is:

var="string    to  split"
read -ra arr <<<"$var"

(where arr is the array which takes the split parts of the string) or, if there might be newlines in the input and you want more than just the first line:

var="string    to  split"
read -ra arr -d '' <<<"$var"

Note the space in -d ''; it cannot be omitted. This variant can leave an unexpected trailing newline in the last element, because <<<"$var" implicitly appends an LF.
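The trailing-newline effect is easiest to see with a separator other than whitespace. The following is only a sketch; the variable and array names are made up for illustration. It also shows one possible workaround: feeding read a NUL-terminated string through printf, so that nothing is appended and read finds its delimiter.

```bash
#!/usr/bin/env bash
var='a:b'

# Here-string: bash appends a newline, and no NUL delimiter is ever found,
# so read hits EOF and returns non-zero even though the array is filled.
status=0
IFS=: read -r -d '' -a with_lf <<<"$var" || status=$?
printf 'status=%s last=%q\n' "$status" "${with_lf[1]}"

# Process substitution: printf emits exactly the data plus the NUL delimiter.
IFS=: read -r -d '' -a clean < <(printf '%s\0' "$var")
printf 'last=%q\n' "${clean[1]}"
```

With the default IFS the extra newline is treated as a field separator and disappears, which is why the problem only shows up with a custom IFS.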

Example:

touch NOPE
var="* a  *"
read -ra arr <<<"$var"
for a in "${arr[@]}"; do echo "[$a]"; done

Outputs the expected

[*]
[a]
[*]

as this solution (in contrast to all previous solutions here) is not prone to unexpected and often uncontrollable shell globbing.

Also this gives you the full power of IFS as you probably want:

Example:

IFS=: read -ra arr < <(grep "^$USER:" /etc/passwd)
for a in "${arr[@]}"; do echo "[$a]"; done

Outputs something like:

[tino]
[x]
[1000]
[1000]
[Valentin Hilbig]
[/home/tino]
[/bin/bash]

As you can see, spaces can be preserved this way, too:

IFS=: read -ra arr <<<' split  :   this    '
for a in "${arr[@]}"; do echo "[$a]"; done

outputs

[ split  ]
[   this    ]

Please note that the handling of IFS in BASH is a subject on its own, so do your tests; some interesting topics on this:

  • unset IFS: Splits on runs of SPC, TAB and NL, and ignores them at the start and end of the line
  • IFS='': No field separation, just reads everything
  • IFS=' ': Splits on runs of SPC (and SPC only)
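These IFS rules can be checked quickly. The sketch below sets IFS only for the duration of each read, so the shell's own IFS is untouched; the array names are made up for illustration.

```bash
#!/usr/bin/env bash
line='  a   b  '

IFS=$' \t\n' read -ra default_ifs <<<"$line"   # same value as the default IFS
IFS=''       read -ra empty_ifs   <<<"$line"   # no splitting at all
IFS=:        read -ra colon_ifs   <<<'a::b'    # non-whitespace: no collapsing

# The format is reused for each argument, so this prints three lines.
printf '%d element(s)\n' "${#default_ifs[@]}" "${#empty_ifs[@]}" "${#colon_ifs[@]}"
```

The third case shows that a non-whitespace separator such as : does not collapse: two adjacent colons produce an empty field between them.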

Some last examples:

var=$'\n\nthis is\n\n\na test\n\n'
IFS=$'\n' read -ra arr -d '' <<<"$var"
i=0; for a in "${arr[@]}"; do let i++; echo "$i [$a]"; done

outputs

1 [this is]
2 [a test]

while

unset IFS
var=$'\n\nthis is\n\n\na test\n\n'
read -ra arr -d '' <<<"$var"
i=0; for a in "${arr[@]}"; do let i++; echo "$i [$a]"; done

outputs

1 [this]
2 [is]
3 [a]
4 [test]

BTW:

  • If you are not used to $'ANSI-ESCAPED-STRING' get used to it; it's a timesaver.

  • If you do not include -r (like in read -a arr <<<"$var") then read does backslash escapes. This is left as exercise for the reader.
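For readers who want to see the effect of -r without working it out, here is a small sketch (the array names are illustrative only). Without -r, a backslash removes the special meaning of the next character, so an escaped space no longer separates words.

```bash
#!/usr/bin/env bash
var='a\ b c'

read -ra with_r    <<<"$var"   # backslash is kept as a literal character
read -a  without_r <<<"$var"   # backslash escapes the following space

printf 'with -r:    %d fields\n' "${#with_r[@]}"
printf 'without -r: %d fields, first is [%s]\n' "${#without_r[@]}" "${without_r[0]}"
```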


For the second question:

To test for something in a string I usually stick to case, because it can check for several patterns at once, which is quite often the case (pun intended). Note that case executes only the first matching branch; if you need fallthrough, use multiple case statements:

case "$var" in
'')                empty_var;;                # variable is empty
*' '*)             have_space "$var";;        # have SPC
*[[:space:]]*)     have_whitespace "$var";;   # have whitespaces like TAB
*[^-+.,A-Za-z0-9]*) have_nonalnum "$var";;    # non-alphanum-chars found
*[-+.,]*)          have_punctuation "$var";;  # some punctuation chars found
*)                 default_case "$var";;      # if all above does not match
esac

So you can set the return value to check for SPC like this:

case "$var" in (*' '*) true;; (*) false;; esac

Why case? Because it is usually a bit more readable than a sequence of regex tests, and shell glob patterns handle 99% of all needs very well.
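Wrapped in a function, the one-line case test can be used directly in an if statement. This is only a sketch, and has_space is a hypothetical name chosen for the example:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when $1 contains at least one space character.
has_space() {
  case "$1" in
    (*' '*) return 0 ;;
    (*)     return 1 ;;
  esac
}

if has_space "cat cat file"; then
  echo "contains a space"
fi
```

Because it uses nothing but case, the function should also work in a plain POSIX sh.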

cmaher
Tino
  • 8
    This answer deserves more upvotes, due to the globbing issues highlighted, and its comprehensiveness – Brian Agnew Mar 07 '16 at 12:04
  • @brian Thanks. Please note that you can use `set -f` or `set -o noglob` to switch off globbing, so that shell metacharacters no longer do harm in this context. But I am not really a friend of that, as it gives up much of the shell's power and switching this setting back and forth is very error prone. – Tino Mar 14 '16 at 13:55
  • 3
    Wonderful answer, indeed deserves more upvotes. Side note on case's fall through - you can use `;&` achieve that. Not quite sure in which version of bash that appeared. I'm a 4.3 user – Sergiy Kolodyazhnyy Jan 11 '17 at 08:19
  • 4
    @Serg thanks for noting, as I did not know this yet! So I looked it up, it appeared in [Bash4](http://www.tldp.org/LDP/abs/html/bashver4.html). `;&` is the forced fallthrough without pattern check like in C. And there also is `;;&` which just continues to do the further pattern checks. So `;;` is like `if ..; then ..; else if ..` and `;;&` is like `if ..; then ..; fi; if ..`, where `;&` is like `m=false; if ..; then ..; m=:; fi; if $m || ..; then ..` -- one never stops learning (from others) ;) – Tino Jan 14 '17 at 08:40
  • @Tino That's absolutely true - learning is a continuous process. In fact, I didn't know of `;;&` before you commented :D Thanks, and may the shell be with you ;) – Sergiy Kolodyazhnyy Jan 14 '17 at 08:45
  • 2
    For folks less familiar with working with bash array variables, if you echo the array variable expecting to see the contents of the array you will only see the first element, so this might appear not to work properly. Use echo "${ARRAY[*]}" to see the contents. – Kvass Jan 09 '21 at 17:01
  • Furthermore, note that reading to EOF gives exit code 1 - this is expected. – Kvass Jan 09 '21 at 17:08
  • It's really unfortunate that there are multiple higher-voted answers that are just _wrong_. – dimo414 Dec 16 '21 at 21:16
  • This is the most comprehensive answer. – Cognitiaclaeves May 19 '22 at 17:32
101

Just use the shell's "set" built-in. For example,

set $text

After that, individual words in $text will be in $1, $2, $3, etc. For robustness, one usually does

set -- junk $text
shift

to handle the case where $text is empty or starts with a dash. For example:

text="This is          a              test"
set -- junk $text
shift
for word; do
  echo "[$word]"
done

This prints

[This]
[is]
[a]
[test]
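Since set replaces the positional parameters, it can be convenient to do the splitting inside a function, where only the function's own $1, $2, ... are overwritten. The following is a sketch under two assumptions: the helper name first_and_last is made up, and the caller runs with globbing enabled (it is switched back on unconditionally).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first and last word of $1.
first_and_last() {
  local text=$1
  set -f                 # assumption: the caller runs with globbing enabled
  set -- junk $text      # replaces only this function's $1, $2, ...
  set +f
  shift
  (( $# > 0 )) || return 1
  printf '%s %s\n' "$1" "${!#}"
}

set -- keep these args
first_and_last "alpha beta * gamma"
echo "caller still has $# arguments"
```

The set -f / set +f pair keeps the * in the input from being expanded to file names while the string is split.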
PF4Public
Idelic
  • 5
    This is an excellent way to split the var so that individual parts may be accessed directly. +1; solved my problem – Cheekysoft Jul 26 '11 at 11:28
  • I was going to suggest using `awk` but `set` is much easier. I'm now a `set` fanboy. Thanks @Idelic! – Yzmir Ramirez Aug 18 '12 at 01:47
  • 27
    Please be aware of shell globbing if you do such things: `touch NOPE; var='* a *'; set -- $var; for a; do echo "[$a]"; done` outputs `[NOPE] [a] [NOPE]` instead of the expected `[*] [a] [*]`. **Only use it if you are 101% sure that there are no SHELL metacharacters in the splitted string!** – Tino May 13 '15 at 10:03
  • 4
    @Tino: That issue applies everywhere, not only here, but in this case you could just `set -f` before `set -- $var` and `set +f` afterwards to disable globbing. – Idelic May 14 '15 at 05:11
  • 3
    @Idelic: Good catch. With `set -f` your solution is safe, too. But `set +f` is the default of each shell, so it is an essential detail, which must be noted, because others are probably not aware of it (as I was, too). – Tino May 14 '15 at 12:50
50
$ echo "This is   a sentence." | tr -s " " "\012"
This
is
a
sentence.

For checking for spaces, use grep:

$ echo "This is   a sentence." | grep " " > /dev/null
$ echo $?
0
$ echo "Thisisasentence." | grep " " > /dev/null     
$ echo $?
1
DVK
  • 1
    In BASH `echo "X" |` can usually be replaced by `<<<"X"`, like this: `grep -s " " <<<"This contains SPC"`. You can spot the difference if you do something like `echo X | read var` in contrast to `read var <<< X`. Only the latter imports variable `var` into the current shell, while to access it in the first variant you must group like this: `echo X | { read var; handle "$var"; }` – Tino May 13 '15 at 11:16
23
echo $WORDS | xargs -n1 echo

This outputs one word per line; you can then process that list as you see fit.
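One way to process that list in the current shell is a while read loop fed by process substitution. This is a sketch only; the variables count and seen are illustrative. Be aware that xargs gives quotes and backslashes a special meaning, so input containing an unmatched apostrophe (for example "don't") is likely to make it fail.

```bash
#!/usr/bin/env bash
WORDS='one  two   three'

count=0
seen=()
# Process substitution keeps the loop in the current shell,
# so count and seen are still set after the loop ends.
while IFS= read -r word; do
  count=$((count + 1))
  seen+=("$word")
  printf '%d: %s\n' "$count" "$word"
done < <(printf '%s\n' "$WORDS" | xargs -n1 echo)
```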

Álex
  • Elegant solution, I use this in CI for splitting environment variables with whitespaces. for example : `npm install $(echo $NPM_PACKAGES | xargs -n1 echo) --save-dev` – Steve Moretz Apr 30 '22 at 11:02
21

(A) To split a sentence into its words (space separated) you can simply use the default IFS by using

array=( $string )


For example, running the following snippet

#!/bin/bash

sentence="this is the \"sentence\"   'you' want to split"
words=( $sentence )

len="${#words[@]}"
echo "words counted: $len"

printf "%s\n" "${words[@]}" ## print array

will output

words counted: 8
this
is
the
"sentence"
'you'
want
to
split

As you can see, single and double quotes inside the string cause no problems; they are kept as literal characters.

Notes:
-- this is basically the same as mob's answer, but this way you keep the array for later use. If you only need a single loop, you can use his answer, which is one line shorter :)
-- please refer to this question for alternate methods to split a string based on a delimiter.


(B) To check for a character in a string you can also use a regular expression match.
For example, to check for the presence of whitespace you can use:

# [[:space:]] is the POSIX character class; \s is an extension that
# not every regex library supports.
regex='[[:space:]]+'
if [[ "$sentence" =~ $regex ]]
    then
        echo "Space here!";
fi
Community
Luca Borrione
  • For regex hint (B) a +1, but -1 for wrong solution (A) as this is error prone to shell globbing. ;) – Tino May 13 '15 at 10:53
9

$ echo foo bar baz | sed 's/ /\n/g'

foo
bar
baz
R B
6

For checking spaces just with bash:

[[ "$str" = "${str% *}" ]] && echo "no spaces" || echo "has spaces"
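The same check can also be written as a pattern match inside [[ ]], which some readers find easier to scan. A small sketch, with hypothetical helper names; the second helper uses [[:space:]] to catch tabs and newlines as well:

```bash
#!/usr/bin/env bash
# Hypothetical helper names, used only for this illustration.
contains_space()      { [[ $1 == *' '* ]]; }
contains_whitespace() { [[ $1 == *[[:space:]]* ]]; }

contains_space "cat cat file" && echo "has spaces"
contains_whitespace $'tab\there' && echo "has whitespace"
contains_space "catfile" || echo "no spaces"
```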
glenn jackman
1

For my use case, the best option was:

grep -oP '\w+' file

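Basically this is a regular expression that matches runs of word characters (letters, digits and underscore), so any type and any amount of whitespace separates the matches, and so does punctuation. To match runs of non-whitespace characters instead, use '\S+'. The -o option prints each match on its own line; -P (Perl-compatible regular expressions) is a GNU grep feature.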
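The difference between word characters and non-whitespace characters matters as soon as the text contains punctuation. The sketch below uses POSIX character classes with grep -E in place of -P, on the assumption that -o is available (it is in GNU and BSD grep, but it is not required by POSIX):

```bash
#!/usr/bin/env bash
text="don't stop   now"

# Runs of word characters (letters, digits, underscore), like \w+ :
grep -oE '[[:alnum:]_]+' <<<"$text"

# Runs of non-whitespace characters, like \S+ :
grep -oE '[^[:space:]]+' <<<"$text"
```

The first command splits "don't" at the apostrophe into two matches, while the second keeps it as one word.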

hdante
1

Another take on this (using Perl):

$ echo foo bar baz | perl -nE 'say for split " "'
foo
bar
baz
Anthony