58

I want to split text on commas, not spaces, in a `for foo in list` loop. Suppose I have a CSV file, CSV_File, with the following text inside it:

Hello,World,Questions,Answers,bash shell,script
...

I used the following code to split it into several words:

for word in $(cat CSV_File | sed -n 1'p' | tr ',' '\n')
do echo $word
done

It prints:

Hello
World
Questions
Answers
bash
shell
script

But I want it to split the text on commas, not spaces:

Hello
World
Questions
Answers
bash shell
script

How can I achieve this in bash?

Eng.Fouad

9 Answers

62

Set IFS to a comma (`,`):

sorin@sorin:~$ IFS=',' ;for i in `echo "Hello,World,Questions,Answers,bash shell,script"`; do echo $i; done
Hello
World
Questions
Answers
bash shell
script
sorin@sorin:~$ 
Sorin
  • Nice! I forgot all about the IFS env variable! – chown Oct 10 '11 at 20:56
  • To use this in a script you should restore the IFS variable to the previous value. See Andrew Newdigate's answer. – clime May 06 '16 at 09:12
  • @Sorin: By "To use this in a script" I mean that more code is expected than just this, so you want to reset IFS to avoid any unexpected behavior. The implications of IFS seem to be quite extensive, so better be lazy than unclear. Btw, if you run your command as in your answer, it will change IFS for the current environment, and you can easily forget that and then wonder why your shell behaves so weirdly. – clime May 06 '16 at 11:43
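Following up on clime's comments, here is a minimal sketch of saving and restoring IFS around the loop so the change does not leak into the rest of a script. The names `line`, `old_ifs`, and `words` are illustrative. The `set -f` / `set +f` pair is an extra precaution I've added (it is not part of the answer above) so that an unquoted field such as `*` is not expanded as a filename pattern. One caveat: if IFS was unset beforehand, restoring it this way leaves it set to an empty string, which behaves differently.

```shell
#!/usr/bin/env bash
line="Hello,World,Questions,Answers,bash shell,script"

old_ifs=$IFS   # remember the current value
IFS=,          # split on commas only
set -f         # disable pathname expansion while $line is unquoted
words=()
for word in $line; do
  words+=("$word")
  printf '%s\n' "$word"
done
set +f         # re-enable pathname expansion
IFS=$old_ifs   # restore the previous value
```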
61

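Using command substitution (`$(...)`) and then iterating over its unquoted result undoes all the work you are doing to keep the spaces together: the shell word-splits that output on IFS, which by default contains spaces as well as newlines.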

Try instead:

sed -n '1p' CSV_File | tr ',' '\n' | while IFS= read -r word; do
    echo "$word"   # quote it so the value is printed exactly as read
done

That also increases parallelism. Using command substitution, as in your question, forces the whole pipeline inside `$(...)` to finish before the loop can start iterating over the words. Piping into the `while` loop, as in my answer, lets the two run in parallel. This matters only if you have many lines in the file, of course.

mkj
  • Yea, this *is* way better than what I had suggested. +1 for the l33t bash skills mkj :) – chown Oct 10 '11 at 20:49
  • Don't even need the while loop. – Martin York Aug 12 '14 at 23:18
  • Don't need the while loop as it stands, but I was understanding the invocation of `echo` as a proxy for some more interesting command; that is, that the OP wanted the multi-word CSV content in a shell variable to use with some other arbitrary command. That's why I used read to demonstrate how you get the content into a shell variable. – mkj Dec 18 '14 at 20:27
  • Note that this won't work as expected if the input contains newlines (it will then be split on the commas **and** on the newlines originally appearing in the input, i.e. `a,b\nc,d` will be split into 4 fields instead of the desired 3). With Bash I'd recommend using a single-command-scoped `IFS` setting combined with `read -a`, or `read -d` (cf. [proper IFS setting in Bash](http://mywiki.wooledge.org/Arguments#Internal_Field_Separator_.28.60IFS.60.29)), but for POSIX shells I find [substring processing](https://stackoverflow.com/a/15988793) to be the only clean and fool-proof solution. – desseim Jan 28 '21 at 08:38
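One consequence of piping into `while`: in bash, each part of a pipeline runs in a subshell by default, so variables set inside the loop are gone once it finishes. A minimal sketch that keeps them, using bash process substitution (`done < <(...)`) instead of a pipe. The temporary file stands in for the OP's CSV_File, and `count` and `words` are illustrative names.

```shell
#!/usr/bin/env bash
# Stand-in for the OP's CSV_File (assumption: the first line holds the fields).
csv_file=$(mktemp)
printf '%s\n' 'Hello,World,Questions,Answers,bash shell,script' > "$csv_file"

count=0
words=()
# Process substitution: the loop runs in the current shell, so count/words survive it.
while IFS= read -r word; do
  printf '%s\n' "$word"
  words+=("$word")
  count=$((count + 1))
done < <(sed -n '1p' "$csv_file" | tr ',' '\n')

echo "$count words"
rm -f "$csv_file"
```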
23

I think the canonical method is:

while IFS=, read -r field1 field2 field3 field4 field5 field6; do
  # do stuff with the fields, e.g. print the one that contains a space
  echo "$field5"
done < CSV.file

If you don't know or don't care about how many fields there are:

IFS=,
while read -r line; do
  # split into an array
  field=( $line )
  for word in "${field[@]}"; do echo "$word"; done

  # or use the positional parameters
  set -- $line
  for word in "$@"; do echo "$word"; done

done < CSV.file
David Moles
glenn jackman
  • Very handy to be able to refer to specific fields by name – HXCaine Nov 20 '12 at 01:35
  • @glenn-jackman You are correct, canonical UNIX would use your first method. The second one only works with a modern implementation of bash or zsh. – Dwight Spencer Jan 07 '14 at 01:13
  • bash's `read` command has a `-a` option to read the words in the line into an array: `while read -a words; do for word in "${words[@]}" ...` – glenn jackman Jan 07 '14 at 01:37
  • And at least in the version I am using, reading n fields when one record has an extra comma does not fail; instead, the last field gets the two remaining values with the comma between them. – zsalya Sep 29 '22 at 18:00
  • yes, `IFS=, read a b <<<"1,2,3"` will set variable `b` to the string `2,3` – glenn jackman Sep 29 '22 at 18:24
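Combining the field-splitting loop with the `read -a` option from the comments gives a bash-only sketch that needs no global IFS change and does not need to know the field count in advance: `IFS=,` here applies only to the single `read` command. The here-document input and the names `fields`, `all`, and `lines` are illustrative. Be aware that, as with other IFS splitting in bash, a trailing empty field (a line ending in a comma) is dropped.

```shell
#!/usr/bin/env bash
lines=0
all=()
# IFS=, is a prefix assignment, so it affects only this read.
while IFS=, read -r -a fields; do
  lines=$((lines + 1))
  for word in "${fields[@]}"; do
    printf '%s\n' "$word"
  done
  all+=("${fields[@]}")   # keep every field for later use
done <<'EOF'
Hello,World,Questions,Answers,bash shell,script
a,b c,d
EOF
```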
12
kent$  echo "Hello,World,Questions,Answers,bash shell,script"|awk -F, '{for (i=1;i<=NF;i++)print $i}'
Hello
World
Questions
Answers
bash shell
script
Kent
  • I'm assuming that `echo $word` isn't actually the real thing that needs to be done with $word. In that case, your awk expression is another way to do the sed and tr in the original question. I think that Eng.Fouad wants the value, with the space, in a shell variable to do something else with. – mkj Oct 10 '11 at 20:53
  • @mkj This solution is ok for further usage as shell variable, e.g.: `FOO="Hello,World,Questions,Answers,bash shell,script"; BOO=$(echo $FOO | awk -F, '{for (i=1;i<=NF;i++)print $i}'); for B in $BOO; do echo "<$B>"; done` – Roman Chernyatchik Sep 27 '17 at 09:38
  • @RomanChernyatchik The loop over `$BOO` there yields separate variables for "bash" and "shell" and so wouldn't work as the OP intended – Peter Berg Feb 07 '18 at 14:30
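As the last comment points out, looping over an unquoted `$BOO` splits "bash shell" again. A sketch that keeps each awk output line intact by reading it into an array with bash's `mapfile -t` (bash 4 or newer) through process substitution; `FOO` and `words` are illustrative names.

```shell
#!/usr/bin/env bash
FOO="Hello,World,Questions,Answers,bash shell,script"
# awk prints one field per line; mapfile -t stores one line per array element.
mapfile -t words < <(printf '%s\n' "$FOO" | awk -F, '{for (i = 1; i <= NF; i++) print $i}')
for w in "${words[@]}"; do
  printf '<%s>\n' "$w"
done
```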
9

Create a bash function:

split_on_commas() {
  local IFS=,
  local WORD_LIST=($1)
  for word in "${WORD_LIST[@]}"; do
    echo "$word"
  done
}

split_on_commas "this,is a,list" | while IFS= read -r item; do
  # Custom logic goes here
  echo "Item: ${item}"
done

... this generates the following output:

Item: this
Item: is a
Item: list

(Note: this answer has been updated based on feedback in the comments.)

Andrew Newdigate
  • Weird. Any idea *why* that's happening? – Andrew Newdigate Jul 21 '14 at 14:46
  • The side effects are explained here http://superuser.com/questions/781766/ifs-separated-items-in-loop – Val Jul 21 '14 at 19:57
  • To avoid the "side effects", first store IFS var somewhere `OLDIFS=$IFS`, then execute `IFS=, sentences1=($sentences)` and finally restore IFS: `IFS=$OLDIFS`. Otherwise, this is the answer I was searching for. Thanks. – clime May 06 '16 at 09:07
  • @clime and Val, I've updated my answer to take your feedback into account. It seems to work well, but let me know what you think. – Andrew Newdigate May 06 '16 at 12:48
  • I think that your post is too complicated now. It was enough to fix the original code snippet and give credit to the commenters with a small note at the end ;). But anyway, nothing is perfect. – clime May 06 '16 at 15:02
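One more pitfall of `local WORD_LIST=($1)`: the unquoted `$1` also goes through pathname expansion, so a field such as `*` would be replaced by file names from the current directory. A sketch of the same idea built on `read -a` instead, which splits on a comma IFS scoped to that one command and does no globbing. The function name `split_on_commas_safe` is hypothetical, not part of the answer, and note that it only reads up to the first newline in `$1`.

```shell
#!/usr/bin/env bash
# Hypothetical glob-safe variant of split_on_commas.
split_on_commas_safe() {
  local -a word_list
  # IFS=, applies to this read only; read -a performs no pathname expansion.
  IFS=, read -r -a word_list <<< "$1"
  printf '%s\n' "${word_list[@]}"
}

split_on_commas_safe "this,is a,*" | while IFS= read -r item; do
  echo "Item: ${item}"
done
```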
5

Read: http://linuxmanpages.com/man1/sh.1.php & http://www.gnu.org/s/hello/manual/autoconf/Special-Shell-Variables.html

IFS: The Internal Field Separator that is used for word splitting after expansion and to split lines into words with the read builtin command. The default value is <space><tab><newline>.

IFS is a shell variable, so a change to it stays in effect for the rest of your shell script (or interactive session), but it does not affect the shell that started the script. Also be aware that IFS will most likely not be inherited from your environment at all: see the GNU page linked above for the reasons and more information on IFS.
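To see these points in bash (assumed here; other shells may behave or print differently), a small sketch: `printf '%q'` shows the default value, a change made in a child shell does not reach the parent, and an IFS placed in a child's environment is ignored by the new bash process. The variable `child_ifs` is illustrative.

```shell
#!/usr/bin/env bash
# Default value: space, tab, newline.
printf 'default: %q\n' "$IFS"

# A change inside a child bash process does not affect this shell.
bash -c 'IFS=,'
printf 'after child: %q\n' "$IFS"

# IFS passed through the environment is not imported by the child bash.
child_ifs=$(IFS=, bash -c 'printf %q "$IFS"')
printf 'child saw: %s\n' "$child_ifs"
```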

Your code, written like this:

IFS=","
for word in $(cat tmptest | sed -n 1'p' | tr ',' '\n'); do echo $word; done;

should work; I tested it on the command line:

sh-3.2#IFS=","
sh-3.2#for word in $(cat tmptest | sed -n 1'p' | tr ',' '\n'); do echo $word; done;
World
Questions
Answers
bash shell
script
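A subtlety in the command above: once IFS is just `,`, newlines no longer separate words, so the output of `tr ',' '\n'` comes back as one word containing newlines and the loop body runs only once (the output only looks right because `echo` prints the embedded newlines). A sketch that drops `tr` and lets IFS do the splitting, counting iterations to show the difference; the sample line is inlined instead of being read from `tmptest`, and `with_tr`/`without_tr` are illustrative names.

```shell
#!/usr/bin/env bash
line='Hello,World,Questions,Answers,bash shell,script'
old_ifs=$IFS
IFS=,

with_tr=0
for word in $(printf '%s\n' "$line" | tr ',' '\n'); do
  with_tr=$((with_tr + 1))        # no commas left and newline is not in IFS: one big word
done

without_tr=0
for word in $(printf '%s\n' "$line"); do
  without_tr=$((without_tr + 1))  # split on commas: one iteration per field
  printf '%s\n' "$word"
done

IFS=$old_ifs
echo "iterations with tr: $with_tr, without tr: $without_tr"
```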
Ashley Raiteri
1

You can use:

cat f.csv | sed 's/,/ /g' |  awk '{print $1 " / " $4}'

or

echo "Hello,World,Questions,Answers,bash shell,script" | sed 's/,/ /g' |  awk '{print $1 " / " $4}'

This is the part that replaces commas with spaces:

sed 's/,/ /g'
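Note that replacing commas with spaces makes awk split "bash shell" into two fields as well (`$4` is `Answers`, but `$5` would be `bash`). If the goal is to keep spaces inside fields, a sketch using awk's own `-F,` field separator instead of `sed`; `line` and `result` are illustrative names.

```shell
#!/usr/bin/env bash
line='Hello,World,Questions,Answers,bash shell,script'
# -F, makes the comma awk's field separator, so $5 is the whole "bash shell".
result=$(printf '%s\n' "$line" | awk -F, '{print $1 " / " $5}')
printf '%s\n' "$result"
```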
ozma
0

For me, splitting into an array is simpler (ref):

IN="bla@some.com;john@home.com"
arrIN=(${IN//;/ })
echo ${arrIN[1]}  
Nam G VU
  • But with this approach, `...,bash shell,...` will also be split, which is what the OP wanted to avoid. – Ivan Aug 31 '22 at 07:26
0

Using readarray (mapfile):

$ cat csf
Hello,World,Questions,Answers,bash shell,script

$ readarray -td, arr < csf

$ printf '%s\n' "${arr[@]}"
Hello
World
Questions
Answers
bash shell
script
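One caveat with `readarray -td,` (the `-d` option needs bash 4.4 or newer): only commas delimit elements, so the newline at the end of the file stays attached to the last element, `script`. A sketch that trims it afterwards; the `csf` file is recreated here with `mktemp` so the snippet runs on its own, and `last` is an illustrative name.

```shell
#!/usr/bin/env bash
csf=$(mktemp)
printf '%s\n' 'Hello,World,Questions,Answers,bash shell,script' > "$csf"

readarray -td, arr < "$csf"
# -t strips the trailing comma from each element, but the last element
# still ends in the file's newline; remove it.
last=$(( ${#arr[@]} - 1 ))
arr[last]=${arr[last]%$'\n'}

printf '%s\n' "${arr[@]}"
rm -f "$csf"
```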
Ivan