Loop over tuples in bash, separated by a newline

Question

I have a file that looks like this:

14757,file_one
14756,file_two
14755,file_three

I want to loop over each line and refer to tuple components by separate variables. For example, when iterating over the first line, $1 would be 14757 and $2 would have a value of file_one.

I try to achieve this with:

for i in $(cat files.txt); do IFS=","; set -- $i; echo $1 and $2; done

However, it loops over each word, and the result is not what I expect:

14757 and
file_one
14756 and
file_two
14755 and
file_three and

This is what I want:

14757 and file_one
14756 and file_two
14755 and file_three

I tried to adapt solutions posted in question Loop over tuples in bash? without success.

Please show us what you've tried, and explain how it didn't work (Were there errors? Incorrect output? If so, what did it look like?). — larsks, Apr 26 '23 at 12:57
"I tried to adapt solutions posted in question Loop over tuples in bash? without success." What code did you end up with when you tried to adapt the solutions? What happened when you tried those versions of the code, and how is that different from what you wanted? — Karl Knechtel, Apr 27 '23 at 18:21
Is the important thing here the format of the output or the variable assignment? In other words, are you trying to convert commas to ` and ` or is the important thing to loop over the pairs and assign them to variables so that you can do something more complex with them than just print them with the word "and" in between? — Stephen Ostermiller, Apr 27 '23 at 20:03
@StephenOstermiller It's the latter case. I wanted to loop over pairs and assign each value to a separate variable so that I could use them in a URL, which I would then use in a `curl` command. — Karolis, Apr 27 '23 at 20:23
You should [edit] your question to show the URLs that you actually want as your output rather than fake output like `14757 and file_one` — Stephen Ostermiller, Apr 28 '23 at 08:02
@StephenOstermiller "You should edit your question to show the URLs that you actually want as your output rather than fake output like 14757 and file_one." Would that really help? When asking the question I always try to distill the problem to its most obvious form. I mean, I strip the details that I do not think are necessary and if I want to transform input X to output Y, does it really matter what I do with Y? For example, the problem is just a small part of my CI / CD script. Anyway, the question is resolved so I guess there is no necessity to drag this any longer. — Karolis, Apr 28 '23 at 15:56
You are getting answers that just replace the `,` with the word "and." That probably isn't helpful and in this case I think you simplified a bit too much. At the same time, you also made it more complicated by introducing the requirement for variables. Variables aren't really needed here, they are just a means to an end, and probably not the best way to get the results you want. — Stephen Ostermiller, Apr 28 '23 at 18:28
This is the subject of [a meta question](https://meta.stackoverflow.com/questions/424404/why-does-this-question-lack-clarity). — Peter Mortensen, May 02 '23 at 17:45

Paul Hodges · Accepted Answer · 2023-04-26T13:47:03.133

If that's what you feel you need, then

$: while IFS=$'",\n' read -a line; do set -- "${line[@]}"; shift; echo $1 and $2; done <tmp
14757 and file_one
14756 and file_two
14755 and file_three

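I used the quotes as delimiters as well as the comma, on the assumption that each input line is wrapped in double quotes (for example "14757,file_one"). The leading quote creates an empty field in cell 0, so I shift it off.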
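To see that leading empty field for yourself, here is a minimal sketch. It assumes a line wrapped in double quotes; the variable name `sample` and its value are made up for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical sample line, wrapped in double quotes.
sample='"14757,file_one"'

# Split on double quote, comma and newline, as in the loop above.
IFS=$'",\n' read -r -a line <<< "$sample"

# Print every element with its index; index 0 is the empty field
# produced by the leading quote.
for idx in "${!line[@]}"; do
  printf '%s=[%s]\n' "$idx" "${line[idx]}"
done
```

The useful values sit at indexes 1 and 2, which is why the loop either shifts once or reads `${fields[1]}` and `${fields[2]}`.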

...but unless there is a compelling reason, just use the array.

$: while IFS=$'",\n' read -a fields; do echo "${fields[1]} and ${fields[2]}"; done <tmp
14757 and file_one
14756 and file_two
14755 and file_three

awk would be a lot more efficient than a shell loop, and noticeably faster if the input is very big -

$: awk -F'[",]' '{print $2" and "$3}' tmp
14757 and file_one
14756 and file_two
14755 and file_three

or even sed -

$: sed 's/^"//; s/"$//; s/,/ and /;' tmp
14757 and file_one
14756 and file_two
14755 and file_three

This one is a little more direct and mechanical, but if you read regexes it's pretty easy to understand: trim the leading quote, trim the trailing quote, convert the comma. I could have used s/"//g, but I suspect the two anchored substitutions are faster than scanning the whole string since I know where the quotes are. It likely doesn't matter here, but worth mentioning for when you're processing a multi-GB file and you want to shave a little time.
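A small sketch comparing the anchored form with the `s/"//g` form on one line. The sample line is an assumption (quoted, no quotes in the middle); it only checks that both give the same text and says nothing about speed.

```shell
#!/usr/bin/env bash
# Hypothetical quoted input line.
sample='"14757,file_one"'

# Anchored: strip the leading quote, strip the trailing quote, convert the comma.
anchored=$(printf '%s\n' "$sample" | sed 's/^"//; s/"$//; s/,/ and /')

# Global: remove every double quote wherever it occurs, then convert the comma.
global=$(printf '%s\n' "$sample" | sed 's/"//g; s/,/ and /')

printf '%s\n%s\n' "$anchored" "$global"
```

The two only differ when a quote appears somewhere other than the ends of the line: the anchored form leaves it alone, the global form deletes it.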

If you actually did pipe your data through tr to remove the quotes, then all of these become a little simpler: they no longer have to deal with the quotes, and there is no leading empty field to skip.

$: while IFS=, read -a line; do set -- "${line[@]}"; echo $1 and $2; done <tmp
14757 and file_one
14756 and file_two
14755 and file_three

$: while IFS=, read -a fields; do echo "${fields[0]} and ${fields[1]}"; done <tmp
14757 and file_one
14756 and file_two
14755 and file_three

$: awk -F, '{print $1 " and " $2}' tmp
14757 and file_one
14756 and file_two
14755 and file_three

$: sed 's/,/ and /;' tmp
14757 and file_one
14756 and file_two
14755 and file_three
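For completeness, a sketch of the tr step itself feeding the same comma-splitting loop. It assumes the quoted form of the input; the function name `strip_and_split`, the variable names, and the temp file are made up for the demo.

```shell
#!/usr/bin/env bash
# Build a throwaway quoted input file (hypothetical data, same shape as above).
tmpfile=$(mktemp)
printf '"%s"\n' 14757,file_one 14756,file_two 14755,file_three > "$tmpfile"

# Delete every double quote, then split each remaining line on the comma.
strip_and_split() {
  tr -d '"' < "$1" | while IFS=, read -r id name; do
    echo "$id and $name"
  done
}

strip_and_split "$tmpfile"
rm -f "$tmpfile"
```

Note that the loop runs on the receiving end of a pipe, so variables set inside it are not visible after the loop finishes.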

score 4 · Answer 2 · answered Apr 26 '23 at 14:04

A variation on the while/read loop:

$ while IFS=, read -r arg1 arg2; do echo "${arg1} and ${arg2}"; done < files.txt
14757 and file_one
14756 and file_two
14755 and file_three
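Since the comments mention building URLs for `curl`, here is a rough sketch of the same loop producing one URL per line. The base URL, the path layout, and the function name `build_urls` are placeholders, not anything from the question.

```shell
#!/usr/bin/env bash
# Print one URL per "id,name" line read from standard input.
# The base URL and the path layout are placeholders for illustration.
build_urls() {
  local base=$1 id name
  while IFS=, read -r id name; do
    [ -n "$id" ] || continue   # skip blank lines
    printf '%s/%s/%s\n' "$base" "$id" "$name"
  done
}

build_urls "https://example.com/files" <<'EOF'
14757,file_one
14756,file_two
14755,file_three
EOF
```

Because `IFS=,` is given as a prefix to `read`, the change applies only to that one command and the shell's own IFS is left alone. Once the printed URLs look right, the `printf` line is where a `curl` call would go.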

markp-fuso


Dominique · Answer 3 · 2023-04-26T14:21:30.927

score 0

The naming of your variables, $1 and $2, is a giveaway: it sounds as if you once heard of a tool called awk but have forgotten its name :-)

Let me show you an example:

awk -F " " '{print $1 " blabla " $2}' file.txt

Result:

14757 blabla file_one
14756 blabla file_two
14755 blabla file_three

For your information: the flag -F " " means that I use a space as a separator.

In case you have problems with double quotes in your output, you can simply remove them by adding | tr -d "\"" at the end of your command, so you get:

awk -F " " '{print $1 " blabla " $2}' file.txt | tr -d "\""

edited Apr 26 '23 at 14:21

answered Apr 26 '23 at 13:12


OP changed the input file format, and this still needs to deal with the leading and trailing quotes. – Paul Hodges Apr 26 '23 at 13:29
For the record, downvotes were not me. You should probably still get rid of the [UUoC](https://porkmail.org/era/unix/award#cat), and OP changed the delimiter to a comma. Also, OP has some significant lack of clarity over whether they actually stripped the quotes when they *created* the file... (And `$1`/`$2` is basic `bash`, `perl`, and `PHP` among others, as well as `awk`, though `awk` is likely a simpler solution here.) – Paul Hodges Apr 26 '23 at 14:00
@PaulHodges: I adapted my answer accordingly (taking care of "UUoC") – Dominique Apr 26 '23 at 14:22
