Separating rows in a text file based on the column

Question

Given the text file below, I want to extract the rows whose value in the second column is zero and put those rows in a separate file. Since the values in the second column range from 0 to 83, I would like to do this for every value. I have written the code below, but it is not working as it should: every output file generated is empty. Can anyone tell me what I am doing wrong?

for i in {0..83}; do awk ' $2=="$i" {print}' combined-all.txt > combined-all-$i.txt; done

Here is part of the text file:

Subj02 19 000274 000318
Subj01 83 000319 000362
Subj03 18 000363 000414
Subj04 83 000415 000447
Subj05 17 000448 000490
Subj06 0  000491 000540
...

Could you clarify how many files you want? Why are you iterating over 0..83? — Behe, Dec 10 '19 at 20:42
Update your question to show the expected output given your posted sample input. Do you expect 84 different output files, some of them empty, or only as many output files as there are unique values in the second column of your input file? — Ed Morton, Dec 10 '19 at 23:31

score 1 · Answer 1 · answered Dec 10 '19 at 22:56

Or you can use awk var assignment

for i in {0..83}; do awk -v i="$i" '$2==i' combined-all.txt > "combined-all-$i.txt"; done
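
For context, the loop in the question produces empty files because the shell does not expand `$i` inside single quotes, so awk compares the second field with the literal string `$i`. Below is a minimal sketch of the difference. It assumes a made-up three-line sample (`sample.txt` is a hypothetical name) and runs in a temporary directory so nothing real is touched:

```shell
# Work in a scratch directory so no existing file is overwritten.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample in the same layout as the question's data.
printf '%s\n' \
    'Subj01 83 000319 000362' \
    'Subj05 17 000448 000490' \
    'Subj06 0  000491 000540' > sample.txt

i=83

# Single quotes hide $i from the shell: awk sees the string "$i", so no row matches.
broken=$(awk '$2=="$i"' sample.txt)

# -v copies the shell value into an awk variable that is also named i.
fixed=$(awk -v i="$i" '$2==i' sample.txt)

printf 'broken: [%s]\n' "$broken"
printf 'fixed:  [%s]\n' "$fixed"
```

Only the second command should print the `Subj01 83 ...` row.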

Diego Torres Milano

Walter A · Answer 2 · 2019-12-11T09:15:51.023

1

awk already loops over every line of its input, so try to use awk without a shell loop.

awk '{print >> "combined-all-" $2 ".txt"}' combined-all.txt

EDIT: Inputfile is combined-all.txt, not combined-all-$i.txt
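
A slightly more defensive variant of the same single-pass idea, offered as a sketch rather than a definitive version: some awk implementations may parse an unparenthesized concatenation after `>>` differently, and some limit the number of files open at once, so this version builds the name in a variable and closes each file after writing. The `seen` array and the three sample rows are assumptions added for illustration.

```shell
# Work in a scratch directory with a small stand-in for combined-all.txt.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' \
    'Subj02 19 000274 000318' \
    'Subj01 83 000319 000362' \
    'Subj04 83 000415 000447' > combined-all.txt

awk '
{
    out = "combined-all-" $2 ".txt"   # build the output name once per row
    if (!(out in seen)) {             # first row seen for this value:
        printf "" > out               #   empty any file left from an earlier run
        close(out)
        seen[out] = 1
    }
    print >> out                      # append the current row
    close(out)                        # keep at most one output file open
}' combined-all.txt
```

Closing after every row trades some speed for robustness; on small inputs the difference is unlikely to matter.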

edited Dec 11 '19 at 09:15

answered Dec 10 '19 at 23:04

Walter A


Can't imagine why this got downvoted, it's the closest so far to what is probably the right answer. It has a couple of issues but upvoting to compensate. – Ed Morton Dec 10 '19 at 23:37
@EdMorton perhaps because the numbered files are the ones created? – Diego Torres Milano Dec 11 '19 at 01:29
@DiegoTorresMilano You were right, I did not use the correct inputfile. I edited my answer. – Walter A Dec 11 '19 at 09:16
@DiegoTorresMilano ah I hadn't noticed that typo, thanks for pointing it out. Seems like whoever downvoted could've just left a comment if that was all it was! – Ed Morton Dec 11 '19 at 17:22
@WalterA very nice solution, however to be exactly the same as the OP you should filter `$2` in `0..83` range – Diego Torres Milano Dec 11 '19 at 18:22
@DiegoTorresMilano OP wrote `Since the values in the second column are starting from 0 to 83`, so I thought the input was filtered already. There are always more things to check, like a non-existent input file (like `combined-all-$i.txt`), slashes in field 2 (difficult for a filename), a space in field 1 (should I use `$(NF-2)`?). To be exactly the same I should overwrite existing files and look at missing numbers in the range 0..83. I might start with `rm -f combined-all-{0..83}.txt 2>/dev/null; touch combined-all-{0..83}.txt`. – Walter A Dec 11 '19 at 18:38
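
The last two comments can be combined into one sketch: create all 84 files first so that values with no rows still get an empty file, then let a single awk pass append only rows whose second field is in 0..83. This is an illustration under assumptions, not anyone's posted answer: the `SubjXX 99` row is invented to show the filter, and a plain `while` loop replaces the brace expansion only so the snippet also runs in shells without it.

```shell
# Work in a scratch directory with a small stand-in for combined-all.txt.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' \
    'Subj06 0  000491 000540' \
    'Subj01 83 000319 000362' \
    'SubjXX 99 000001 000002' > combined-all.txt

# Create (or empty) all 84 files so missing values still produce a file.
i=0
while [ "$i" -le 83 ]; do
    : > "combined-all-$i.txt"
    i=$((i + 1))
done

# Append only rows whose second field is an integer from 0 to 83.
awk '$2 ~ /^[0-9]+$/ && $2 <= 83 {
    out = "combined-all-" ($2 + 0) ".txt"   # $2 + 0 drops any leading zeros
    print >> out
    close(out)
}' combined-all.txt
```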

score 0 · Answer 3 · answered Dec 10 '19 at 20:44

... not using awk a lot these days... but this works:

for i in {0..83}; do awk -F" " '{ if ($2=='"$i"') {print}}' combined-all.txt > combined-all-$i.txt; done

Note the '"$i"' quoting.
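
To see what that quoting does, it can help to print the program text that awk would receive. A small sketch, with `literal` and `spliced` as hypothetical variable names and `i=7` as an arbitrary value:

```shell
i=7

# Inside one pair of single quotes the shell leaves $i untouched.
literal='$2=="$i"'

# Closing the single quotes, expanding "$i", and continuing splices the value in.
spliced='$2=='"$i"

printf '%s\n' "$literal"   # the text awk gets from the question's version
printf '%s\n' "$spliced"   # the text awk gets from the spliced version
```

The spliced form works for plain numbers, but the value becomes part of the awk source code, which is why the comment below recommends passing it as a variable instead.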

hootnot

Yeah, don't do that, see [how-do-i-use-shell-variables-in-an-awk-script](https://stackoverflow.com/questions/19075671/how-do-i-use-shell-variables-in-an-awk-script). There are several other awk and shell issues with that line of code too. – Ed Morton Dec 10 '19 at 23:40
My assumption here is that this is a quick hack to get the result split. Iterating over the input 84 times isn't that efficient either, but if there isn't a lot of data, it is no problem. – hootnot Dec 11 '19 at 09:28
It's simpler, clearer, and briefer to do it the right way (robust, efficient, etc.) than to do it the wrong way though so there's just no reason to do this. – Ed Morton Dec 11 '19 at 17:25
