Output all column numbers for a particular character

Question

I have a matrix (about 10,000x10,000), and I want to find the numbers of all columns that contain '0'.

Matrix (test.txt) :

1 1 1 1 1 1 1 1 1 1
1 0 1 1 1 0 1 1 1 1
1 1 1 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
3 2 2 3 3 0 3 2 2 2
3 2 1 3 3 0 3 2 2 0
3 2 2 3 3 2 3 2 2 2
1 1 1 1 1 1 1 1 1 1

Output (example) :

2 4 6 10

I am new to the Linux shell and have not found many similar examples. Any help would be much appreciated!

I only know how to find the row numbers, using: grep -nw '0' test.txt|cut -f1 -d':'. Maybe I can transpose the matrix first (like this) and then use the command above? Is there an easier way to do it?

Ed Morton · Accepted Answer · 2022-12-23T17:03:26.137

Using any awk in any shell on every Unix box:

$ awk '
    /(^| )0( |$)/ {
        for ( i=1; i<=NF; i++ ) {
            if ( $i == 0 ) {
                cols[i]
            }
        }
    }
    END {
        for ( i in cols ) {
            printf "%s%d", sep, i
            sep = OFS
        }
        print ""
    }
' file
2 4 6 10

The output above is not guaranteed to be in numeric (or any other) order because the END loop uses the in operator; see https://www.gnu.org/software/gawk/manual/gawk.html#Scanning-an-Array for details.

If you need the field numbers printed in increasing numeric order then change the script to the very slightly slower:

awk '
    /(^| )0( |$)/ {
        for ( i=1; i<=NF; i++ ) {
            if ( $i == 0 ) {
                cols[i]
            }
        }
    }
    END {
        for ( i=1; i<=NF; i++ ) {
            if ( i in cols ) {
                printf "%s%d", sep, i
                sep = OFS
            }
        }
        print ""
    }
' file

You can then sort the output of @EdMorton's awk command by piping it as follows: | tr " " "\n" | sort -n | tr "\n" " " — Brandon E Taylor, Dec 23 '22 at 02:04
@BrandonETaylor `tr "\n" " "` would remove the final `\n` thereby turning the string into something that's no longer a valid POSIX text file and so YMMV with what any subsequent text processing tool can do with it. If the output needs to be sorted that's easily handled inside the script. I added a version that orders the output. — Ed Morton, Dec 23 '22 at 16:16
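For anyone who still prefers to sort outside awk, here is a minimal sketch that keeps the final newline by rejoining the lines with paste instead of tr. It assumes POSIX tr, sort and paste; sort_cols is a hypothetical helper name and is not part of the answer above.

```shell
# sort_cols: hypothetical helper name, not part of the answer above.
# Reads one line of space-separated column numbers on stdin and prints them
# in increasing numeric order on one line, still terminated by a newline.
sort_cols() {
    tr ' ' '\n' | sort -n | paste -sd ' ' -
}

# Example: printf '10 2 6 4\n' | sort_cols
```

Because paste -s ends its output with a newline, the result remains a valid text line for whatever tool reads it next.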

score 0 · Answer 2 · answered Dec 23 '22 at 08:18

Why not use a matrix language for matrix operations, e.g. GNU Octave:

<infile octave --silent --eval "
[row, col] = find( dlmread(0) == 0 );
dlmwrite(1, unique(col))"

Output:

2
4
6
10

The 0 and 1 given to the dlm* commands refer to standard-in and standard-out, respectively.

If you want the output on one line, transpose and specify a delimiter, e.g. change dlmwrite(...) to dlmwrite(1, unique(col)', ' ')"

score 0 · Answer 3 · answered Dec 23 '22 at 12:27

Maybe I can transpose the matrix

Yes, just use a tool that can do it, e.g. GNU datamash, as follows. Let the content of file.txt be

1 1 1 1 1 1 1 1 1 1
1 0 1 1 1 0 1 1 1 1
1 1 1 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
3 2 2 3 3 0 3 2 2 2
3 2 1 3 3 0 3 2 2 0
3 2 2 3 3 2 3 2 2 2
1 1 1 1 1 1 1 1 1 1

then

datamash --field-separator=' ' transpose < file.txt

gives output

1 1 1 1 1 1 3 3 3 1
1 0 1 1 1 1 2 2 2 1
1 1 1 1 1 1 2 1 2 1
1 1 0 1 1 1 3 3 3 1
1 1 1 1 1 1 3 3 3 1
1 0 1 1 1 1 0 0 2 1
1 1 1 1 1 1 3 3 3 1
1 1 1 1 1 1 2 2 2 1
1 1 1 1 1 1 2 2 2 1
1 1 1 1 1 1 2 0 2 1

Explanation: I tell GNU datamash that the file is space-separated and instruct it to transpose. Disclaimer: this solution assumes that every line has exactly the same number of fields.

(tested in GNU datamash 1.7)
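Once the matrix is transposed, every original column is a row, so the grep command from the question can finish the job. A minimal sketch, assuming integer fields and a grep that supports -w (as the question's command already does); cols_from_transposed is a hypothetical helper name:

```shell
# cols_from_transposed: hypothetical helper name.
# stdin is the already-transposed matrix; prints the line numbers (that is,
# the original column numbers) of all lines containing a standalone 0.
cols_from_transposed() {
    grep -nw '0' | cut -d: -f1 | paste -sd ' ' -
}

# Intended use (requires GNU datamash):
#   datamash --field-separator=' ' transpose < file.txt | cols_from_transposed
```

grep -n already reports matches in line order, so the column numbers come out in increasing order without a separate sort.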

score 0 · Answer 4 · answered Dec 25 '22 at 08:48

One approach would be to pre-scan each row for the existence of any "0" that isn't paired with other digits,

then simplify the columns to something resembling an ASCII bit-string by setting every non-zero column to "1" with high-speed gsub(), before splitting the result with a new delimiter of FS = "0":

1101011111 -> NF = 3
  ^ ^

         11
    -[0]-1
    -[0]-11111

Instead of having to loop over 10,000 columns in every row, simply track the running sum of the string lengths, i.e. the number of 1s within each new field, which is (2,1,5) in this example. Each zero sits one position past the running total, so 2+1 = 3 and 3+1+1 = 5, thus inferring the zeros located at $3 and $5.
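No code was posted for this idea, so the following is only one possible reading of it, as a sketch. It assumes single-space-separated, non-negative integer fields without leading zeros, and it uses split() on a copy of the row rather than resetting FS. zero_cols and the awk variable names are hypothetical.

```shell
# zero_cols: hypothetical helper name; reads a space-separated matrix on stdin.
# Assumes every field is a non-negative integer with no leading zeros.
zero_cols() {
    awk '
        /(^| )0( |$)/ {                      # pre-scan: skip rows without a lone 0
            row = $0
            gsub(/[1-9][0-9]*/, "1", row)    # every non-zero field becomes "1"
            gsub(/ +/, "", row)              # drop separators: one char per column
            if (length(row) > width) width = length(row)
            n = split(row, run, "0")         # runs of 1s between the zeros
            pos = 0
            for (k = 1; k < n; k++) {
                pos += length(run[k]) + 1    # run length plus the zero itself
                cols[pos]                    # remember this column number
            }
        }
        END {
            # Walk the column numbers in increasing order for sorted output.
            for (i = 1; i <= width; i++)
                if (i in cols) { printf "%s%d", sep, i; sep = " " }
            print ""
        }
    '
}

# Example: zero_cols < test.txt
```

The inner loop runs once per zero in a row rather than once per column, which is the saving the answer describes; the two gsub() calls still scan the whole line.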
