
I have a .csv file that needs to be modified in the following way: for each column in the file, check whether that column contains any null (empty) entries. If it does, that column gets removed from the file; otherwise, it stays. I attempted to solve this problem with the following script:

cp file-original.csv file-tmp.csv

for (( i=1;i<=65;i++)); do
  for var in $(cut -d, -f$i file-tmp.csv); do
    if [ -n $var ]; then
      continue
    else
      cut -d, --complement -f$i file-tmp.csv > file-tmp.csv
      break
    fi
  done
done

I'm assuming that the issue lies in saving the result of each iteration to the same file that is being iterated over (file-tmp.csv). However, I'm not sure how to circumvent this.
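
A minimal demonstration of what I suspect is going wrong (filenames illustrative): the shell truncates the output file before cut ever reads it, so the data is lost:

$ printf 'a,b\nc,d\n' > demo.csv
$ cut -d, -f1 demo.csv > demo.csv
$ cat demo.csv    # no output: the file is now empty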

erykkk
  • You have to use a temporary file, see for example [here](https://stackoverflow.com/q/3979548/3266847). I'll add an answer with an alternative to your approach that avoids copying in the first place. – Benjamin W. Jan 31 '20 at 23:09
  • See [Bash FAQ 001](https://mywiki.wooledge.org/BashFAQ/001). For CSV files that allow quoted fields, though, you should use a general-purpose language with a proper CSV-parsing library. – chepner Feb 01 '20 at 01:49

3 Answers


You have to use a temp file as in

cut -d, --complement -f$i file-tmp.csv > tmp.csv && mv tmp.csv file-tmp.csv
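
Alternatively, if the moreutils package happens to be available (an assumption, not part of the original answer), its sponge command soaks up all of its input before opening the output file, so no explicit rename is needed:

cut -d, --complement -f$i file-tmp.csv | sponge file-tmp.csv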

for var in $(cut -d, -f$i file-tmp.csv) is buggy: you won't be able to detect an empty line like this, because word splitting will just skip over it.
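
To see that bug in action (col.txt is a made-up file holding one extracted column, with an empty field in the middle):

$ printf 'a\n\nb\n' > col.txt
$ for var in $(cat col.txt); do echo "got: $var"; done
got: a
got: b

The empty line never reaches the loop body, so the test for it can never fire.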

You could avoid all the file copies in the first place by keeping track of the columns you want to drop and then dropping them all in one go:

for i in {1..65}; do
    if grep -q '^$' <(cut -d, -f "$i" file-original.csv); then
        drop+=("$i")
    fi
done

cut -d, --complement -f "$(IFS=,; echo "${drop[*]}")" file-original.csv \
    > file-tmp.csv

This uses grep to check whether a column contains an empty line, avoiding both the slow shell loop and the word-splitting bug.

After the for loop, the drop array contains all the column numbers we want to drop, and $(IFS=,; echo "${drop[*]}") prints them as a comma-separated list.
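
For example (values purely illustrative):

$ drop=(2 5 7)
$ (IFS=,; echo "${drop[*]}")
2,5,7

One caveat: if no column contains an empty field, drop stays empty and cut would be handed an empty field list, which is an error. A minimal guard, assuming the same filenames, could look like:

if ((${#drop[@]})); then
    cut -d, --complement -f "$(IFS=,; echo "${drop[*]}")" file-original.csv \
        > file-tmp.csv
else
    cp file-original.csv file-tmp.csv
fi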

Benjamin W.

This two-pass awk solution reads the file twice: the first pass records every field number that turns up empty on some line, and the second pass prints only the remaining fields:

$ cat foo.csv
a,,c,d
a,b,,d

$ cat tst.awk
BEGIN { FS=OFS="," }
NR==FNR {                # first pass: note every field that is empty on some line
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        if ($inFldNr ~ /^$/) {
            skip[inFldNr]        # referencing the element is enough to create it
        }
    }
    next
}
FNR==1 {                 # second pass: map output field numbers to kept input fields
    for (inFldNr=1; inFldNr<=NF; inFldNr++) {
        if ( !(inFldNr in skip) ) {
            out2in[++numOutFlds] = inFldNr
        }
    }
}
{
    for (outFldNr=1; outFldNr<=numOutFlds; outFldNr++) {
        inFldNr = out2in[outFldNr]
        printf "%s%s", $inFldNr, (outFldNr<numOutFlds ? OFS : ORS)
    }
}

$ awk -f tst.awk foo.csv foo.csv
a,d
a,d
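
The input file is named twice on the command line so that the NR==FNR block runs only during the first pass. To replace the original file, the same temp-file approach from the first answer applies (filenames illustrative):

awk -f tst.awk foo.csv foo.csv > tmp.csv && mv tmp.csv foo.csv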
Ed Morton

Looking at your question, I found a very simple answer, using only the grep command and output to a temporary file. Assume your CSV file is called test.csv. The following creates a file test1.csv from which all lines containing a null value have been eliminated:

grep -v null test.csv > test1.csv

The -v option inverts grep's matching, printing only the lines that do not contain null. The output can be redirected to another file, which can then replace the original test.csv.
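
A quick illustration with made-up data in which null appears literally:

$ cat test.csv
1,null,3
4,5,6
$ grep -v null test.csv > test1.csv
$ cat test1.csv
4,5,6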

aLuViAn