
I have a CSV file that contains some unknown characters.

The characters appear at line 1535 in popular editors (images attached below). However, printing that line with sed in the terminal does not show them.

$ sed '1535!d' sample.csv
"sample_id","sample_column_text_1","sample_"sample_id","sample_column_text_1","sample_column_text_2","sample_column_text_3"

However, below are snapshots of the file in various editors.

Sublime Text: [screenshot]

Nano: [screenshot]

Vi: [screenshot]

The directory has various csv files that contain this character/chain of characters.

I need to write a bash script to determine the files that have such characters. How can I achieve this?

nishant

2 Answers


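You can try grep and tr:

grep -lP '\x00' filename prints the file name if the file contains NUL (\000) characters. Plain grep '\000' generally will not match them, because grep does not expand octal escapes in a pattern; -P needs GNU grep built with PCRE support.

You can use this to remove the NUL characters and write a NUL-free copy of the file: tr -d '\000' < file-with-nulls > file-without-nulls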
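
For the directory-wide check the question asks about, a minimal sketch using only tr and wc (so it does not rely on any grep extension) might look like the following. The function names has_nul and list_nul_files are made up for illustration, and the sketch assumes the files of interest match *.csv in a single directory.

```shell
#!/bin/sh

# has_nul FILE: succeeds (exit 0) if FILE contains at least one NUL byte.
has_nul() {
    # tr -dc '\000' deletes everything except NUL bytes; wc -c counts what is left.
    nul_count=$(LC_ALL=C tr -dc '\000' < "$1" | wc -c)
    [ "$nul_count" -gt 0 ]
}

# list_nul_files DIR: prints the name of every *.csv file in DIR that contains a NUL byte.
list_nul_files() {
    for f in "$1"/*.csv; do
        [ -f "$f" ] || continue   # skip an unmatched glob and non-regular files
        if has_nul "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Scan the directory given as the first argument, or the current one by default.
list_nul_files "${1:-.}"
```

Saved as, say, find_nul_csv.sh (a hypothetical name), it could be run as sh find_nul_csv.sh /path/to/csv/dir. It only reports files; cleaning them would still be a separate tr -d '\000' step.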

Nikhil Fadnis

The following is from:

http://www.linuxquestions.org/questions/programming-9/how-to-check-for-null-characters-in-file-509377/

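#!/usr/bin/perl

use strict;
use warnings;

# Exit status: 0 if no file contains a NUL byte, 1 if at least one does.
my $null_found = 0;

foreach my $file (@ARGV) {
    # Three-argument open with a lexical handle copes with unusual file names.
    my $fh;
    if ( ! open($fh, '<', $file) ) {
        warn "couldn't open $file for reading: $!\n";
        next;
    }

    while (<$fh>) {
        if ( /\000/ ) {
            # $. is the current line number of the handle being read.
            print "detected NULL at line $. in file $file\n";
            $null_found = 1;
            last;    # one hit per file is enough
        }
    }
    close($fh);
}

exit $null_found;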

If it works as desired, you can save it to a file, nullcheck.pl, and make it executable:

chmod +x nullcheck.pl

It accepts a list of file names as arguments, but exits with a non-zero status if it finds a NUL in any of them, so I'd pass in only one file at a time. The command below runs the script on each file:

for f in $(find . -type f -exec grep -Iq . {} \; -and -print) ; do perl ./nullcheck.pl "$f" || echo "$f has nulls"; done

The above find command is lifted from Linux command: How to 'find' only text files?
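
A for loop over $(find ...) splits file names on whitespace, and grep -I treats files it detects as binary as non-matching, which may include some of the very files that contain NUL bytes. As a hedged alternative, here is a sketch that lets find hand each file to the checker directly. The function name scan_tree is hypothetical, and the -name '*.csv' filter is an assumption based on the question; the checker is expected to exit non-zero when it finds a NUL, as nullcheck.pl does.

```shell
#!/bin/sh

# scan_tree DIR CMD [ARGS...]: runs CMD [ARGS...] FILE once for every *.csv
# file under DIR and reports each file for which CMD exits non-zero.
scan_tree() {
    dir=$1
    shift
    # After the shift, "$@" holds the checker command. find passes the file
    # name ({}) and the checker through to the inner sh as separate arguments,
    # so names containing spaces are never re-split by the shell.
    find "$dir" -type f -name '*.csv' -exec sh -c '
        file=$1
        shift
        "$@" "$file" || printf "%s has nulls\n" "$file"
    ' sh {} "$@" \;
}

# Example call (assumes nullcheck.pl is in the current directory):
#   scan_tree . perl ./nullcheck.pl
```

Because the checker is passed as ordinary arguments, the same loop works with any command that follows the same exit-status convention.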

nishant
Calvin Taylor