10

I am trying to find the unprintable characters in a data file on Unix. Code:

#!/bin/ksh
export SRCFILE='/data/temp1.dat'
while read line 
do
len=lenght($line)
for( $i = 0; $i < $len; $i++ ) {

        if( ord(substr($line, $i, 1)) > 127 )
        {
            print "$line\n";
            last;
        }
done < $SRCFILE

The code is not working. Please help me find a solution for the above.

IMSoP
user3759763
  • 2
    Possible duplicate of [How do I grep for all non-ASCII characters in UNIX](https://stackoverflow.com/questions/3001177/how-do-i-grep-for-all-non-ascii-characters-in-unix) – kenorb Apr 12 '18 at 21:55
  • Also dup of: [find and delete files with non-ascii names](https://stackoverflow.com/q/19146240/55075). – kenorb Apr 12 '18 at 22:53

3 Answers

16

You can use grep for finding non-printable characters in a file, something like the following, which finds all non-printable-ASCII and all non-ASCII:

grep -P -n "[\x00-\x1F\x7F-\xFF]" input_file

-P gives you the more powerful Perl regular expressions (PCREs) and -n shows line numbers.

If your grep doesn't support PCREs, I'd just use Perl for this directly:

perl -ne '$x++;if($_=~/[\x00-\x1F\x7F-\xFF]/){print"$x:$_"}' input_file
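If neither a PCRE-capable grep nor Perl is available, the line-by-line loop that the question attempts can also be written in plain shell. The following is only a sketch under some assumptions: `print_unprintable_lines` is a made-up helper name, the C locale is forced so that every byte outside 0x20-0x7E counts as non-printable (tabs included), and files containing NUL bytes are not handled, because `read` cannot store them.

```shell
#!/bin/sh
# Sketch only: print_unprintable_lines is a hypothetical helper name.
print_unprintable_lines() {
    # Run in a subshell so the locale change does not leak to the caller.
    (
        LC_ALL=C
        export LC_ALL
        # IFS= and -r keep each line exactly as read; the extra test
        # also picks up a final line that lacks a trailing newline.
        while IFS= read -r line || [ -n "$line" ]; do
            case $line in
                # Any byte that is not printable ASCII marks the line.
                *[![:print:]]*) printf '%s\n' "$line" ;;
            esac
        done < "$1"
    )
}
```

Calling `print_unprintable_lines /data/temp1.dat` would print only the lines that contain such a byte. The loop uses only POSIX shell features, so it should also run under ksh, but it will be noticeably slower than grep or Perl on large files.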
paxdiablo
11

You may try something like this:

grep '[^[:print:]]' filePath
blackSmith
  • Can you provide part of the file as an example? Also, do you want to show the lines containing those characters, or something else? – blackSmith Sep 08 '14 at 13:22
  • Hi Smith, please find the records below. – user3759763 Sep 08 '14 at 13:23
  • ABC|111112 | ATTEMPTED | INDIA | AUSTRALIA | ENGLAND ABC|222222 | ATTEMPTED ^Z | INDIA | AUSTRALIA | ENGLAND ABC|333333 | ATTEMPTED | INDIA | AUSTRALIA | ENGLADN ABC|444444 | ROMATIC ^Z | INDIA | AUSTRALIA | ENGLADN – user3759763 Sep 08 '14 at 13:26
  • In the above records, I need to output records 2 and 4, as they contain unprintable characters; records 1 and 3 look fine. – user3759763 Sep 08 '14 at 13:27
  • 3
    You need to use `[^[:print:][:blank:]]` if you don't wish to include spaces/tabs. – rveach Sep 29 '17 at 13:29
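
The difference between the two bracket expressions discussed above can be sketched as follows. This is an illustration, not part of either author's answer: the function names are hypothetical, `-n` is added to show line numbers, and `LC_ALL=C` is an assumption made so that grep classifies single bytes rather than multibyte characters. In that locale `[:print:]` already includes the space character, so the practical effect of adding `[:blank:]` is that tabs stop being reported.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Report lines holding any byte outside printable ASCII (tabs count).
lines_with_unprintable() {
    LC_ALL=C grep -n '[^[:print:]]' "$1"
}

# Same check, but blanks (space and tab) are tolerated.
lines_with_unprintable_ignoring_tabs() {
    LC_ALL=C grep -n '[^[:print:][:blank:]]' "$1"
}
```

On a file whose second line contains a tab and whose third line contains a Ctrl-Z byte, the first function should report lines 2 and 3, while the second should report only line 3.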
-3

This sounds pretty trite, but I was not sure how to do it just now. I have become fond of "od"; depending on what you are doing, you may want something suited to printing arbitrary characters. The awk code is not very elegant, but it is flexible if you are looking for specific bytes; the point is just to show the use of od. Note the problems with awk comparisons and the spaces, etc.

cat filename | od -A n -t x1z | awk '{ p=0; i=1; if ( NF>16) { while (i<17) {if ( $i!="0d"){ if ( $i!="0a") {if ( $i" " < "20 " ) {print $i ; p=1;}  if ( $i" "> "7f "){print $i;   p=1;}}}  i=i+1} if (p==1) print $0; }}' | more
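
A shorter variant of the same od idea might look like the sketch below. It is not the pipeline above: `unprintable_bytes` is a hypothetical name, each byte is put on its own line with `tr` instead of looping over 16 fields, and, unlike the original, DEL (7f) is also reported. Concatenating an empty string forces awk to compare the two-digit hex values as strings, which sidesteps the comparison problem mentioned above.

```shell
#!/bin/sh
# Sketch only: unprintable_bytes is a hypothetical helper name.
# Prints the hex value of every byte that is not printable ASCII,
# ignoring line feed (0a) and carriage return (0d).
unprintable_bytes() {
    # -An drops the offset column, -v disables collapsing of repeated
    # lines, -tx1 prints one two-digit hex value per byte.
    od -An -v -tx1 "$1" | tr -s ' ' '\n' | awk '
        $0 == "" || $0 == "0a" || $0 == "0d" { next }
        # ($0 "") is a string, so these are string comparisons.
        ($0 "") < "20" || ($0 "") > "7e" { print }'
}
```

This reports the offending byte values rather than the lines that contain them, so it is better suited to finding out which characters are present than to filtering records.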