
I have a 500 MB file that contains some non-ASCII characters. I want to find those characters using Unix commands. Ideally I would also get the line number and the position within the line for each one.

Thanks :)

Mohamed Saligh
  • You might find an answer here at http://stackoverflow.com/questions/3001177/how-do-i-grep-for-non-ascii-characters-in-unix – vpit3833 Dec 07 '10 at 05:47
  • @vpit3833: I'm not very familiar with Unix commands, and I don't think that link gives the line numbers of the non-ASCII characters. Sorry if I'm wrong... – Mohamed Saligh Dec 07 '10 at 05:50

2 Answers


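Use the answer from the linked question, but add -n to the grep command so that each matching line is prefixed with its line number. (This gives the line number, not the position within the line.)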
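For comparison, here is a minimal C sketch of the line-level report that grep -n gives for a non-ASCII pattern: the number of each line containing at least one byte of 0x80 or above, listed once per line. The function name lines_with_non_ascii and its in-memory buffer interface are hypothetical, chosen so the logic can be tested on a small string; for a 500 MB file you would stream the input instead, as the fgetc-based answer does.

```c
#include <stddef.h>

/* Hypothetical helper: store the 1-based number of every line that
 * contains at least one byte >= 0x80, each line reported once.
 * Returns the total number of matching lines; at most max_out of
 * them are written to out. */
size_t lines_with_non_ascii(const unsigned char *buf, size_t len,
                            size_t *out, size_t max_out)
{
    size_t line = 1;      /* current line number, 1-based */
    size_t found = 0;     /* matching lines seen so far */
    int line_matched = 0; /* has the current line already been counted? */

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            line++;
            line_matched = 0;
        } else if (buf[i] > 127 && !line_matched) {
            if (found < max_out)
                out[found] = line;
            found++;
            line_matched = 1;
        }
    }
    return found;
}
```

The return value is the full count even when it exceeds max_out, so a caller can tell whether its output array was large enough.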

Ignacio Vazquez-Abrams

You know, it's weird. Sometimes I find it faster to code up some quick and dirty C than it is to try and navigate the wilderness of UNIX utility command line options :-)

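#include <stdio.h>

/* Reads standard input and reports every byte outside the ASCII range
   (0x80-0xFF) with its line number and byte offset within that line. */
int main (void) {
    size_t ln = 1;      /* current line number, 1-based */
    size_t chpos = 0;   /* byte offset within the current line, 1-based */
    int chr;            /* int, not char, so EOF can be told apart from data */
    while ((chr = fgetc (stdin)) != EOF) {
        if (chr == '\n') {
            ln++;
            chpos = 0;
            continue;
        }
        chpos++;
        if (chr > 127) {    /* fgetc returns bytes as unsigned char values */
            /* %zu is the correct conversion for size_t; %d is not */
            printf ("Non-ASCII %02x found at line %zu, offset %zu\n",
                chr, ln, chpos);
        }
    }
    return 0;
}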

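This gives you the line number and the offset within that line (both starting at 1) of every byte outside the ASCII range. The offset counts bytes, not characters: if the file is UTF-8, each non-ASCII character is two to four bytes long and is reported once per byte. Compile it and feed the file on standard input, for example cc nonascii.c -o nonascii && ./nonascii < yourfile (the file names are just examples).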
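If the file is UTF-8 and you want the position counted in characters rather than bytes, one option is to skip UTF-8 continuation bytes (bit pattern 10xxxxxx) when counting columns, so each character is counted and reported once, at its first byte. This is a minimal sketch assuming well-formed UTF-8 input; the names struct hit and find_non_ascii_utf8 are hypothetical, and stray continuation bytes in malformed input are silently skipped rather than reported.

```c
#include <stddef.h>

/* Hypothetical record: one entry per non-ASCII character found. */
struct hit {
    size_t line;   /* 1-based line number */
    size_t column; /* 1-based character column within the line */
};

/* Assumes well-formed UTF-8: a non-ASCII character is a lead byte
 * (>= 0xC0) followed by continuation bytes (0x80-0xBF). Continuation
 * bytes do not advance the column, so each character counts once.
 * Returns the total found; at most max_out entries are stored. */
size_t find_non_ascii_utf8(const unsigned char *buf, size_t len,
                           struct hit *out, size_t max_out)
{
    size_t line = 1, column = 0, found = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        if (b == '\n') {
            line++;
            column = 0;
            continue;
        }
        if ((b & 0xC0) == 0x80)   /* continuation byte: same character */
            continue;
        column++;                 /* ASCII byte or lead byte starts a character */
        if (b >= 0x80) {
            if (found < max_out) {
                out[found].line = line;
                out[found].column = column;
            }
            found++;
        }
    }
    return found;
}
```

For the input "aéb" on line 1 and "€é" on line 2, this reports (1, 2), (2, 1) and (2, 2), whereas the byte-based program above would report five separate bytes.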

paxdiablo