I have some XML files with invalid characters in them. Since there are a lot of files, I'd like to use grep to find these characters, but I am not getting the correct results.

Opening the file in Vim shows something similar to this:

<email><202a>someone@address.com</email>

I'd like to search for the <202a> character.

I've tried the following:

grep -P "<202a>" file
grep -P "\<202a\>" file
grep -P "\x202a" file
grep -P "\x202A" file

Note that the <202a> is not a literal string. It is only how Vim displays the character; when printed to the console (e.g. if I just grep for email) it is rendered differently.

JamesE

1 Answer

This should do it. It deletes every character that is neither a control character nor a printable character:

tr -cd '[:cntrl:][:print:]' < file

Depending on your locale, you might need to clear LANG for the command:

LANG= tr -cd '[:cntrl:][:print:]' < file

Or this, which keeps only the bytes \x01 through \x7e:

tr -cd $'\x01-\x7e' < file
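Since the question mentions a lot of files, here is a minimal sketch of wrapping the byte-range variant in a function. The name strip_non_ascii is made up for illustration, and it uses octal escapes (\001-\176, the same range as \x01-\x7e) so that it does not depend on bash's $'...' quoting. Note that it writes the cleaned text to standard output and does not modify the file.

```shell
# strip_non_ascii FILE  (hypothetical helper name)
# Prints FILE with every byte outside \001-\176 removed.
# LC_ALL=C makes tr work on single bytes regardless of the user's locale.
strip_non_ascii() {
  LC_ALL=C tr -cd '\001-\176' < "$1"
}
```

It could then be applied to each file in turn, for example:

for f in *.xml; do strip_non_ascii "$f" > "$f.clean"; done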

Cygwin and tr settings

Zombo
This worked, but I was looking for a solution using grep. This post did the trick: http://stackoverflow.com/questions/3001177/how-do-i-grep-for-all-non-ascii-characters-in-unix – JamesE May 22 '14 at 15:30
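For reference, a rough sketch of the grep approach. It assumes GNU grep built with PCRE support (the -P option), and it assumes that the <202a> shown by Vim is the character U+202A (LEFT-TO-RIGHT EMBEDDING), which is the three bytes E2 80 AA in UTF-8. The attempt with "\x202a" most likely failed because PCRE reads \x followed by at most two hex digits, so the pattern means \x20 (a space) followed by the literal text 2a. The function names below are made up for illustration.

```shell
# find_non_ascii FILE...  (hypothetical helper name)
# Prints file:line:text for every line containing a byte outside 0x00-0x7F.
# LC_ALL=C makes grep match single bytes; -a treats the input as text.
find_non_ascii() {
  LC_ALL=C grep -a -n -H -P '[^\x00-\x7F]' "$@"
}

# find_u202a FILE...  (hypothetical helper name)
# Prints only the lines containing U+202A, matched as its UTF-8 bytes.
find_u202a() {
  LC_ALL=C grep -a -n -H -P '\xE2\x80\xAA' "$@"
}
```

Usage would be something like find_non_ascii *.xml to list every affected line, or find_u202a *.xml if only that one character is of interest.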