
I am working on importing a SQL dump into a MySQL database using MySQL Workbench. Some of the INSERT statements in the data set contain extended ASCII/Unicode characters, such as the "ç" in "Français", and these characters break the import.

I do not care about those characters, so, using Notepad++ and the answers to "Notepad++, How to remove all non ascii characters with regex?", I am trying to strip out all the extended characters with the regex [^\x00-\x7F]+. As I understand it, this matches one or more characters that are NOT in the range 0x00-0x7F, i.e. NOT NUL (0) through DEL (127).
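A minimal sketch of what the pattern is expected to do, written with Python's `re` module rather than Notepad++'s own regex engine (the assumption is that the negated class `[^\x00-\x7F]` means the same thing in both). The helper name `strip_non_ascii` is hypothetical.

```python
import re

# Negated character class: one or more characters whose code point lies
# outside 0x00-0x7F, i.e. anything that is not 7-bit ASCII.
NON_ASCII = re.compile(r'[^\x00-\x7F]+')

def strip_non_ascii(text: str) -> str:
    """Remove every run of non-ASCII characters (hypothetical helper)."""
    return NON_ASCII.sub('', text)

sample = "Français\r\n"
print(NON_ASCII.findall(sample))       # ['ç']
print(repr(strip_non_ascii(sample)))   # 'Franais\r\n'
```

Only the "ç" is matched; the trailing CRLF is left in place.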

It finds the right characters, but for some reason it also matches the CRLF at the end of each line. I am not sure why: CR and LF are \x0D and \x0A, which lie inside the 0x00-0x7F range, so the negated class should not match them.
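A quick check of the code points involved, again sketched in Python's `re` rather than Notepad++: CR is 0x0D and LF is 0x0A, both inside 0x00-0x7F, so a standard regex engine should not match either with `[^\x00-\x7F]+`.

```python
import re

NON_ASCII = re.compile(r'[^\x00-\x7F]+')

# CR (0x0D) and LF (0x0A) are both <= 0x7F, so the negated
# class is expected to find nothing in either character.
for name, ch in [("CR", "\r"), ("LF", "\n")]:
    print(name, hex(ord(ch)), NON_ASCII.search(ch) is None)
# CR 0xd True
# LF 0xa True
```

If an editor does match the line endings with this pattern, the likely suspects are the editor itself or line breaks that are not actually CR/LF.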

I am sure I am missing something simple. Is there a better regex that will not remove my newlines, or even a way to tell MySQL Workbench to ignore the extended characters?
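If scripting the cleanup outside Notepad++ is an option, one non-regex alternative is Python's `"ignore"` codec error handler, which silently drops every character that cannot be encoded as ASCII. This is a sketch that preprocesses the dump file; it does not configure Workbench. It assumes the dump is UTF-8, and the function names `to_ascii` and `clean_dump` are hypothetical.

```python
def to_ascii(text: str) -> str:
    # errors="ignore" discards every character outside 0-127;
    # CR and LF are ASCII, so they survive unchanged.
    return text.encode("ascii", errors="ignore").decode("ascii")

def clean_dump(src: str, dst: str) -> None:
    # Assumption: the dump is UTF-8. newline="" turns off newline
    # translation, so CRLF endings are written back exactly as read.
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="ascii", newline="") as fout:
        for line in fin:
            fout.write(to_ascii(line))
```

Processing line by line keeps memory use flat for large dumps.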

Here is an example of one of the INSERT lines containing extended characters:

INSERT INTO as_catalog VALUES('525234','Google Apps Sync™ for Microsoft Outlook® 3.3.355.950','0');
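Applying the same pattern to the sample line above (a Python sketch, not Notepad++ itself; the CRLF ending is added here to mimic a Windows-style dump) should remove only the "™" and "®" and keep the line ending.

```python
import re

NON_ASCII = re.compile(r'[^\x00-\x7F]+')

line = ("INSERT INTO as_catalog VALUES('525234',"
        "'Google Apps Sync™ for Microsoft Outlook® 3.3.355.950','0');\r\n")

print(NON_ASCII.findall(line))  # ['™', '®']
cleaned = NON_ASCII.sub('', line)
print(cleaned, end='')
# INSERT INTO as_catalog VALUES('525234','Google Apps Sync for Microsoft Outlook 3.3.355.950','0');
```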

Thanks!

  • I could not reproduce this behaviour. Make sure you have installed a recent version of Notepad++. Also, try adding a newline manually (just for testing) and see if this regex matches that newline as well. If not, then surely you have some weird newline characters in your text. You would need to normalise those first then. – trincot Sep 06 '17 at 22:56
  • Might be easier to accept such characters? – Rick James Sep 07 '17 at 00:33
  • I was able to get it to work. I was up to date, but reinstalled, and it magically started behaving. Go figure. – Brad Robbins Sep 08 '17 at 00:16
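trincot's suggestion to normalise unusual newlines first can be sketched as follows. This is only one hypothetical way the symptom could arise: if a dump used non-ASCII line breaks such as U+0085, U+2028 or U+2029 (an assumption; the question does not say which, if any, were present), `[^\x00-\x7F]+` would delete them and merge lines. Converting them to CRLF before stripping avoids that. The function name `normalise_then_strip` is hypothetical.

```python
import re

# Hypothetical set of non-ASCII line-break characters (NEL, LINE SEPARATOR,
# PARAGRAPH SEPARATOR); the dump in the question may contain none of them.
UNICODE_BREAKS = re.compile('[\u0085\u2028\u2029]')
NON_ASCII = re.compile(r'[^\x00-\x7F]+')

def normalise_then_strip(text: str) -> str:
    # Step 1: turn exotic line breaks into CRLF so they survive step 2.
    text = UNICODE_BREAKS.sub('\r\n', text)
    # Step 2: drop everything outside 0x00-0x7F.
    return NON_ASCII.sub('', text)

raw = "row one\u2028row two\r\n"
print(repr(NON_ASCII.sub('', raw)))     # 'row onerow two\r\n'  (lines merged)
print(repr(normalise_then_strip(raw)))  # 'row one\r\nrow two\r\n'
```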
