
I have a function that I call many times across various files, with a signature like:

Translate("English Message", "Spanish Message", "French Message")

I want to pull out the English, Spanish, and French messages and write them to a CSV file so that people who actually know these languages can tell me what I SHOULD have put in there.

The problem I am running into is that some French and Spanish messages don't show up, apparently because of the accented characters and single quotes.

This is a VB.NET program.

Edit

There was no problem with the language; my issue was actually the regular expression and my complete lack of understanding of regular expressions.

Anthony Potts
  • What (programming) languages? – kennytm Feb 23 '10 at 13:43
  • While you're at it, you should make the program *read* from these CSV files (or use a standard localization/globalization/whatever solution). Keeping translations in code is a very, very bad idea. – Instance Hunter Feb 23 '10 at 13:46
  • -1 for not specifying the environment/language in use. If you're not aware of UTF-8 and Unicode by now then it's time to learn. – PP. Feb 23 '10 at 13:48
  • @Daniel I am definitely moving these off to a file, but my thought was that I can make the translations in code and then change the method out for another that accessed a file instead. – Anthony Potts Feb 23 '10 at 13:52
  • @PP Thanks, it's good to see that people are understanding of people who NEVER deal with something and then ask questions. – Anthony Potts Feb 23 '10 at 13:53

2 Answers


It depends on the regex library you are using. Unicode-aware regex implementations have no such problems with accented characters, but more detail would help: which language and which regex library are you using?

anselm

If your language's regex implementation has a DOTALL flag, which makes `.` match newlines as well, you might want to set it. In .NET the equivalent option is RegexOptions.Singleline.
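To show what the flag changes, here is a minimal sketch in Python rather than VB.NET, so treat it as an illustration of the idea and not as drop-in code. The `source` string is a made-up example of a `Translate` call that wraps onto a second line.

```python
import re

# Hypothetical source text: the call is split across two lines.
source = 'Translate("Hello",\n    "Hola", "Bonjour")'

# Capture everything between the parentheses of the call.
pattern = r'Translate\((.*)\)'

# Without DOTALL, "." stops at the newline, so the call is never matched.
no_flag = re.search(pattern, source)

# With DOTALL, "." also matches the newline and the whole argument list is captured.
with_flag = re.search(pattern, source, re.DOTALL)

print(no_flag)
print(with_flag.group(1))
```

Note that a greedy `.*` with this flag set can run across several calls in the same file, which is one reason the negated character class is often the safer choice.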

Alternatively, change the regex to capture a negated character class instead, like so:

([^your_delimiter]*?)

where your_delimiter is the character (or characters) immediately following the string that you want to capture. For a quoted message, that is the closing double quote.
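As a rough sketch of the negated-class approach applied to the `Translate` calls from the question, the following Python example pulls out the three messages and writes them as CSV. It is written in Python for illustration only; the `source` string, the variable names, and the column headers are assumptions, and it does not handle quotes escaped inside a message (for example the doubled `""` that VB uses).

```python
import csv
import io
import re

# Hypothetical source line with accented characters and apostrophes.
source = 'Translate("The file", "El archivo está aquí", "L\'élément n\'existe pas")'

# One argument: an opening quote, any run of non-quote characters, a closing quote.
# The delimiter in the negated class is the closing double quote.
arg = r'"([^"]*)"'
call = re.compile(
    r'Translate\(\s*' + arg + r'\s*,\s*' + arg + r'\s*,\s*' + arg + r'\s*\)'
)

# findall returns one (english, spanish, french) tuple per matched call.
rows = call.findall(source)

# Write the rows to an in-memory CSV; a real script would open a file
# with an explicit encoding such as UTF-8 instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["English", "Spanish", "French"])
writer.writerows(rows)
csv_text = buffer.getvalue()

print(csv_text)
```

Because `[^"]` matches any character except the double quote, accented letters, apostrophes, and newlines inside a message are all captured without extra flags.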

See this for further discussion:

http://en.wikipedia.org/wiki/Regular_expression#Unicode

nikola