How do I convert MS Word's curly quotes and apostrophes to regular (straight) quote and apostrophe characters in Java? What are the Unicode code points for these characters?

“how are you doing?”

‘howdy’

Since Stack Overflow auto-fixes them, here's how they appear in an editor:

[image: Curly Quotes]

to

"how are you doing?"

'howdy'

user340188
  • Not converted here, the ‘smart quotes’ are fine. StackOverflow converts straight quotes to “” in question titles (controversially), but leaves question text alone. – bobince May 13 '10 at 11:30
  • Incidentally, is there really a good reason to replace them with straight quotes? They're not special “MS Word” characters; they're perfectly valid Unicode characters which should normally be handled fine by any application that can handle Unicode. – bobince May 13 '10 at 11:31
  • @bobince Not all applications handle Unicode properly, and people often paste text they wrote in MS Word; it's better to preserve the quotes (by converting them to regular quotes) than to strip them out completely. – dan Jul 21 '10 at 15:13

3 Answers

Going off Thomas's answer, the code is:

return text.replaceAll("[\\u2018\\u2019]", "'")
           .replaceAll("[\\u201C\\u201D]", "\"");
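A self-contained sketch of the same two replacements, for anyone who wants to paste and run it. The class name `StraightQuotes` and method name `straightenQuotes` are hypothetical. One design choice differs from the one-liner: the two patterns are compiled once into static fields, because `String.replaceAll` recompiles its regex on every call.

```java
import java.util.regex.Pattern;

public class StraightQuotes {
    // Compiled once and reused; String.replaceAll would recompile on each call.
    private static final Pattern SINGLE = Pattern.compile("[\\u2018\\u2019]");
    private static final Pattern DOUBLE = Pattern.compile("[\\u201C\\u201D]");

    /** Replaces curly single and double quotes with their ASCII equivalents. */
    static String straightenQuotes(String text) {
        String singlesFixed = SINGLE.matcher(text).replaceAll("'");
        return DOUBLE.matcher(singlesFixed).replaceAll("\"");
    }

    public static void main(String[] args) {
        System.out.println(straightenQuotes("\u201Chow are you doing?\u201D"));
        System.out.println(straightenQuotes("\u2018howdy\u2019"));
    }
}
```

Running `main` prints `"how are you doing?"` and `'howdy'`, matching the conversion asked for in the question.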
dimo414

Here's a very useful link for everyone dealing with Unicode: Unicode codepoint lookup/search tool.

Searching for "quotation mark" gives

‘ (U+2018) LEFT SINGLE QUOTATION MARK
’ (U+2019) RIGHT SINGLE QUOTATION MARK
“ (U+201C) LEFT DOUBLE QUOTATION MARK
” (U+201D) RIGHT DOUBLE QUOTATION MARK

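There are several other quote-like symbols that you might consider replacing, such as ‚ (U+201A SINGLE LOW-9 QUOTATION MARK), „ (U+201E DOUBLE LOW-9 QUOTATION MARK), ′ (U+2032 PRIME) and ″ (U+2033 DOUBLE PRIME).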
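If you would rather look up the numbers from Java itself than from a web tool, the JDK can report a character's code point and official Unicode name. This is a minimal sketch assuming Java 8+ (`String.codePoints`) and Java 7+ (`Character.getName`); the class `CodePointInfo` and its `describe` helper are hypothetical names.

```java
public class CodePointInfo {
    /** Formats a code point as "U+XXXX NAME" using the JDK's Unicode data. */
    static String describe(int codePoint) {
        return String.format("U+%04X %s", codePoint, Character.getName(codePoint));
    }

    public static void main(String[] args) {
        // Left/right single quote, then left/right double quote.
        String sample = "\u2018\u2019\u201C\u201D";
        // Walk the string by code point rather than by char.
        sample.codePoints().forEach(cp -> System.out.println(describe(cp)));
    }
}
```

This prints the same four entries as the list above, e.g. `U+201C LEFT DOUBLE QUOTATION MARK`.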

Thomas
  • The Unicode lookup link given in the answer is a third-party site. A better, "official" link is at unicode.org: https://util.unicode.org/UnicodeJsps/properties.html – Andrew P. Apr 25 '23 at 00:17

Thanks to Nick van Esch's answer to "C# How to replace Microsoft's Smart Quotes with straight quotation marks?"

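Here is the code (it is C#, with buffer being a string; '\u2019' is the ’ that MS Word produces). It is useful because it also covers other problematic Word characters, such as dashes, primes and the ellipsis.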

if (buffer.IndexOf('\u2013') > -1) buffer = buffer.Replace('\u2013', '-');
if (buffer.IndexOf('\u2014') > -1) buffer = buffer.Replace('\u2014', '-');
if (buffer.IndexOf('\u2015') > -1) buffer = buffer.Replace('\u2015', '-');
if (buffer.IndexOf('\u2017') > -1) buffer = buffer.Replace('\u2017', '_');
if (buffer.IndexOf('\u2018') > -1) buffer = buffer.Replace('\u2018', '\'');
if (buffer.IndexOf('\u2019') > -1) buffer = buffer.Replace('\u2019', '\'');
if (buffer.IndexOf('\u201a') > -1) buffer = buffer.Replace('\u201a', ',');
if (buffer.IndexOf('\u201b') > -1) buffer = buffer.Replace('\u201b', '\'');
if (buffer.IndexOf('\u201c') > -1) buffer = buffer.Replace('\u201c', '\"');
if (buffer.IndexOf('\u201d') > -1) buffer = buffer.Replace('\u201d', '\"');
if (buffer.IndexOf('\u201e') > -1) buffer = buffer.Replace('\u201e', '\"');
if (buffer.IndexOf('\u2026') > -1) buffer = buffer.Replace("\u2026", "...");
if (buffer.IndexOf('\u2032') > -1) buffer = buffer.Replace('\u2032', '\'');
if (buffer.IndexOf('\u2033') > -1) buffer = buffer.Replace('\u2033', '\"');
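Since the question asks about Java, here is a possible port of the C# snippet above. It is a sketch, not the original author's code: it assumes Java 9+ for `Map.ofEntries`, and the class `WordCharCleaner` and method `clean` are hypothetical names. Instead of fourteen separate `IndexOf`/`Replace` passes, it builds the result in a single pass over the string using a lookup table with the same mappings.

```java
import java.util.Map;

public class WordCharCleaner {
    // Same character mappings as the C# snippet, as a lookup table.
    private static final Map<Character, String> REPLACEMENTS = Map.ofEntries(
        Map.entry('\u2013', "-"),   // EN DASH
        Map.entry('\u2014', "-"),   // EM DASH
        Map.entry('\u2015', "-"),   // HORIZONTAL BAR
        Map.entry('\u2017', "_"),   // DOUBLE LOW LINE
        Map.entry('\u2018', "'"),   // LEFT SINGLE QUOTATION MARK
        Map.entry('\u2019', "'"),   // RIGHT SINGLE QUOTATION MARK
        Map.entry('\u201A', ","),   // SINGLE LOW-9 QUOTATION MARK
        Map.entry('\u201B', "'"),   // SINGLE HIGH-REVERSED-9 QUOTATION MARK
        Map.entry('\u201C', "\""),  // LEFT DOUBLE QUOTATION MARK
        Map.entry('\u201D', "\""),  // RIGHT DOUBLE QUOTATION MARK
        Map.entry('\u201E', "\""),  // DOUBLE LOW-9 QUOTATION MARK
        Map.entry('\u2026', "..."), // HORIZONTAL ELLIPSIS
        Map.entry('\u2032', "'"),   // PRIME
        Map.entry('\u2033', "\"")   // DOUBLE PRIME
    );

    /** Replaces each mapped character in one pass; all other chars are copied unchanged. */
    static String clean(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String replacement = REPLACEMENTS.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("\u201CWait\u2026\u201D \u2013 she said"));
    }
}
```

`main` prints `"Wait..." - she said`. Working per `char` is safe here because every mapped character is in the Basic Multilingual Plane, so surrogate pairs are simply copied through.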
123iamking
  • The above answer lists each MS Word quote individually. Is there no simpler code that replaces all MS Word quotes with straight quotation marks? In other words, how can we list all the MS Word quotes? – Anish Mittal Jul 28 '17 at 06:31
  • @Anish Mittal: As far as I know, this is the simplest way. – 123iamking Dec 07 '17 at 02:25
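On the question of listing "all the quotes": Unicode has no "MS Word quote" category, but it does classify quotation marks as initial quote punctuation (Pi) and final quote punctuation (Pf), and `Character.getType` exposes that. The sketch below (class `QuoteCharacters` is a hypothetical name) scans every code point and collects the Pi/Pf ones. Treat the result only as a starting point for a replacement table. It is a superset of what Word inserts (it also includes guillemets, for example), it misses the low-9 marks U+201A and U+201E, which Unicode classifies as open punctuation (Ps), and it does not tell you whether a given mark should become ' or ".

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteCharacters {
    /** Returns every code point whose Unicode general category is Pi or Pf. */
    static List<Integer> quotePunctuation() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            int type = Character.getType(cp);
            if (type == Character.INITIAL_QUOTE_PUNCTUATION
                    || type == Character.FINAL_QUOTE_PUNCTUATION) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print each candidate with its official name so a mapping can be chosen by hand.
        for (int cp : quotePunctuation()) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```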