15

I am looking for a way to replace straight quotes with “corrected” quotation marks in user input.

The idea

Here is a snippet briefly showing the principle:
The “correct” quotation marks come in an opening and a closing form, so each straight quote needs to be replaced with the right one.

$('#myInput').on("keyup", function(e) {
  // The below doesn't work when there's no space before or after.
  this.value = this.value.replace(/ "/g, ' “');
  this.value = this.value.replace(/" /g, '” ');
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id="myInput"></textarea>

But the above is not working in all cases.
For example, when the "quoted word" is at the very beginning or the very end of a sentence or a line.

Examples

Possible inputs (beware, French inside! :)):
⋅ I'm "happy" ! Ça y est, j'ai "osé", et mon "âme sœur" était au rendez-vous…
⋅ The sign says: "Some text "some text" some text." and "Note the space here !"
⋅ "Inc"or"rect" quo"tes should " not be replaced.
⋅ I said: "If it works on 'singles' too, I'd love it even more!"

Correct outputs:
⋅ I'm “happy” ! Ça y est, j'ai “osé”, et mon “âme sœur” était au rendez-vous…
⋅ The sign says: “Some text “some text” some text.” and “Note the space here !”
⋅ “Inc"or"rect” quo"tes should " not be replaced.
⋅ I said: “If it works on ‘singles’ too, I'd love it even more!”

Incorrect outputs:
⋅ The sign says: “Some text ”some text“ some text.” and […]
Why it is incorrect:
→ There should be no space between the end of a quotation and its closing mark.
→ There should be a space between a closing quotation mark and a word.
→ There should be a space between a word and an opening quotation mark.
→ There should be no space between an opening quotation mark and its quotation.

The need

How could it be possible to effectively and easily replace the quotes in all those cases?
If possible, I'd also like the solution to be able to "correct" the quotes even if we add them after the typing of the whole sentence.

Note that I don't (can't) use the word delimiter "\b" in a regex because the “accented characters, such as "é" or "ü" are, unfortunately, treated as word breaks.” (source: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions)

Of course, if there is no other solution, I'll come up with a list of what I consider a word delimiter and use it in a regex. But I'd prefer to have a nice working function rather than a list!

Any idea would be appreciated.

Takit Isy
  • Why don't you just use `replace(/"/g, '”')`? – str Apr 13 '18 at 11:41
  • @str, the `” ` isn't correct for the beginning of a quotation. – Takit Isy Apr 13 '18 at 11:52
  • Can you show some sample inputs and their expected outputs, so that all edge cases can be handled? – Tarun Lalwani Apr 18 '18 at 15:54
  • @TarunLalwani, I added some examples. Beware, French inside! – Takit Isy Apr 18 '18 at 16:37
  • @TakitIsy, what happens when someone types `"tarun lalwani"` ? Does it also become `“tarun lalwani”`? – Tarun Lalwani Apr 18 '18 at 16:41
  • @TarunLalwani, Yes, that's it. You can try in my snippet if you put spaces before/after your quotes. These spaces are the "necessity" I want to get rid of. – Takit Isy Apr 18 '18 at 16:44
  • @TakitIsy, please check the answer I just posted, I think that should do the job – Tarun Lalwani Apr 18 '18 at 16:57
  • what is expected behavior for uneven double quotes? should `"sd"f"` be turned into something at all (maybe `“sd"f”` or `“sd”f"`, etc) or should it just wait for a matching `"`? please post examples of desired behavior in this situation – Scaramouche Apr 18 '18 at 20:37
  • @Scaramouche I added those examples. – Takit Isy Apr 19 '18 at 07:00
  • hi, saw your edit, just one more question: in the example `"Some text "some text" some text." and "Note the space here !"` the desired output is `“Some text “some text” some text.” and “Note the space here !”`. it could as easily be this instead: `“Some text ”some text“ some text.” and “Note the space here !”`, is this last acceptable too, if not, this is a tricky one, do you already have a desired criterion to follow in this situation? – Scaramouche Apr 19 '18 at 12:56
  • @Scaramouche `“Some text ”some text“ some text.”` isn't correct because you shouldn't have a `space` directly inside the opening or closing quotation mark, and also you should have a `space` between a word and the opening, and after the closing and a word. I'll add it in my examples. :) – Takit Isy Apr 19 '18 at 13:55
  • when you say *directly inside the opening or closing quotation mark* you mean *directly after/before the opening/closing quotation mark correspondingly*, right? – Scaramouche Apr 19 '18 at 13:58
  • @Scaramouche, that's it. – Takit Isy Apr 19 '18 at 14:05
  • What happens when you want to quote a blank `"this is a blank "", so is this " ", etc ..."` ? –  Apr 24 '18 at 15:34
  • What happens when there is a odd number of quotes `"odd number " of quotes"` ? –  Apr 24 '18 at 15:35
  • I started to work on this until I realized, there is no rules you could write that will cover all cases. The premise is totally wrong ! –  Apr 24 '18 at 15:36
  • Well, @sln, I don't consider blanks are quotations. And the odd number of quotes is one of the examples `“Inco"rrect” quotes are not replaced.`. Anyway, in my solution I don't count the quotes. `“Inco"rr"ect”` is incorrect too! I think the rules of why it is incorrect work in all the cases you said. I edited to add incorrect quotes examples. – Takit Isy Apr 24 '18 at 17:37
  • Hello, I know this post is old, but I think you just have to replace your regexes with these: `this.value = (" " + this.value + " ").replace(/ "/g, ' «'); this.value = this.value.replace(/" /g, '» ').trim();` (I prefer French quotes ;-) ). Adding blank spaces at the beginning and at the end of the string, then trimming it after applying the regexes, solves most of the problem. Anyway, that's what I'm using in this specific case and it works. – Chrysotribax Aug 20 '21 at 12:08
  • Of course, it works on single-line text. If you have to do this on multi-line text, I think you just have to split it with `values = value.split(/\r?\n/);`, apply the regexes in a loop, then join the elements with `join()` to get the corrected text. – Chrysotribax Aug 20 '21 at 12:38

3 Answers

4

It works in many cases, with the exception of when the "word" is at the very beginning or the very end of a sentence or a line.

To solve that problem, you can use an alternation of a beginning/end of line assertion and the space, capture that, and use it in the replacement:

this.value = this.value.replace(/(^| )"/g, '$1“');
this.value = this.value.replace(/"($| )/g, '”$1');

The alternation is (^| ) for the opening mark and ($| ) for the closing one. The capture group will be "" if it matched the assertion, or " " if it matched the space.
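The alternation described above can be tried outside the browser with a small sketch. The function name `curlDoubleQuotes` is hypothetical, and the `m` flag is an addition that the replacements above do not use: it is assumed here so that `^` and `$` also match at line breaks in a multi-line value.

```javascript
// Hypothetical helper illustrating the capture-and-reinsert idea.
// Assumption: the `m` flag is added so ^ and $ match at every line break.
function curlDoubleQuotes(text) {
  return text
    // Opening mark: a quote at the start of a line or right after a space.
    .replace(/(^| )"/gm, '$1“')
    // Closing mark: a quote at the end of a line or right before a space.
    .replace(/"($| )/gm, '”$1');
}

console.log(curlDoubleQuotes('"word" here'));
console.log(curlDoubleQuotes('say "one"\n"two" done'));
console.log(curlDoubleQuotes('"word", then'));
```

In the last call the quote before the comma is left untouched, because a comma is neither a space nor a line end. Handling punctuation this way would mean widening the alternation into a character class of delimiters.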

$('#myInput').on("keyup", function(e) {
  this.value = this.value.replace(/'/g, '’');
  // Capture the start/end of the input (or a space) and put it back in the replacement.
  this.value = this.value.replace(/(^| )"/g, '$1“');
  this.value = this.value.replace(/"($| )/g, '”$1');
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<textarea id="myInput"></textarea>

However, you've said you want to avoid "escaping" characters on user input. I'm not sure where you're planning to use it, but something like the above is almost never the right approach to a problem with that sort of description.

T.J. Crowder
  • When starting to type `"` at the beginning of the line, it adds an unnecessary space character. It's the same for the end. `"word".` turns into ` “word” .` – Takit Isy Apr 13 '18 at 11:55
  • You're right, I removed the part where I was talking about "escaping characters". – Takit Isy Apr 13 '18 at 11:57
  • @TakitIsy: Sorry, wasn't paying enough attention to the replacement. I've fixed that now in the answer. – T.J. Crowder Apr 13 '18 at 12:02
  • Thanks for the update. Here is another one: when I type `word,` and then want to add the marks, the one right before the `,` isn't replaced. I am thinking about another approach: do you know if we can “detect” the beginning and end of a word easily ? (That way we would avoid putting all the punctuation marks in the regex) – Takit Isy Apr 13 '18 at 12:04
  • @TakitIsy: See [MDN's regex documentation](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions), `\b` is supposed to assert a word boundary, although it considers the boundary between the `d` and `-` in `word-break` to be a boundary. – T.J. Crowder Apr 13 '18 at 12:12
  • Thanks for the link, but I don't (can't) use "\b" because the “accented characters, such as "é" or "ü" are, unfortunately, treated as word breaks.” – Takit Isy Apr 13 '18 at 12:17
  • @TakitIsy: I'm afraid you'll have to come up with a list of what you want to consider a word delimiter and use it in the alternation, probably as a class. E.g. `(^|[ ,.?])` for space, `.`, `,`, or `?`. Your list will likely be long. Or of course, you can list things you *don't* consider word breaks with a negated class. – T.J. Crowder Apr 13 '18 at 12:38
  • Hello T.J., I just answered my own question. As I am not a RegEx expert, could you review my answer and comment if something can be improved? I'd greatly appreciate it. :) – Takit Isy Apr 22 '18 at 14:03
1

I got a solution that finally fits all my needs.
I admit it is a lot more complicated than T.J.'s, which can be perfect for simple cases.

Remember, my main problem was the impossibility of using \b because of the accented characters.
I was able to get rid of that issue by using the solution from this topic:
Remove accents/diacritics in a string in JavaScript

After that, I used a modified function heavily inspired by the answer here…
How do I replace a character at a particular index in JavaScript?

… and had a very hard time, playing a lot with RegEx, to finally get to this solution:

var str_orig = `· I'm "happy" ! Ça y est, j'ai "osé", et mon "âme sœur" était au rendez-vous…
· The sign says: "Some text "some text" some text." and "Note the space here !"
⋅ "Inc"or"rect" quo"tes should " not be replaced.
· I said: "If it works on 'singles' too, I'd love it even more!"
word1" word2"
word1 word2"
"word1 word2
"word1" word2
"word1" word2"
"word1 word2"`;

// Thanks, exactly what I needed!
var str_norm = str_orig.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

// Thanks for inspiration
String.prototype.replaceQuoteAt = function(index, shift) {
  const replacers = "“‘”’";
  var offset = 1 * (this[index] == "'") + 2 * (shift);
  return this.substr(0, index) + replacers[offset] + this.substr(index + 1);
}

// Opening quote: not after a boundary, not before a space or at the end
var match; // declared explicitly to avoid creating an implicit global
var re_start = /(?!\b)["'](?!(\s|$))/gi;
while ((match = re_start.exec(str_norm)) != null) {
  str_orig = str_orig.replaceQuoteAt(match.index, false);
}

// Closing quote: not at the beginning or after a space, not before a boundary
var re_end = /(?<!(^|\s))["'](?!\b)/gi;
while ((match = re_end.exec(str_norm)) != null) {
  str_orig = str_orig.replaceQuoteAt(match.index, true);
}

console.log("Corrected: \n", str_orig);
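The reason for matching against the accent-stripped copy while editing the original can be checked in isolation. This is only a sketch: `stripAccents` and `hasBoundaryBeforeQuote` are hypothetical names, and the sample strings are made up. It also shows an assumption the approach relies on, namely that the original text uses precomposed (NFC) characters, so that both strings have the same length and the match indices line up.

```javascript
// Hypothetical helper: same normalization as in the snippet above.
function stripAccents(s) {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// True if some straight double quote is directly preceded by a word boundary.
const hasBoundaryBeforeQuote = (s) => /\b"/.test(s);

// "é" written as one precomposed code point (NFC form).
const original = 'os\u00e9" et';
const stripped = stripAccents(original);

// \b only treats [A-Za-z0-9_] as word characters, so "é" hides the boundary.
console.log(hasBoundaryBeforeQuote(original)); // false
console.log(hasBoundaryBeforeQuote(stripped)); // true

// Same length, so an index found in `stripped` is valid in `original`.
console.log(stripped.length === original.length); // true

// Caveat: if the input is already decomposed, stripping shortens it
// and the indices no longer line up.
const decomposed = original.normalize('NFD');
console.log(stripAccents(decomposed).length === decomposed.length); // false
```

If decomposed input is a possibility, one option would be to call `normalize('NFC')` on the original text before doing anything else, so that the two strings stay aligned.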

And below is a snippet of a working example with a textarea.
I've just wrapped the code of the first snippet in a function, and I'm using a substring around the caret position to decide whether to call it (that avoids running it on every character input):

String.prototype.replaceQuoteAt = function(index, offset) {
  const replacers = "“‘”’";
  var i = 2 * (offset) + 1 * (this[index] == "'");
  return this.substr(0, index) + replacers[i] + this.substr(index + 1);
}

function replaceQuotes(str) {
  var match; // local variable instead of an implicit global
  var str_norm = str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  var re_quote_start = /(?!\b)["'](?!(\s|$))/gi;
  while ((match = re_quote_start.exec(str_norm)) != null) {
    str = str.replaceQuoteAt(match.index, false);
  }
  var re_quote_end = /(?<!(^|\s))["'](?!\b)./gi;
  while ((match = re_quote_end.exec(str_norm)) != null) {
    str = str.replaceQuoteAt(match.index, true);
  }
  return str;
}

var pasted = 0;
document.getElementById("myInput").onpaste = function(e) {
  pasted = 1;
}

document.getElementById("myInput").oninput = function(e) {
  var caretPos = this.selectionStart; // Gets caret position
  var chars = this.value.substring(caretPos - 2, caretPos + 1); // Gets 2 chars before caret (just typed and the one before), and 1 char just after
  if (pasted || chars.includes(`"`) || chars.includes(`'`)) { // Filters the calling of the function
    this.value = replaceQuotes(this.value); // Calls the function
    if (pasted) {
      pasted = 0;
    } else {
      this.setSelectionRange(caretPos, caretPos); // Restores caret position
    }
  }
}
#myInput {
  width: 90%;
  height: 100px;
}
<textarea id="myInput"></textarea>

It seems to work with every case I can think of right now.
The function correctly replaces the quotes when:
⋅ typing regularly,
⋅ adding quotes after we typed the text,
⋅ pasting text.

It replaces both double and single quotes.
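The choice between double/single and opening/closing marks comes down to an index into a four-character string. Here is the same arithmetic as a standalone sketch that does not extend `String.prototype`; the free-function form and the `slice` calls are assumptions made for illustration, not what the snippets above use.

```javascript
// Order matters: double opening, single opening, double closing, single closing.
const REPLACERS = '“‘”’';

// Hypothetical standalone variant of replaceQuoteAt.
// `closing` selects the second half of REPLACERS; a straight single quote
// at `index` selects the odd positions.
function replaceQuoteAt(str, index, closing) {
  const offset = (str[index] === "'" ? 1 : 0) + (closing ? 2 : 0);
  return str.slice(0, index) + REPLACERS[offset] + str.slice(index + 1);
}

console.log(replaceQuoteAt('say "hi"', 4, false)); // opening double mark
console.log(replaceQuoteAt("it's", 2, true));      // closing single mark
```

Keeping the helper as a plain function avoids adding a method to every string in the page, which can matter when other scripts are loaded alongside it.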

Anyway, as I am not a RegEx expert at all, please feel free to comment if you notice a behaviour that may be unwanted, or a way to improve the expressions.

Takit Isy
  • 9,688
  • 3
  • 23
  • 47
  • The code looks good, I would just run it through all the test like I have in my answers, to make sure it covers all your scenarios and works flawlessly. Also it will make sure any changes you make will not break it for any of the existing test cases – Tarun Lalwani Apr 22 '18 at 14:10
  • @TarunLalwani Thanks for your comment. :) – Takit Isy Apr 22 '18 at 14:20
0

So instead of following a regex replace approach, I would use a simple loop with a quote-balancing act. You assume that every quote that appears will be matched by another one, and when it is, they are replaced as a pair.

Below is a test implementation of this idea:

String.prototype.replaceAt=function(index, replacement) {
return this.substr(0, index) + replacement+ this.substr(index + replacement.length);
}

tests  =[
// [`I'm "happy"! J'ai enfin "osé". La rencontre de mon "âme-sœur" a "été" au rendez-vous…
// and how it should look after correction:`, `I'm "happy"! J'ai enfin "osé". La rencontre de mon "âme-sœur" a "été" au rendez-vous…
// and how it should look after correction:`],
[`tarun" lalwani"`, `tarun” lalwani”`],
[`tarun lalwani"`, `tarun lalwani”`],
[`"tarun lalwani`,`“tarun lalwani`],
[`"tarun" lalwani`,`“tarun” lalwani`],
[`"tarun" lalwani"`,`“tarun” lalwani”`],
[`"tarun lalwani"`, `“tarun lalwani”`]
]

function isCharacterSeparator(value) {
return /“, /.test(value)
}

for ([data, output] of tests) {
let qt = "“”"
let qtL = '“'
let qtR = '”'
let bal = 0
let pattern = /["“”]/g
let data_new = data
while (match = pattern.exec(data)) {
    if (bal == 0) {
        if (match.index == 0) {
            data_new = data_new.replaceAt(match.index, qt[bal]);
            bal = 1
        } else {
            if (isCharacterSeparator(data_new[match.index-1])) {
                data_new = data_new.replaceAt(match.index, qtL);
            } else {
                data_new = data_new.replaceAt(match.index, qtR);
            }
        }
    } else {
        if (match.index == data.length - 1) {
            data_new = data_new.replaceAt(match.index, qtR);
        } else if (isCharacterSeparator(data_new[match.index-1])) {
            if (isCharacterSeparator(data_new[match.index+1])) {
                //previous is separator as well as next one too
                // "tarun " lalwani"
                // take a call what needs to be done here?

            } else {
                data_new = data_new.replaceAt(match.index, qtL);
            }
        } else {
            if (isCharacterSeparator(data_new[match.index+1])) {
                data_new = data_new.replaceAt(match.index, qtL);
            } else {
                data_new = data_new.replaceAt(match.index, qtR);
            }
        }
    }


}

console.log(data_new)
if (data_new != output) {
  console.log(`Failed to parse '${data}' Actual='${data_new}' Expected='${output}'`)
} ;
}
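For comparison, the simplest possible form of quote balancing just alternates between opening and closing marks. This is a deliberately naive sketch, not the implementation above, and `pairQuotes` is a hypothetical name. It handles flat pairs but mis-pairs nested quotations, which is one reason to also look at the characters around each quote.

```javascript
// Naive balancing: every other straight double quote is an opening mark.
// Assumption: quotes are never nested and always come in pairs.
function pairQuotes(text) {
  let open = false;
  return text.replace(/"/g, () => {
    open = !open;
    return open ? '“' : '”';
  });
}

console.log(pairQuotes('I\'m "happy" and "glad"'));
console.log(pairQuotes('The sign says: "Some text "some text" some text."'));
```

The second call produces `“Some text ”some text“ some text.”`, with the inner marks reversed, because the toggle cannot tell a nested opening quote from a closing one.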

Update-1: 20-Apr-2018

I have updated the function. There may still be some edge cases, but you should put everything in the tests, run them, and fix the ones that don't behave as expected.

Tarun Lalwani
  • Funny, I've just found the topic where `String.prototype.replaceAt` was a solution. :) (https://stackoverflow.com/questions/1431094/how-do-i-replace-a-character-at-a-particular-index-in-javascript) ⋅⋅⋅ And I was trying to do something with it. – Takit Isy Apr 18 '18 at 17:21
  • In your example you put `"tarun lalwani`, but if you put the quote at the end `tarun lalwani"`, the wrong correction appears! – Takit Isy Apr 18 '18 at 18:14
  • That depends on how you take it. I might be typing `tarun lalwani"quote"`. You need to decide what you need to do in such cases. If `bal==1` at the end then you can check `match.index` and do an additional replace of the unbalanced quote – Tarun Lalwani Apr 18 '18 at 18:23
  • And that is another reason I wanted you to post expected inputs vs outputs. There will be cases like `tarun" lalwani"` and what is the expected output now? You should work at a list of all possible cases you want to handle and then work your way back to the solution, more like TDD approach – Tarun Lalwani Apr 18 '18 at 18:39
  • `tarun" lalwani"` isn't a correct way to write, but the result should be `tarun” lalwani”`, as the quotes are at the end of the word. I'm gonna add that as an example. (I've only put “correct” ways of use at the moment) – Takit Isy Apr 18 '18 at 18:44
  • @TakitIsy, updated the code. I don't think it will match 100% of your requirements, but you have branches everywhere to make a fix, so you should easily be able to adapt this to your edge cases and make it work in all possible conditions – Tarun Lalwani Apr 20 '18 at 04:50
  • Hello Tarun, as I just answered my own question, I was wondering if you could review my answer. Maybe you will have an idea to improve the code or the regular expressions (I am not an expert at all at RegEx). I'd appreciate that. :) (My answer uses the `String.prototype.replaceAt` too!) – Takit Isy Apr 22 '18 at 14:06