Remove characters from a string that are not firstname/surname characters

Question

Please see the code below:

 @HostListener('paste', ['$event'])
  onPaste(event) {
    var test = event.clipboardData.getData('text');
    var removedNumbers = test.replace(/[0-9]/g, '');
  }

Numbers are removed from the pasted text. It is a surname field, so it should also exclude characters like {[}] etc.

How can I remove characters that are not valid for a name? I have read lots of similar questions today, like this one: "How do I block or restrict special characters from input fields with jQuery?". However, I have not found an answer to my specific question.

[Falsehoods programmers believe about names](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/) - worth a read. Here is how you remove invalid characters `test.replace(/./g, "$&")` — VLAZ, Feb 05 '20 at 15:12
@Pointy, I asked a question about pasting into a textbox earlier. — w0051977, Feb 05 '20 at 15:12

Addis · Accepted Answer · 2020-02-06T19:05:46.570

`[^ ]` matches any character (whitespace included) that is not listed inside the brackets, so you can place all the characters you want to keep inside the brackets and remove everything else. Note, however, that you have to escape special characters if they are part of the set. Also note that

you can specify a range of characters by using a hyphen, but if the hyphen appears as the first or last character enclosed in the square brackets it is taken as a literal hyphen to be included in the character set as a normal character.

const regex = /[^a-z,' -]/gi;

console.log("Conan O'Brien".replace(regex, ''));
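
The hyphen rule quoted above is easy to get wrong, so here is a minimal sketch of the difference. The sample string and the variable names are made up for illustration, and the first pattern is a variant of the one above without the comma.

```javascript
// Hyphen last in the set: it is a literal hyphen, so hyphens are kept.
const hyphenAsLiteral = /[^a-z' -]/gi;

// Hyphen between a space and an apostrophe: it defines the range
// U+0020..U+0027 (space ! " # $ % & '), so '#' is kept and '-' is removed.
const hyphenAsRange = /[^a-z -']/gi;

const sample = "Anne-Marie #1";

const keptHyphen = sample.replace(hyphenAsLiteral, ''); // "Anne-Marie " (trailing space)
const lostHyphen = sample.replace(hyphenAsRange, '');   // "AnneMarie #"

console.log(JSON.stringify(keptHyphen));
console.log(JSON.stringify(lostHyphen));
```

Keeping the hyphen as the last character in the set, as in the answer's regex, avoids creating a range by accident.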

You may also use Unicode character ranges for non-English names, for example

for Chinese 4e00 to 9fa5,
for most of Latin 0061 to 007A & 00DF to 00F6 & 00F8 to 01BF & 01C4 to 024F
for Geʽez 1200 to 137F

const regexLatin = /[^\u0061-\u007A\u00DF-\u00F6\u00F8-\u01BF\u01C4-\u024F ]/gui;
const regexChina = /[^\u4e00-\u9fa5 ]/gui;
const regexGeez = /[^\u1200-\u137F ]/gui;

console.log("Björk Guðmundsdóttir".replace(regexLatin, ''));
console.log("陳港生".replace(regexChina, ''));
console.log("ምኒልክ".replace(regexGeez, ''));

However, this is not an exhaustive list; you may refer to the list of Unicode characters to adjust the ranges for your specific needs.

Trying to match all names from 'all' languages could be very hard. The good news, however, is that Unicode property escapes are part of the ECMAScript specification (added in ES2018, so check that your target browsers support them), which simplifies the process a lot. For example, to match Latin characters you would use `/\p{Script=Latin}/u`, and to match letters from 'all' languages you would use `/\p{Letter}/gu` or the short form `/\p{L}/gu`.
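
As a rough sketch of how property escapes could replace the hand-written ranges, the snippet below keeps letters and combining marks from any script, plus spaces, apostrophes and hyphens. The helper name `stripNonNameChars` and the exact set of allowed punctuation are assumptions for illustration, and the code needs an engine that supports `\p{…}` with the `u` flag.

```javascript
// Assumed whitelist: any letter (\p{L}), any combining mark (\p{M}),
// plus apostrophe, space and hyphen. Adjust to your own requirements.
const notNameChar = /[^\p{L}\p{M}' -]/gu;

// Hypothetical helper: removes every character outside the whitelist.
function stripNonNameChars(text) {
  return text.replace(notNameChar, '');
}

console.log(stripNonNameChars("Conan O'Brien123"));          // digits removed
console.log(stripNonNameChars("Björk Guðmundsdóttir{}"));    // braces removed
console.log(stripNonNameChars("毛泽东42"));                   // digits removed
console.log(stripNonNameChars("Борис Николаевич Ельцин!"));  // "!" removed
```

In the paste handler from the question, this would take the place of `test.replace(/[0-9]/g, '')`.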

So, names in Cyrillic, or Chinese, or Japanese, or some in Spanish or German are not names? "毛泽东" will find it insulting to be called `""`. Let's look at something closer, perhaps: "Борис Николаевич Ельцин". Or as you'd call him: `" "` (this is actually two spaces but SO compresses to one). Maybe something that uses a western alphabet? I'm pretty sure "Björk Guðmundsdóttir" will also not appreciate being called `"Bjrk Gumundsdttir"`. And even a good normal and even fairly common name like "Conan O'Brien" is apparently not valid, but `"Conan OBrien"` is. — VLAZ, Feb 06 '20 at 09:26
@VLAZ, thanks for the comment. Though I think it's difficult to give a complete answer, I have updated my answer to reflect some of your suggestions. — Addis, Feb 06 '20 at 19:14

score 1 · Answer 2 · edited Feb 06 '20 at 10:16

Try this.

Vanilla JavaScript

document.addEventListener("paste", event => {
    // Cancel the default paste so only the cleaned text is inserted.
    event.preventDefault();
    let clipboardData = event.clipboardData.getData("text");
    // Strip digits and common punctuation/symbols from the pasted text.
    clipboardData = clipboardData.replace(/[0-9_!¡?÷¿/\\+=@#$%\^&*(){}|~<>;:[\]]/g, "");
    const allowedPasteTarget = ['textarea', 'text'];
    if (allowedPasteTarget.includes(document.activeElement.type)) {
        // Append the cleaned text to the focused field's current value.
        const prevText = document.activeElement.value;
        document.activeElement.value = prevText + clipboardData;
    }
});

//To handle the copy button, [Optional]
document
    .getElementById("copy-text")
    .addEventListener("click", function(e) {
        e.preventDefault();
        document.getElementById("text-to-copy").select();
        var copied;
        try {
            copied = document.execCommand("copy");
        } catch (ex) {
            copied = false;
        }
        if (copied) {
            document.getElementById("copied-text").style.display = "block";
        }
    });

<div>
    <input type="text" id="text-to-copy" placeholder="Enter text" />
    <button id="copy-text">Copy</button>
    <span id="copied-text" style="display: none;">Copied!</span>
</div>
<div>
    <textarea name="paste-area" id="paste-area" cols="30" rows="10" placeholder="Paste it here"></textarea>
</div>

Angular

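@HostListener('paste', ['$event'])
onPaste(event) {
  // Cancel the default paste so only the cleaned text is inserted.
  event.preventDefault();
  const pasted = event.clipboardData.getData('text');
  // Strip digits and common punctuation/symbols from the pasted text.
  const cleaned = pasted.replace(/[0-9_!¡?÷¿/\\+=@#$%\^&*(){}|~<>;:[\]]/g, '');
  const allowedPasteTarget = ['textarea', 'text'];
  const target = document.activeElement as HTMLInputElement;
  if (allowedPasteTarget.includes(target.type)) {
    // Append the cleaned text to the focused field's current value.
    target.value = target.value + cleaned;
  }
}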

So, did some test with input -> output. Let's see 1. `毛泽东` -> *nothing*; 2. `Борис Николаевич Ельцин` -> ` ` (two spaces); 3. `Björk Guðmundsdóttir` -> `Bjrk Gumundsdttir`; 4. `Conan O'Brien` -> `Conan O'Brien` this one works! — VLAZ, Feb 06 '20 at 09:34
