3

I have a function that is supposed to "clean" a string, and I'd like to use replace() to do that, but I can't figure out why the following code doesn't work when the text comes from an input with type="text".

For instance:

console.log(getCleanText("ééé")); // works fine, it displays : eee

But:

// my_id is an input with type="text"
var my_text = document.getElementById("my_id").value;
console.log(getCleanText(my_text)); // doesn't work at all, it displays : ééé

The function code is:

function getCleanText(some_text) {
    var clean_text = some_text.toLowerCase();
    clean_text = clean_text.replace("é", "e"); 
    clean_text = clean_text.split("é").join("e"); // give it another try

    return clean_text;
}

Any ideas?

Tobias
Banibal

5 Answers

4

I'm willing to bet your problem lies in a misunderstanding of Unicode.

é 
é

The two characters above look identical but are encoded differently. The first is the letter e followed by a combining acute accent (U+0301). The second is a single precomposed character, U+00E9.

You need to ensure you're replacing both versions.
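
A minimal sketch of what "replacing both versions" could look like, assuming the only accented letter you care about is é. The helper name replaceEAcute is made up for illustration and is not part of the question's code:

```javascript
// Hypothetical helper (not from the question): replaces both encodings of "é".
function replaceEAcute(text) {
    // "e\u0301" is e + combining acute accent; "\u00e9" is the precomposed character.
    return text.replace(/e\u0301|\u00e9/g, "e");
}

var decomposed = "e\u0301";   // two UTF-16 code units
var precomposed = "\u00e9";   // one UTF-16 code unit

console.log(decomposed === precomposed);              // false
console.log(decomposed.length, precomposed.length);   // 2 1
console.log(replaceEAcute(decomposed + precomposed)); // ee
```

The escapes are used instead of literal characters so that the source file's own encoding cannot silently change which form ends up in the pattern.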

Matt Grande
2

I think the character "é" from the element's value is different from the "é" constant in your code. To check, you can look at the character codes of the input.

var inputEValue = document.getElementById("my_id").value.charCodeAt(0);
var constantEValue = "é".charCodeAt(0);

Then you will be able to see which characters you are actually replacing.
If you just want to remove accents from text, take a look at the question Remove accents/diacritics in a string in JavaScript
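
Since charCodeAt(0) only shows the first code unit, it can help to dump all of them. Here is a rough sketch; describeCodeUnits is a hypothetical name used only for this example:

```javascript
// Hypothetical helper: lists the UTF-16 code units of a string as "U+XXXX" labels.
function describeCodeUnits(text) {
    var labels = [];
    for (var i = 0; i < text.length; i++) {
        var hex = text.charCodeAt(i).toString(16).toUpperCase();
        // Left-pad to four hex digits.
        labels.push("U+" + ("0000" + hex).slice(-4));
    }
    return labels;
}

console.log(describeCodeUnits("\u00e9").join(" "));  // U+00E9
console.log(describeCodeUnits("e\u0301").join(" ")); // U+0065 U+0301
```

Calling it on the input's value and on the literal from your source file should show whether the two strings use the same form.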

Marek
1

Try this:

function cleanText(text) {
    var re = /\u0301|\u00e9/g;

    return text.replace(re, "e").toLowerCase();
}

cleanText("éééé")

--

Updated to use the Unicode characters proposed by Matt Grande.

Robert Hoffmann
  • Hmm -- if the input text consists of "e,acute" then your code will return "ee". You are better off replacing all combining accents with nothing, and then check for precomposed characters. – Jongware Nov 15 '13 at 14:58
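
The order suggested in the comment above (drop combining accents first, then handle precomposed characters) might be sketched like this. The names stripAccents and PRECOMPOSED are assumptions made for this example, and the lookup table is deliberately tiny rather than complete:

```javascript
// Illustrative, deliberately incomplete table of precomposed characters.
var PRECOMPOSED = {
    "\u00e0": "a", // à
    "\u00e7": "c", // ç
    "\u00e8": "e", // è
    "\u00e9": "e", // é
    "\u00ea": "e"  // ê
};

// Hypothetical helper following the two steps from the comment.
function stripAccents(text) {
    return text
        // Step 1: remove combining diacritical marks (U+0300 to U+036F).
        .replace(/[\u0300-\u036f]/g, "")
        // Step 2: map precomposed letters found in the table; leave others alone.
        .replace(/[\u00c0-\u024f]/g, function (ch) {
            return Object.prototype.hasOwnProperty.call(PRECOMPOSED, ch)
                ? PRECOMPOSED[ch]
                : ch;
        });
}

console.log(stripAccents("e\u0301")); // e
console.log(stripAccents("\u00e9"));  // e
```

With this order, "e" followed by U+0301 comes out as a single "e" instead of "ee". In engines that implement ES2015's String.prototype.normalize, calling text.normalize("NFD") before step 1 would decompose many precomposed letters and shrink the table that step 2 needs.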
1

Try this:

function getCleanText(old_string)
{
    var new_string = old_string.toLowerCase();
    return new_string.replace(/é/g, 'e');
}

Edit: beaten to it by Robert. For reference, see here: What are useful JavaScript methods that extends built-in objects?

Euan T
0

What is the output of var my_text = document.getElementById("my_id").value; ? Depending on your HTML, you might need to use a different property to get the data, e.g. var my_text = document.getElementById("my_id").innerHTML;

http://jsbin.com/obAmiPe/5/edit?html,js,console,output

pogola
  • the output is ééé. I tried to replace .value by .innerHtml and .innerText but it gives the same result – Banibal Nov 15 '13 at 15:07