When parsing an XML feed, I am getting text from the content tag, like this:

The Government has awarded funding for a major refurbishment project to go ahead at St Eunan’s College. This is in addition to last month’s announcement that grant for its prefabs to be replaced with permanent accomodation. The latest grant will allow for major refurbishment to a section of the school to allow for new accommodation for classes – the project will also involve roof repairs, the installation of a dust extraction system, new science room fittings and installation of firm alarms. Donegal Deputy Joe McHugh says credit must go to the school’s board of management

Is there any way to easily replace these special characters (i.e., HTML entities for apostrophes, dashes, etc.) with their character equivalents?

EDIT:

Ti.API.info("is this real------------"+win.dataToPass)


returns: (line breaks added for clarity)

[INFO][TiAPI   ( 5437)]  Is this real------------------Police in Strabane are
warning home owners and car owners in the town to be vigilant following a recent
spate of break-ins. There has been a number of thefts from gardens and vehicles
in the Jefferson Court and Carricklynn Avenue area of the town. The PSNI have
said that residents have reported seeing a dark haired male in and around the
area in the early hours of the morning. Local Cllr Karina Carlin has been
monitoring the situation – she says the problem seems to be getting
worse…….


My external.js file, which merely displays the text above, is below:

var win= Titanium.UI.currentWindow;

Ti.API.info("Is this real------------------"+ win.dataToPass);

var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {//modified from underscore.string and string.js
    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;

        if ( entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ( match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
}

var newText= unescapeHTML(win.datatoPass);


var label= Titanium.UI.createLabel({
    color: "black",
    //text: win.dataToPass,//this works!
    text:newText,//this is causing an error
    font: "Helvetica",
    fontSize: 50,
    width: "auto",
    height: "auto",
    textAlign: "center"
})

win.add(label);
user2363025
  • Avoid as in remove them? replace them with their character equivalents? - What do you want to do with the string? – Alex K. Jul 16 '13 at 14:07
  • @ Alex K. Yep replace them with their character equivalents. I am displaying them as text on a window – user2363025 Jul 16 '13 at 14:11
  • @ Alex K. I realise a custom find and replace function could do it but I was wondering if there was another way, as I'd have to know all the special characters which could possibly appear – user2363025 Jul 16 '13 at 14:16
  • Is this in a browser? Then you can use a pattern match, or set a DOM element's HTML and then read back its node text; http://stackoverflow.com/questions/4338963/convert-html-character-entities-back-to-regular-text-using-javascript – Alex K. Jul 16 '13 at 14:17
  • @AlexK. No it's not in a browser – user2363025 Jul 16 '13 at 14:28

3 Answers

6

There are many libraries you can include in Titanium (Underscore.string, string.js) that will make this happen, but if you only want the unescape HTML function, try this code, adapted from those libraries:

var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {//modified from underscore.string and string.js
    return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
        var match;

        if ( entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ( match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ( match = entityCode.match(/^#(\d+)$/)) {
            return String.fromCharCode(~~match[1]);
        } else {
            return entity;
        }
    });
}

This replaces those special characters with their human-readable equivalents and returns the modified string. Just put this somewhere in your code and you're good to go. I have used this myself in Titanium and it's quite handy.
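As a rough usage sketch (not part of the original answer), the function can be exercised on a short string before wiring it into a label. The `sample` text below is made up for illustration, and the function is repeated only so the snippet runs on its own. In Titanium you would probably log with `Ti.API.info` rather than `console.log`.

```javascript
// Copy of the answer's function, repeated so this snippet runs standalone.
var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };

function unescapeHTML(str) {
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;
        if (entityCode in escapeChars) {
            return escapeChars[entityCode];                       // named entity
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16));   // hex entity
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(~~match[1]);               // decimal entity
        }
        return entity;                                            // unknown: leave as-is
    });
}

// Hypothetical feed text mixing decimal, named and hex entities.
var sample = 'St Eunan&#8217;s College &amp; friends &#x2013; worse&#8230;';
var decoded = unescapeHTML(sample);
console.log(decoded);
```

One limitation worth keeping in mind: because the pattern matches from any `&` up to the next `;`, a bare `&` in the text can swallow a real entity that follows it, so the input is assumed to be properly escaped.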

Josiah Hester
  • Thanks very much for your help. I've copied this code in. In order to actually run it, I have tried: var newText= unescape(win.datatoPass); where win.datatoPass is a string. The variable newText was what I then set as the text property of a label. But it is displaying in my app as undefined – user2363025 Jul 16 '13 at 15:36
  • I've realised it should have been unescapeHTML(win.datatoPass). I have now tried: var newText= unescapeHTML(win.datatoPass); where win.datatoPass is a string. The variable newText was what I then set as the text property of a label. But my app is saying it cannot call method 'replace' of undefined; the source is 'return str.replace(/\&([^;]+);/g, function(entity, entityCode)'. Do I have to slot in values for entity and entityCode? – user2363025 Jul 16 '13 at 15:46
  • That means that `win.datatoPass` is undefined, check `win.datatoPass` there isn't a problem with the function itself. – Josiah Hester Jul 16 '13 at 17:00
  • win.dataToPass is definitely not undefined. It was working before I introduced this function and it works now if I remove it. As I wrote above, the first issue was because I wrote unescape(win.dataToPass) and NOT the correct function name, i.e. unescapeHTML(win.dataToPass). It's definitely related to this line: 'return str.replace(/\&([^;]+);/g, function(entity, entityCode)'. Do I have to replace entity and entityCode with actual values? Your help would be really appreciated – user2363025 Jul 16 '13 at 22:05
  • NO. Copy the function _including_ the `escapeChars` and it will work As Is... IF you are passing it a String. The [`replace`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace) function takes a RegularExpression and a function as input, so don't mess with the function itself. In fact, this function was written by the Underscore guys and has been vetted a ton. Tell me what `Ti.API.info('Is this real: '+win.datatoPass);` prints out if you place it right before where you would call the unescapeHTML function. – Josiah Hester Jul 17 '13 at 01:41
  • I'll edit my question above to show you how I used the function because I think I may be doing that incorrectly. – user2363025 Jul 17 '13 at 08:29
  • Also note I've included the console you requested – user2363025 Jul 17 '13 at 08:47
  • You forgot to capitalize the `t` in dataToPass: check your code... `var newText= unescapeHTML(win.datatoPass);` It should be `var newText= unescapeHTML(win.dataToPass);` – Josiah Hester Jul 17 '13 at 15:16
  • Wow what a stupid mistake not to spot! Thanks a lot for that. Is it easy to just add to the escapeChars array? That couldn't be them all or could it? – user2363025 Jul 17 '13 at 15:24
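On the last question in the comments: `escapeChars` is a plain object rather than an array, so further named entities can be added as extra keys. Numeric entities such as `&#8217;` are already covered by the two regex branches, so only named ones need listing. The sketch below is an illustration, not the answerer's code: the extra entries are a small, incomplete selection, and the `hasOwnProperty` check is a swapped-in precaution so that inherited keys like `toString` are not treated as entities.

```javascript
// The five entities from the answer plus a few common typographic ones.
// This selection is illustrative only; HTML defines many more named entities.
var escapeChars = {
    lt: '<', gt: '>', quot: '"', apos: "'", amp: '&',
    nbsp: '\u00A0', hellip: '\u2026', ndash: '\u2013', mdash: '\u2014',
    lsquo: '\u2018', rsquo: '\u2019', ldquo: '\u201C', rdquo: '\u201D'
};

function unescapeHTML(str) {
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;
        // Own-property check so "&toString;" is not looked up on the prototype.
        if (Object.prototype.hasOwnProperty.call(escapeChars, entityCode)) {
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(parseInt(match[1], 10));
        }
        return entity; // anything not in the table is left untouched
    });
}

var decoded = unescapeHTML('the school&rsquo;s board &ndash; worse&hellip; &unknown;');
console.log(decoded);
```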
1

I encountered the same issue, and @Josiah Hester's solution works for me. I have added a condition so that only string values are handled.

    this.unescapeHTML = function(str) {
        var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };
        if (typeof str !== 'string') {
            return str;
        } else {
            return str.replace(/\&([^;]+);/g, function(entity, entityCode) {
                var match;
                if (entityCode in escapeChars) {
                    return escapeChars[entityCode];
                } else if (match = entityCode.match(/^#x([\da-fA-F]+)$/)) {
                    return String.fromCharCode(parseInt(match[1], 16));
                } else if (match = entityCode.match(/^#(\d+)$/)) {
                    return String.fromCharCode(~~match[1]);
                } else {
                    return entity;
                }
            });
        }
    };
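To see what the type check changes in practice, here is a small sketch written as a standalone function rather than a method. The `win` object is a hypothetical stand-in for `Titanium.UI.currentWindow`. With the guard, a misspelled property such as `win.datatoPass` no longer throws, but it is passed through as `undefined`, so the typo is hidden rather than reported.

```javascript
function unescapeHTML(str) {
    var escapeChars = { lt: '<', gt: '>', quot: '"', apos: "'", amp: '&' };
    if (typeof str !== 'string') {
        return str; // non-strings (undefined, numbers, ...) pass through unchanged
    }
    return str.replace(/&([^;]+);/g, function (entity, entityCode) {
        var match;
        if (entityCode in escapeChars) {
            return escapeChars[entityCode];
        } else if ((match = entityCode.match(/^#x([\da-fA-F]+)$/))) {
            return String.fromCharCode(parseInt(match[1], 16));
        } else if ((match = entityCode.match(/^#(\d+)$/))) {
            return String.fromCharCode(~~match[1]);
        }
        return entity;
    });
}

// Hypothetical stand-in for Titanium.UI.currentWindow.
var win = { dataToPass: 'Karina Carlin &#8211; worse&#8230;' };

var wrong = unescapeHTML(win.datatoPass); // misspelled key: no exception, result is undefined
var right = unescapeHTML(win.dataToPass); // correct key: entities decoded
console.log(wrong, right);
```

Whether silently returning the input is preferable to failing loudly is a judgment call; logging a warning in the non-string branch may make mistakes like the one in the question easier to spot.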
Stuart Siegler
Dino Liu
0

Below are two references for these special characters. Unfortunately, by filtering them out you may discard important information that you actually want to keep. My advice is to use the symbol reference table to create an array, then search your string for each of the codes and replace each code with its corresponding character.

For example:

A-Z are represented by: &#65; to &#90;

Filtering out this information may significantly change the data you expect to be reading.
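The table-driven approach described above could look roughly like the following sketch. The `entityTable` array and the `replaceEntities` name are made up for illustration, and the table holds only a handful of entries; a real one built from the reference pages would be much longer.

```javascript
// Illustrative [entity, character] pairs; not a complete table.
var entityTable = [
    ['&#8217;', '\u2019'], // right single quotation mark
    ['&#8211;', '\u2013'], // en dash
    ['&#8230;', '\u2026'], // horizontal ellipsis
    ['&lt;', '<'],
    ['&gt;', '>'],
    ['&quot;', '"'],
    ['&amp;', '&']         // kept last so "&amp;lt;" becomes "&lt;", not "<"
];

function replaceEntities(str) {
    var result = str;
    for (var i = 0; i < entityTable.length; i++) {
        // split/join replaces every occurrence without any regex escaping.
        result = result.split(entityTable[i][0]).join(entityTable[i][1]);
    }
    return result;
}

var decoded = replaceEntities('school&#8217;s &amp;lt;news&gt; &#8211; more&#8230;');
console.log(decoded);
```

The ordering matters: because each pass rescans the whole string, decoding `&amp;` before the other entries would turn escaped text such as `&amp;lt;` into `<`. A single-pass regex replacement avoids that ordering concern and also handles numeric codes without listing each one.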

HTML Symbol Entities Reference:
http://www.webmonkey.com/2010/02/special_characters/
http://www.w3schools.com/tags/ref_symbols.asp
Joshua Briefman
  • @user2363025 The programming language you're using may support a library containing a search and replace routine that performs the function I described; this will depend on the language you're using and what libraries you may or may not have present on the machine. – Joshua Briefman Jul 16 '13 at 14:19
  • I'm using titanium. Any idea if they have a library like this – user2363025 Jul 16 '13 at 14:34
  • Two options exist. 1) Build a custom function that searches for and replaces each of the codes. 2) It looks like Titanium is a derivative of Java; if that is true then the following should work (although you may have to reference the Java standard library; visit the Titanium page for a how-to on creating that reference): http://docs.oracle.com/javase/6/docs/api/java/net/URLDecoder.html – Joshua Briefman Jul 16 '13 at 14:56