On a web page like this

 <div>text text</div> |text text 55.555555 |44.444444 | <div>text <b>name</b></div>

I need to get an array like this

{ [55.555555 , 44.444444, "name"] , [ ... , ... , ... ], ... } 

I would like to use regular expressions for the coordinate-finding part, but I don't know how to write this part:

return all text parts which match this expression

Can you help me with some ideas / functions?

UPDATE

I found the nativeTreeWalker function here (get all text nodes / SO) and changed it to look for 2 numbers and a text. This mostly works, but there is still a bug: it also returns numbers like 1234, with no decimal part.

function nativeTreeWalker() {
    var walker = document.createTreeWalker(
        document.body, 
        NodeFilter.SHOW_TEXT, 
        null, 
        false
    );

    var node;
    var textNodes = [];
    var name = false;
    var elem = null;

    while(node = walker.nextNode()) {

        if (name){ elem.push(node.nodeValue); textNodes.push(elem); console.log(elem); name = false; }
        else { elem = null; }

        elem = node.nodeValue.match(/\d{2}.\d+/g);
        if (elem!=null){ name=true; } 

    }
}

nativeTreeWalker()
Oriesok Vlassky

3 Answers

OK, so this is my solution:

function nativeTreeWalker() {
    var walker = document.createTreeWalker(
        document.body, 
        NodeFilter.SHOW_TEXT, 
        null, 
        false
    );

    var node;
    var textNodes = [];
    var name = false;
    var elem = null;

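    while ((node = walker.nextNode())) {

        // The text node right after the coordinates is taken as the name.
        if (name) { elem.push(node.nodeValue); textNodes.push(elem); console.log(elem); name = false; }

        // The dot is escaped so that only decimals match, not integers like 1234.
        elem = node.nodeValue.match(/\d{2}\.\d+/g);
        if (elem != null) { name = true; }

    }

    // Return the collected [coordinate, coordinate, name] arrays to the caller.
    return textNodes;
}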

nativeTreeWalker()
Oriesok Vlassky

If you're sure the input format of your data can't change, this regex should suit your needs:

[|].*?([+-]?\d+[.]\d+).*?[|].*?([+-]?\d+[.]\d+).*?[|].*?<b>(.*?)</b>

The first group ($1) contains the first coordinate, the second one ($2) the second coordinate, and the third one ($3) the name.

Here is a demo to show you how you could use it with JavaScript.
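Since the linked demo may not load, here is a minimal sketch of how the pattern could be applied in JavaScript. The helper name `extractRecords` and the `sample` string are made up for illustration; the sample simply mirrors the markup from the question. Note that the closing tag is written `<\/b>` because the pattern sits inside a regex literal, and that `.` does not match newlines, so each record is assumed to sit on one line.

```javascript
// Hypothetical helper: returns [[coord1, coord2, name], ...] from an HTML string.
function extractRecords(html) {
    // Same pattern as above, with the slash of </b> escaped for a regex literal.
    // The g flag lets exec() step through every record in the string.
    var pattern = /[|].*?([+-]?\d+[.]\d+).*?[|].*?([+-]?\d+[.]\d+).*?[|].*?<b>(.*?)<\/b>/g;
    var records = [];
    var m;

    while ((m = pattern.exec(html)) !== null) {
        // m[1] and m[2] are the coordinates, m[3] is the name.
        records.push([parseFloat(m[1]), parseFloat(m[2]), m[3]]);
    }
    return records;
}

// Sample input copied from the question (an assumption about the real page).
var sample = ' <div>text text</div> |text text 55.555555 |44.444444 | <div>text <b>name</b></div>';
console.log(extractRecords(sample)); // [ [ 55.555555, 44.444444, 'name' ] ]
```

In a browser, the same function could be fed `document.body.innerHTML`, provided the page really follows this layout.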

sp00m
  • thanks, but the fiddle never loads (possibly wrong link) and the regex showed me some syntax error when I tried to use it – Oriesok Vlassky Feb 25 '13 at 11:31
  • @OriesokVlassky It's because in JavaScript, you need to escape the `/` char, so replace the final `</b>` with `<\/b>`. And jsFiddle seems to be broken for now. – sp00m Feb 25 '13 at 12:26

In your update, the function returns non-decimal numbers because the . in the regular expression is not escaped, so it is interpreted as the wildcard metacharacter, which matches any character except a newline. To match only decimals, the regular expression in

elem = node.nodeValue.match(/\d{2}.\d+/g);

should be /\d{2}\.\d+/g.
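A quick way to see the difference is to run both patterns on the same string. The `text` value below is an invented example, not taken from the real page:

```javascript
// Invented sample: one integer followed by two decimals.
var text = 'id 1234 at 55.555555 | 44.444444';

// Unescaped dot: "." matches any character, so "1234" matches as 12 + "3" + 4.
var unescaped = text.match(/\d{2}.\d+/g);

// Escaped dot: a literal "." is required, so the integer is skipped.
var escaped = text.match(/\d{2}\.\d+/g);

console.log(unescaped); // [ '1234', '55.555555', '44.444444' ]
console.log(escaped);   // [ '55.555555', '44.444444' ]
```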

Using a TreeWalker seems like a good idea, so please post your final code as an answer when you manage to use it to create an array in the form you request in your question, i.e. with separate arrays of coordinates and their associated name.
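One possible shape for that final code is sketched below. It is only a sketch under assumptions: that each record has both coordinates in a single text node, and that the name is the next text node sitting directly inside a `<b>` element, as in the sample markup. The function names `pairCoordinatesWithNames` and `collectTextNodes` are hypothetical. The pairing logic is kept separate from the DOM walk so it can be tried without a browser.

```javascript
// Hypothetical helper: `nodes` is a list of { text, parentTag } in document order.
// Returns [[coord1, coord2, name], ...].
function pairCoordinatesWithNames(nodes) {
    var decimal = /[+-]?\d+\.\d+/g;
    var records = [];
    var pending = null; // coordinates still waiting for their name

    nodes.forEach(function (n) {
        var found = n.text.match(decimal);
        if (found && found.length >= 2) {
            // A text node with at least two decimals starts a new record.
            pending = [parseFloat(found[0]), parseFloat(found[1])];
        } else if (pending && n.parentTag === 'B') {
            // Assumption: the name is the next text inside a <b> element.
            records.push(pending.concat(n.text.trim()));
            pending = null;
        }
    });
    return records;
}

// Browser-only part: collect text nodes and their parent tag with a TreeWalker.
function collectTextNodes(root) {
    var walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
    var nodes = [];
    var node;
    while ((node = walker.nextNode())) {
        nodes.push({ text: node.nodeValue, parentTag: node.parentNode.nodeName });
    }
    return nodes;
}
```

In a page, the two would be combined as `pairCoordinatesWithNames(collectTextNodes(document.body))`.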

MikeM