
Ok, I am trying to find the DOM pattern:

<div>
    <br>
</div>

in my contenteditable div, which typically looks like this, with multiple spans:

<div id="edit" contenteditable="true">
    <span>text</span>
    <span>text</span>
    <!-- and possibly more spans -->
    <div>
        <br>
    </div>
</div>

The line of code that I am using is:

return string.split(/\r\n?|\n|<div>(.*?)<br>(.*?)<\/div>,gis/);

The problem is this portion of the regex: <div>(.*?)<br>(.*?)<\/div>,gis. It never matches, even though the pattern exists. For clarity's sake: the return runs in a loop over the input text, triggered by the input change event on my contenteditable div. I need the text as an array, split everywhere the pattern occurs. No libraries for this, please.

cube
  • If you try processing HTML with RegExp, you're going to have a bad time. Would using other JavaScript methods (that do DOM traversal) be an acceptable solution to you? – Benjamin Gruenbaum Mar 19 '13 at 00:58
  • A good site for testing your regex online is http://regexpal.com/ (paste your regex there and you will see what is wrong, since it has a kind of "IntelliSense"). – Tiago B Mar 19 '13 at 01:01
  • @TiagoBrenck I already tested it on that exact site, as well as others, and it works there. It just doesn't work in my loop. – cube Mar 19 '13 at 01:11
  • @Benjamin Gruenbaum, I didn't expect that I would need to use jQuery for such a simple task. – cube Mar 19 '13 at 01:20
  • @cube Who said anything about jQuery? I'm talking about vanilla JavaScript. – Benjamin Gruenbaum Mar 19 '13 at 01:21
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Derek 朕會功夫 Mar 19 '13 at 01:30
  • @Benjamin Gruenbaum, sorry for the assumption. I only glanced over your comment and read it as "other library". To answer your question: any solution would be acceptable at this point. I just figured this one-liner would do it for me. – cube Mar 19 '13 at 01:41
  • A one-liner can get very hard to debug. DOM traversal, on the other hand, makes semantic sense. Using regular expressions to parse a language that isn't regular runs into a lot of edge cases one does not think of in advance; I have been bitten by this more than once. I've added an answer that uses DOM traversal; let me know what you think. – Benjamin Gruenbaum Mar 19 '13 at 01:43

4 Answers


Here is a solution that does not involve any external library and is easy to understand.

For starters, let's grab the edit div itself:

var $edit = document.getElementById("edit");

Next, we create a small function to walk the DOM. There are plenty of ways to do this; here is the one Douglas Crockford uses in his book "JavaScript: The Good Parts", if I recall correctly:

function walkTheDOM(node, func) {
    func(node);
    node = node.firstChild;
    while (node) {
        walkTheDOM(node, func);
        node = node.nextSibling;
    }
}

This function visits node and every node beneath it, depth-first (including text nodes, not just elements), and calls func on each one.
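To make the visiting order concrete without a browser, here is a small sketch that runs the same walkTheDOM against hand-built stand-ins for DOM nodes. The makeNode helper is hypothetical: it only mimics the three properties walkTheDOM reads (nodeName, firstChild, nextSibling), so treat this as an illustration of the traversal order rather than real DOM code. Note that the text node inside the span is visited too.

```javascript
// walkTheDOM as given above: pre-order, depth-first traversal.
function walkTheDOM(node, func) {
  func(node);
  node = node.firstChild;
  while (node) {
    walkTheDOM(node, func);
    node = node.nextSibling;
  }
}

// Hypothetical helper: a stand-in node exposing only what walkTheDOM uses.
function makeNode(nodeName, children = []) {
  const node = { nodeName, firstChild: children[0] || null, nextSibling: null };
  // Link the children to each other the way the DOM does via nextSibling.
  children.forEach((child, i) => {
    child.nextSibling = children[i + 1] || null;
  });
  return node;
}

// Roughly: <div id="edit"><span>text</span><div><br></div></div>
const tree = makeNode("DIV", [
  makeNode("SPAN", [makeNode("#text")]),
  makeNode("DIV", [makeNode("BR")]),
]);

const visited = [];
walkTheDOM(tree, (n) => visited.push(n.nodeName));
console.log(visited.join(" ")); // DIV SPAN #text DIV BR
```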

The only thing remaining is to call it on our $edit div from before.

walkTheDOM($edit, function (node) {
    if (node.nodeName.toLowerCase() === "div") { // we got a div
        if (node.innerHTML.trim() === "<br>") { // whose inner HTML is just <br>
            console.log("GOT", node); // log the matching node
        }
    }
});

Here is a fiddle of it all working.

Once you have found the matching divs, you can easily extract whatever text or data you need from the rest of the content. See this question on why parsing HTML with regular expressions is generally a bad idea.
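For the original goal (an array of text chunks, split wherever an empty <div><br></div> appears), here is a minimal sketch in the same no-library spirit. It assumes the delimiter divs are direct children of #edit, as in the question's markup, so it loops over childNodes instead of walking the whole tree; isEmptyLineDiv and splitAtEmptyLines are hypothetical names, not part of any API.

```javascript
// Hypothetical helper: true for a <div> whose only non-whitespace content is one <br>.
function isEmptyLineDiv(node) {
  if (node.nodeType !== 1 || node.nodeName.toLowerCase() !== "div") return false;
  // Ignore whitespace-only text nodes, e.g. indentation in hand-written HTML.
  const meaningful = Array.from(node.childNodes).filter(
    (child) => !(child.nodeType === 3 && child.nodeValue.trim() === "")
  );
  return meaningful.length === 1 && meaningful[0].nodeName.toLowerCase() === "br";
}

// Hypothetical helper: collect the text of the root's direct children,
// starting a new array entry at every empty-line <div>.
function splitAtEmptyLines(root) {
  const chunks = [""];
  for (const child of Array.from(root.childNodes)) {
    if (isEmptyLineDiv(child)) {
      chunks.push(""); // the <div><br></div> acts as the delimiter
    } else if (child.nodeType === 1 || child.nodeType === 3) {
      // Elements and text nodes contribute text; comments are skipped.
      chunks[chunks.length - 1] += child.textContent;
    }
  }
  return chunks.map((chunk) => chunk.trim());
}

// In a browser, e.g. inside the contenteditable's input handler:
// const parts = splitAtEmptyLines(document.getElementById("edit"));
```

If empty-line divs can also appear nested deeper, the same isEmptyLineDiv check can be used as the callback condition in walkTheDOM instead.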

Benjamin Gruenbaum

The flags should go outside the slashes, after the closing /:

return string.split(/\r\n?|\n|<div>(.*?)<br>(.*?)<\/div>/gis);
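To see the difference concretely, here is a small check you can paste into Node or a browser console (the s flag needs an ES2018 or later engine). In the question's version, ",gis" sits inside the slashes, so the pattern requires that literal text after </div>. With the flags moved out, the split happens, but two details are worth knowing: split() inserts every capture group into the result array (undefined for groups that did not take part in the match, such as when the \n branch matched), and split() always splits on every match, so the g flag changes nothing. Dropping the parentheses leaves just the pieces between delimiters.

```javascript
const input = "x\ny<div><br></div>z";

// Question's version: ",gis" is part of the pattern, so a literal ",gis" must follow </div>.
const flagsInside = /<div>(.*?)<br>(.*?)<\/div>,gis/;
console.log(flagsInside.test("<div><br></div>"));     // false
console.log(flagsInside.test("<div><br></div>,gis")); // true

// Flags moved outside: the split works, but the capture groups land in the array
// (undefined for the "\n" match, empty strings for the <div> match).
const withGroups = input.split(/\r\n?|\n|<div>(.*?)<br>(.*?)<\/div>/gis);
// withGroups is ["x", undefined, undefined, "y", "", "", "z"]

// No capture groups: only the text between delimiters remains.
const withoutGroups = input.split(/\r\n?|\n|<div>.*?<br>.*?<\/div>/is);
// withoutGroups is ["x", "y", "z"]
```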

I'm not very good with regexes, but the pattern also looks too permissive to me. I believe it will match any div that contains a br, not only the ones that contain nothing but a br. And if divs are nested, the match should start at the outermost one. I'd tackle this problem by traversing the DOM, as suggested in the comments.
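Both concerns are easy to reproduce. The sketch below uses the question's pattern (flags outside, without g since only the first match matters) and compares it with a stricter, hypothetical variant, <div>\s*<br>\s*<\/div>, that allows only whitespace around the <br>. The stricter pattern is still a regex over HTML, so variants such as <div class="x"> or <br/> would not be recognized.

```javascript
const loose = /<div>(.*?)<br>(.*?)<\/div>/is;
const strict = /<div>\s*<br>\s*<\/div>/i; // hypothetical stricter variant

// A div with real content around the <br> still matches the loose pattern.
console.log(loose.test("<div>hello<br>world</div>"));  // true
console.log(strict.test("<div>hello<br>world</div>")); // false

// \s* also covers the indented layout from the question, without the s flag.
console.log(strict.test("<div>\n    <br>\n    </div>")); // true

// Nested divs: the loose match starts at the outer <div> and stops at the
// inner </div>, leaving the outer closing tag behind.
const nested = "<div><div><br></div></div>";
const looseMatch = nested.match(loose)[0];   // "<div><div><br></div>"
const strictMatch = nested.match(strict)[0]; // "<div><br></div>"
```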

bfavaretto

I see a few potential issues: (1) Your flags (gis) belong after the closing /, not inside the slashes. (2) Your first use of | needs parentheses to match \r, \n or \r\n; you probably don't need these at all, though. (3) I'm not sure why you have an alternation here: \n|<div>. (4) s isn't a flag that I'm aware of.

This should do the trick:

/<div>(.*?)<br>(.*?)<\/div>/gi
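One caveat on point (4): that was true in 2013, but since ES2018 JavaScript has an s ("dotAll") flag that lets . match line breaks. It matters here because the question's sample markup puts the <br> on its own line, and without s the .*? parts cannot cross those newlines. A quick check, using fresh regexes without g (a global regex remembers lastIndex between test() calls, which can make repeated tests on the same object return surprising results):

```javascript
// The question's sample layout, with the <br> on its own line.
const multiLine = "<div>\n    <br>\n    </div>";

const withoutDotAll = /<div>(.*?)<br>(.*?)<\/div>/i; // "." stops at "\n"
const withDotAll = /<div>(.*?)<br>(.*?)<\/div>/is;   // "s": "." also matches "\n"

console.log(withoutDotAll.test(multiLine));         // false
console.log(withDotAll.test(multiLine));            // true
console.log(withoutDotAll.test("<div><br></div>")); // true: no line breaks involved
```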

ZachB

1) Regexp flags go after the closing "/".

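2) Use [\S\s]* instead of .* (so [\S\s]*? instead of .*? in your pattern), because . does not match line breaks, while [\S\s] matches any character.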

3) "<text" is invalid HTML, because a literal "<" in text content should be escaped as "&lt;".
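On point 2: [\S\s] matches any character including line breaks (the pre-ES2018 equivalent of . with the s flag), but the quantifier should stay lazy. A greedy [\S\s]* runs to the last </div> in the text, so with two empty-line divs it swallows everything between them. A small comparison on a made-up input:

```javascript
const html = "<div><br></div><span>kept</span><div><br></div>";

// Greedy: a single match from the first <div> to the last </div>.
const greedy = html.split(/<div>[\S\s]*<br>[\S\s]*<\/div>/i);
// greedy is ["", ""]; the span in the middle is lost

// Lazy: each empty-line div is matched on its own.
const lazy = html.split(/<div>[\S\s]*?<br>[\S\s]*?<\/div>/i);
// lazy is ["", "<span>kept</span>", ""]
```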