2

I want to capture the string that follows #. Both #a and #a&nbsp;<div>.. correctly return a, but #a<div>.. returns a<div>.

How can I stop the match when the text that follows is <div>, <br>, or <p>? For example:
#a<div>bc should return a

https://regex101.com/r/xD1vN0/1

var re = /#([^#\s]*)/g;
Alan Moore
user1575921
  • You mean `/#([^<#\s]*)/g`? Or should it really be `<div>`, a literal? Like, if you have `#a`, it would be accepted?
    – Wiktor Stribiżew May 05 '16 at 10:34
  • Yes, I want only `<div>`, `<br>`, `<p>`.
    – user1575921 May 05 '16 at 10:37
  • Try [`#((?:(?!<(?:div|br|p)>)[^#\s])*)`](https://regex101.com/r/xI2wY9/3) – Wiktor Stribiżew May 05 '16 at 10:37
  • Do you want to leave an answer? – user1575921 May 05 '16 at 10:39
  • I posted with an explanation. – Wiktor Stribiżew May 05 '16 at 10:42
  • Don't use regex for getting html tags' attributes and contents. Use a parser and parse the html! – Ram May 05 '16 at 10:42
  • @WiktorStribiżew thanks for explanation – user1575921 May 05 '16 at 11:44
  • @Vohuman why can't I use regex? I was using https://www.npmjs.com/package/cheerio to parse the HTML content and then find the #tag, but the input string comes from a contenteditable element, so it generates a string like `#a
    bc
    #d` (only div, br, p HTML tags, I guess), and after parsing with cheerio's text method it becomes `#abc #d`. Or do you have an example solution? I want to do this server-side in Node.js JavaScript, not on the client side.
    – user1575921 May 05 '16 at 11:50
  • Based on @Vohuman's comment, I found http://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la, but I think in this case the string will not contain full HTML, only a few tags generated by the contenteditable element, so that's why I'm using regex for this. Any ideas? – user1575921 May 05 '16 at 12:02

2 Answers

3

You can use a regex with a tempered greedy token:

/#((?:(?!<(?:div|br|p)>)[^#\s])*)/g

The (?:(?!<(?:div|br|p)>)[^#\s])* part is a tempered greedy token: it matches any character other than # and whitespace, as long as that character does not start a <div>, <br>, or <p> sequence.

JS demo:

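var re = /#((?:(?!<(?:div|br|p)>)[^#\s])*)/g;
var str = `#a<div>
#b<br>
#c<p>
#d<hi>`;
var res = [];
var m; // declared here so the loop does not create an implicit global

// Collect capture group 1 (the text after #) from every match.
while ((m = re.exec(str)) !== null) {
    res.push(m[1]);
}
// Escape < and > so the captured text is displayed literally.
document.body.innerHTML = "<pre>" + JSON.stringify(res.map(x => x.replace(/</g,"&lt;").replace(/>/g,"&gt;")), 0, 4) + "</pre>";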
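
Since the comments mention running this server-side in Node.js, here is a minimal sketch of the same regex without any DOM access. The function name `extractTags` and the sample input are made up for illustration, and `String.prototype.matchAll` is assumed to be available (Node.js 12 or later).

```javascript
// Hypothetical helper (not part of the original demo): returns the text
// captured after each '#' in the input string.
function extractTags(input) {
  // Tempered greedy token: consume a character only if it is not '#',
  // not whitespace, and does not begin <div>, <br>, or <p>.
  const re = /#((?:(?!<(?:div|br|p)>)[^#\s])*)/g;
  // matchAll yields one match object per '#...' occurrence; keep group 1.
  return Array.from(input.matchAll(re), (m) => m[1]);
}

// Illustrative input: <hi> is not in the listed tags, so it stays in the capture.
console.log(extractTags("#a<div>bc #b<br> #c<p> #d<hi>"));
// [ 'a', 'b', 'c', 'd<hi>' ]
```

Note that a `#` followed directly by one of the listed tags, whitespace, or another `#` produces an empty string in the result, because the group uses `*` rather than `+`.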
Wiktor Stribiżew
0

You can use this pattern to stop the match as soon as a < is reached:

var re = /#([^#\s<]*)/g;
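
One thing to be aware of: a character class that excludes < stops at any tag, not only at <div>, <br>, or <p>. The sketch below compares that approach with the lookahead-based pattern from the other answer. The helper `firstTag` and the sample inputs are hypothetical, and the `g` flag is left off so each call starts from the beginning of the string.

```javascript
// Stops at the first '<', whatever tag it opens.
const stopAtAnyTag = /#([^#\s<]*)/;
// Stops only where <div>, <br>, or <p> begins (pattern from the other answer).
const stopAtListedTags = /#((?:(?!<(?:div|br|p)>)[^#\s])*)/;

// Hypothetical helper: returns the first captured tag, or null when nothing matches.
function firstTag(re, input) {
  const m = re.exec(input);
  return m ? m[1] : null;
}

console.log(firstTag(stopAtAnyTag, "#a<div>bc"));     // a
console.log(firstTag(stopAtListedTags, "#a<div>bc")); // a
console.log(firstTag(stopAtAnyTag, "#d<hi>"));        // d
console.log(firstTag(stopAtListedTags, "#d<hi>"));    // d<hi>
```

If the input can only ever contain those three tags, the two patterns should behave the same and the shorter one is easier to read.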
Simon Schüpbach