
I have this string:

   s='data-id="a1429883480588" class="privateMessage" @zaza
    data-id="a1429883480589" class="privateMessage" @zaza2
    data-id="a1429883480598" class="privateMessage" @zaza3'

My goal is to capture what is between data-id=" and the closing ", so that the result is: [a1429883480588, a1429883480589, a1429883480598]

I tried with

var splitted = s.match(/data-id="(\w)+(?=")/g)

But this also captures data-id=" and "

Any idea how to write this regex?

It must be done in JS, since this is a Node.js function!

yarek
  • 3
    Why regex? Using a document fragment would most likely be safer – MDEV Apr 24 '15 at 16:47
  • 3
    Don't. Use DOM manipulation. – Matt Burland Apr 24 '15 at 16:47
  • 2
    Don't parse HTML with RegEx: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Cory Danielson Apr 24 '15 at 16:49
  • IT IS inside NODEJS ! – yarek Apr 24 '15 at 16:49
  • 1
    You can use Cheerio to parse the DOM in node. – Cory Danielson Apr 24 '15 at 16:50
  • I guess, but really need a regex instead of using a module... – yarek Apr 24 '15 at 16:51
  • 1
    You really don't want to use regex for this. – Kyle Falconer Apr 24 '15 at 16:52
  • 1
    @yarek: while the comments to which you're responding may appear glib, and offered without relevance to your specific problem, they're right: parsing an irregular language with JavaScript's regular expression (or any other language's regex) is a minefield with more edge cases than you'll ever catch or plan for. It really is a fool's errand. Use whatever DOM parsing tools that are available for your environment, and save yourself the frustration. – David Thomas Apr 24 '15 at 16:52
  • I know that perfectly.. But I really need a regex for this very specific case. – yarek Apr 24 '15 at 16:54
  • You *really* don't. Find another solution. – Kyle Falconer Apr 24 '15 at 16:54
  • A better solution would be to get those values you need passed to you from where ever the source is. If it's a script on a page, have that script parse the HTML before sending it. – Kyle Falconer Apr 24 '15 at 16:58
  • I just edited the question and removed the HTML part, so answers can focus on a simple regular expression rather than on DOM parsing. – yarek Apr 24 '15 at 17:01

2 Answers


If you're confident that the string will always be well formed and never mangled, here's one that will do it:

var s = '<span data-id="a1429883480588" class="privateMessage">@zaza</span>&nbsp;';
s += '<span data-id="a1429883480589" class="privateMessage">@zaza2</span>&nbsp;';
s += '<span data-id="a1429883480598" class="privateMessage">@zaza3</span>';

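// match() returns null when nothing matches, so fall back to an empty array
var ids = (s.match(/data-id="\w+"/g) || []).map(function(attributeAndValue) {
    // 'data-id="a1"' split on '"' gives ['data-id=', 'a1', '']; index 1 is the value
    return attributeAndValue.split('"')[1];
});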

The concerns raised above about using RegEx to parse HTML are valid, but they apply mostly to HTML in the wild, where you can't rely on the markup having a fixed shape.
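
As for why the attempt in the question returns more than the id: with the g flag, String.prototype.match returns only the full matches and ignores capture groups, and (\w)+ would in any case keep only the last character it matched. One way around this, sketched here under the same assumption that the input is well formed, is to put the quantifier inside the group and loop with RegExp.prototype.exec, reading group 1 on each pass. The helper name extractDataIds is made up for this example and is not from the question or the answers.

```javascript
// Hypothetical helper: collects the value of every data-id="..." in the text.
function extractDataIds(text) {
    // Quantifier is inside the group, so group 1 holds the whole value
    var pattern = /data-id="(\w+)"/g;
    var ids = [];
    var m;
    // With the g flag, each exec() call resumes after the previous match
    // and returns null once there are no more matches
    while ((m = pattern.exec(text)) !== null) {
        ids.push(m[1]);
    }
    return ids;
}

var s = 'data-id="a1429883480588" class="privateMessage" @zaza\n' +
        'data-id="a1429883480589" class="privateMessage" @zaza2\n' +
        'data-id="a1429883480598" class="privateMessage" @zaza3';

console.log(extractDataIds(s));
// [ 'a1429883480588', 'a1429883480589', 'a1429883480598' ]
```

This avoids the extra split step and returns an empty array when nothing matches. It still assumes the ids contain only word characters (letters, digits, underscore) and that the attribute is always written with double quotes.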

Adrian Lynch

Here's the cheerio equivalent, for reference:

var cheerio = require('cheerio');

var markup = '<span data-id="a1429883480588" class="privateMessage">@zaza</span>&nbsp;<span data-id="a1429883480589" class="privateMessage">@zaza2</span>&nbsp;<span data-id="a1429883480598" class="privateMessage">@zaza3</span>';
var $ = cheerio.load('<div>'+markup+'</div>');
var ids = Array.prototype.map.call($('[data-id]'), function(e) {
    return $(e).attr('data-id');
});

console.log(ids);
// [ 'a1429883480588', 'a1429883480589', 'a1429883480598' ]
Cory Danielson