I'm struggling with a regular expression. I'd like to strip all data-* attributes from HTML elements using a regular expression. For example, let's say I have this markup:

<a href="" data-foo data-foo-bar data-test="foo" data-foo='blah'>
  testing data-foo attributes.
</a>

I'd like to remove every data-* attribute that appears inside an HTML tag, but leave the text content alone. The result should be:

<a href="">
  testing data-foo attributes.
</a>

This is what I have, but it also strips data-* from the text, which I don't want:

/(data-.+?=".*?")|(data-.+?='.*?')|(data-[a-zA-Z0-9-]+)/g
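A minimal reproduction, assuming the pattern is applied with `String.prototype.replace` and an empty replacement string (the call itself is not shown above, so that part is an assumption):

```javascript
// The pattern from above, unchanged.
const pattern = /(data-.+?=".*?")|(data-.+?='.*?')|(data-[a-zA-Z0-9-]+)/g;

// The sample markup from above.
const html = [
  `<a href="" data-foo data-foo-bar data-test="foo" data-foo='blah'>`,
  "  testing data-foo attributes.",
  "</a>",
].join("\n");

// Assumed usage: replace every match with an empty string.
const stripped = html.replace(pattern, "");

// The attributes are gone, but so is the "data-foo" in the text node.
console.log(stripped.includes("data-foo")); // false
console.log(stripped);
```

The third alternative, `data-[a-zA-Z0-9-]+`, has no way of knowing whether it is inside a tag, so it matches the text as well.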
Johnny Oshika
  • http://stackoverflow.com/a/1732454/1848654 – melpomene Sep 10 '15 at 18:51
  • which language are you using? – Toni Leigh Sep 10 '15 at 19:09
  • @ToniLeigh: JavaScript, but this problem isn't language specific. – Johnny Oshika Sep 10 '15 at 21:58
  • @JohnnyOshika I ask for two reasons. First, the syntax for applying a regex varies between languages (a small point). Second, depending on the language, there may well be a better way to manipulate HTML using its DOM and built-in HTML parsing functions; regex manipulation of HTML strings is notoriously difficult. – Toni Leigh Sep 11 '15 at 06:34
  • @ToniLeigh: It looks like regex is a difficult way to solve this, so I've had to resort to string parsing and iterating. – Johnny Oshika Sep 11 '15 at 07:11

1 Answer

If the markup is in a live DOM, you can delete all data-* attributes through the dataset API:

// Select all elements, or narrow this to specific ones.
var elements = document.getElementsByTagName("*");

// Use the dataset API to delete every data-* attribute on each element.
for (var i = 0; i < elements.length; i++) {
    for (var prop in elements[i].dataset) {
        delete elements[i].dataset[prop];
    }
}

Markup the snippet runs against:

<a href="" data-foo data-foo-bar data-test="foo" data-foo='blah'>
  testing data-foo attributes.
</a>
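
When no DOM is available, one string-only option is a two-pass replace: first find each opening tag, then walk the attributes inside that tag and drop the ones whose name starts with data-. The sketch below is an illustration, not a full HTML parser. `stripDataAttributes`, `OPENING_TAG`, and `ATTRIBUTE` are hypothetical names, and the code assumes reasonably well-formed markup; tag-like text inside comments or script/style blocks would be processed too.

```javascript
// Pass 1: one opening tag. Quoted attribute values are consumed as a unit,
// so a ">" inside a quoted value does not end the tag early.
const OPENING_TAG = /<[a-zA-Z](?:"[^"]*"|'[^']*'|[^'">])*>/g;

// Pass 2: one attribute (leading whitespace, name, optional value).
// The value may be double-quoted, single-quoted, or unquoted.
const ATTRIBUTE = /\s+([^\s=>\/"']+)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?/g;

// Hypothetical helper: removes data-* attributes from opening tags only.
function stripDataAttributes(html) {
  return html.replace(OPENING_TAG, (tag) =>
    // Every attribute is matched in turn, so "data-" inside another
    // attribute's quoted value is never treated as an attribute name.
    tag.replace(ATTRIBUTE, (attr, name) =>
      name.toLowerCase().startsWith("data-") ? "" : attr
    )
  );
}

const sample = [
  `<a href="" data-foo data-foo-bar data-test="foo" data-foo='blah'>`,
  "  testing data-foo attributes.",
  "</a>",
].join("\n");

console.log(stripDataAttributes(sample));
```

For the markup from the question this yields `<a href="">` with the text line left untouched. Matching whole attributes in the second pass, instead of searching for `data-` directly, is what keeps a value such as `title="see data-foo"` intact.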
DavidDomain
  • Thank you. That would probably be a good, workable solution if I had access to the DOM, but I don't. I'm looking to do some string manipulation. – Johnny Oshika Sep 10 '15 at 21:59
  • Oh, OK. That's a totally different story then, sorry. – DavidDomain Sep 10 '15 at 22:04
  • I'm a real noob when it comes to regex, but would this work for you: [https://regex101.com/r/mC6wM6/2](https://regex101.com/r/mC6wM6/2)? I'm not even sure if that is the correct way to do it. – DavidDomain Sep 10 '15 at 23:27
  • @JohnnyOshika - if you can run JavaScript on the HTML string then you have access to the DOM – Toni Leigh Sep 11 '15 at 06:36
  • @ToniLeigh - Not really, if you don't know the environment or the context in which the OP wants to do the string manipulation, you can't just assume that there is a DOM. The DOM is not part of the JavaScript language. – DavidDomain Sep 11 '15 at 06:58
  • @DavidDomain: Your regex attempt is pretty good, but there are some scenarios where it doesn't work, such as and . – Johnny Oshika Sep 11 '15 at 07:07
  • @DavidDomain, true, but it can convert a string into DOM – Toni Leigh Sep 11 '15 at 07:35
  • It's not using regex, as asked in the question. – sairfan May 27 '20 at 19:25