I have an HTML document with a lot of id="something" attributes.

All of the HTML is stored in the $data variable.

I'm trying to remove every id="..." attribute from $data:

$data = preg_replace('\<id="[*]"^\>', '', $data);

It doesn't work. What's wrong?

Nic
James
  • possible duplicate of [php regexp: remove all attributes from an html tag](http://stackoverflow.com/questions/3026096/php-regexp-remove-all-attributes-from-an-html-tag) – Gordon Oct 25 '11 at 12:52
  • What's with the editing history here? -- Your first problem might be the choice or lack of regex delimiters (see the manual). – mario Oct 25 '11 at 12:55
  • The editing is quite messy because of a good beer, sorry. – James Oct 25 '11 at 12:58
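
To illustrate the delimiter point from the comments, here is a minimal sketch. The sample markup and the names `$broken` and `$fixed` are made up for illustration, and the `/`-delimited pattern is just one possible correction of the attempt in the question, not a definitive fix.

```php
<?php
// Hypothetical sample input; the real $data comes from the asker's page.
$data = '<div id="box"><p id="intro" class="lead">Hi</p></div>';

// The pattern from the question starts with a backslash, which cannot be used
// as a delimiter: preg_replace() raises a warning (suppressed with @ here)
// and returns null instead of a string.
$broken = @preg_replace('\<id="[*]"^\>', '', $data);
var_dump($broken); // NULL

// With "/" as delimiters the pattern compiles. [^"]* matches any run of
// characters other than a double quote, whereas [*] matches only a literal "*".
$fixed = preg_replace('/\sid="[^"]*"/', '', $data);
echo $fixed, PHP_EOL; // <div><p class="lead">Hi</p></div>
```

The leading `\s` also removes the whitespace in front of the attribute, so no stray space is left inside the tag.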

4 Answers

Try this instead:

$data = preg_replace('#\s(id|class)="[^"]+"#', '', $data);

Note: We solved the remaining issues in chat. The answer still fits the problem described in the question.
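
As a quick check of the pattern above, here is a small sketch run against made-up markup. The sample string and the `$clean` variable are assumptions for illustration only.

```php
<?php
// Hypothetical sample input; the real $data comes from the asker's page.
$data = '<ul><li id="first" class="item">One</li><li id="second">Two</li></ul>';

// "#" is the pattern delimiter. \s consumes the whitespace before the
// attribute, (id|class) matches either attribute name, and [^"]+ matches
// everything up to the closing double quote.
$clean = preg_replace('#\s(id|class)="[^"]+"#', '', $data);

echo $clean, PHP_EOL; // <ul><li>One</li><li>Two</li></ul>
```

Because `[^"]+` requires at least one character, an empty attribute such as `id=""` is left in place, and single-quoted values are not matched either.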

Till Helge
  • At this point I should probably tell you that using regex to parse HTML is a very bad idea. Consider `<li>` for example. There are so many different possibilities for where `id` and `class` could be located within `<li>` that writing a regex for that is unnecessarily tedious. You would have to consider all possible combinations of `class`, `id` and any other attribute that is allowed within `<li>`. – Till Helge Oct 25 '11 at 13:06
  • There are only `class` and `id` attributes inside `<li>`. – James Oct 25 '11 at 13:06
  • Well... if that is the case, you could replace every `<li ...>` with `<li>`: `preg_replace('#<li[^>]+>#', '<li>', $data);` – Till Helge Oct 25 '11 at 13:08
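
The replacement suggested in the last comment can be tried out with a short sketch like the one below. The sample markup and the `$clean` variable are assumptions, chosen so that `<li>` carries only `id` and `class`, as the asker describes.

```php
<?php
// Hypothetical sample input in which <li> carries only id and class.
$data = '<ul><li id="a" class="x">One</li><li class="y">Two</li><li>Three</li></ul>';

// "<li" followed by one or more characters other than ">", then ">".
// A bare <li> has nothing between "<li" and ">", so it is left untouched.
$clean = preg_replace('#<li[^>]+>#', '<li>', $data);

echo $clean, PHP_EOL; // <ul><li>One</li><li>Two</li><li>Three</li></ul>
```

One caveat: the pattern matches any tag whose name begins with `li`, so a `<link ...>` tag in `$data` would also be rewritten to `<li>`. That is fine for a fragment that contains only list markup, but not for a full page.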