
I need a way to select every line that starts with "h1", including everything after "h1" on that line, and replace it with nothing using regular expressions. I also need it to work for the @import lines.

I need to change this:

<link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
h1 { font-family: 'Special Elite', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Special+Elite);
<link href='http://fonts.googleapis.com/css?family=Quattrocento+Sans' rel='stylesheet' type='text/css'>
h1 { font-family: 'Quattrocento Sans', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Quattrocento+Sans);
<link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>
h1 { font-family: 'Smythe', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Smythe);

To this:

<link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
<link href='http://fonts.googleapis.com/css?family=Quattrocento+Sans' rel='stylesheet' type='text/css'>
<link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>
cottontail
ThomasReggi
  • in which programming/scripting language? – drudge Apr 26 '11 at 17:06
  • Please see http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – rerun Apr 26 '11 at 17:07
  • I'm just using search and replace in my text editor. – ThomasReggi Apr 26 '11 at 17:10
  • @rerun: -1 to you for mindless parroting. Regexes are just fine for most specific HTML; they are just tricky on general HTML. If he has specific cases, there is nothing wrong with it. – tchrist Apr 26 '11 at 18:19

1 Answer

This one should match the lines you want to keep:

(<link.*css'>)

And this one should match the lines you want to delete:

(h1 {.*})|(@import.*;)
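
If the editor's search and replace is not up to it, the same two patterns can be tried in a short script. This is only a sketch in Python, which the question did not ask for. The names `SAMPLE`, `KEEP_LINE`, `DELETE_LINE`, `keep_link_lines` and `strip_css_lines` are made up for illustration. The delete pattern is anchored to the start of the line, has its braces escaped, and swallows the trailing newline so no blank lines are left behind. It assumes every rule sits on a single line, as in the question.

```python
import re

# Shortened sample taken from the question.
SAMPLE = """\
<link href='http://fonts.googleapis.com/css?family=Special+Elite' rel='stylesheet' type='text/css'>
h1 { font-family: 'Special Elite', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Special+Elite);
<link href='http://fonts.googleapis.com/css?family=Smythe' rel='stylesheet' type='text/css'>
h1 { font-family: 'Smythe', arial, serif; }
@import url(http://fonts.googleapis.com/css?family=Smythe);
"""

# Lines to keep: the <link ...> tags (first pattern above).
KEEP_LINE = re.compile(r"<link.*css'>")

# Lines to delete: the h1 rules and @import statements (second pattern above),
# anchored with ^ and extended to consume the line ending as well.
DELETE_LINE = re.compile(r"^(?:h1 \{.*\}|@import.*;)[ \t]*\r?\n?", re.MULTILINE)


def keep_link_lines(text: str) -> str:
    """Return only the <link> tags, one per line."""
    return "\n".join(m.group(0) for m in KEEP_LINE.finditer(text))


def strip_css_lines(text: str) -> str:
    """Remove the h1 and @import lines and leave everything else alone."""
    return DELETE_LINE.sub("", text)


if __name__ == "__main__":
    print(strip_css_lines(SAMPLE), end="")
```

In an editor with regex search, the equivalent would be to search for the delete pattern and replace with an empty string. Whether `^` and the newline escape behave the same way depends on the editor.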
cottontail
drudge
  • I don't understand what the big deal is, I just have a 300 line document with a list of HTML data. Why can't we just pretend this is a string? – ThomasReggi Apr 26 '11 at 17:17
  • @Thomas: HTML is not a Regular language, so using **Regular** Expressions to match it is highly susceptible to breaking. – drudge Apr 26 '11 at 17:20
  • No no no! The patterns used in modern text-processing **ARE NOT REGULAR** so they certainly can be used on stuff like this. It’s just nobody stopped calling them regular expressions once they became non-textbook-regular, like with `(.*)\1`, for example. It’s just tricky in the general case is all. It is usually fairly easy in the specific case, so it is just fine to use them. – tchrist Apr 26 '11 at 18:18
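
For anyone curious about the `(.*)\1` example in the last comment, here is a small sketch in Python (the function name `is_doubled` is hypothetical). The backreference `\1` must repeat whatever group 1 captured, so the pattern accepts exactly the strings made of some text written twice. That set of strings is not a regular language in the textbook sense.

```python
import re

# Group 1 captures any text; \1 then requires the same text again.
DOUBLED = re.compile(r"(.*)\1")


def is_doubled(s: str) -> bool:
    """True if s is some string immediately followed by a copy of itself."""
    return DOUBLED.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["abcabc", "abcabd", "aa", "aba"]:
        print(repr(candidate), is_doubled(candidate))
```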