1

I'm trying to select specific lines in an html document using regex. I want to select the lines that start with <link and whose href does NOT contain the characters ../

Here's part of the HTML I have:

<script type="text/javascript" src="/js/IP_Master_PT_RTL.master.js"></script>
<link rel="stylesheet" type="text/css" href="../global.design-editor.com/v8/main.min1024.css?v=_STAGING-Publisher_20180327.1" />
<link rel="stylesheet" type="text/css" href="../fonts.googleapis.com/earlyaccess/alefhebrew.css" />
<link rel="stylesheet" type="text/css" href="ad-systems368c.css?v=4701716615" />

This is what I did so far: `<link[^\*]*?href=[^\*]*?"[^..]*`. I'm getting close, but I would like to select the whole tag and not have the other tags selected.

I just started learning about regex, so I'm fairly new to this.

  • 6
    *"I'm trying to select specific lines in an html document using regex."* [Don't](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454), it can't be done reliably. If you need to work with HTML, use an HTML parser. – T.J. Crowder Apr 24 '19 at 10:49
  • If you are doing this in the browser using JavaScript, you can use the DOM APIs to read all link elements and then do a regex or string match for the href check. Sample: ```document.querySelectorAll('link').forEach((linkelement,i)=>{ console.log(linkelement.attributes['href']); })``` – gp. Apr 24 '19 at 10:52

3 Answers

3

Using JavaScript in the browser, you can just look up all <link> elements and then filter them based on the href attribute:

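// Collect every <link> element in the document
let allLinks = document.querySelectorAll('link');

// Keep only the links whose href does not start with "../"
// (the ?? "" guards against <link> elements that have no href attribute)
let filteredLinks = Array.from(allLinks)
  .filter(el => !(el.getAttribute("href") ?? "").startsWith("../"));

console.log(filteredLinks);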
<script type="text/javascript" src="/js/IP_Master_PT_RTL.master.js"></script>
<link rel="stylesheet" type="text/css" href="../global.design-editor.com/v8/main.min1024.css?v=_STAGING-Publisher_20180327.1" />
<link rel="stylesheet" type="text/css" href="../fonts.googleapis.com/earlyaccess/alefhebrew.css" />
<link rel="stylesheet" type="text/css" href="ad-systems368c.css?v=4701716615" />
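
The question asks for hrefs that do not *contain* `../`, while `startsWith` only looks at the beginning of the value. Here is a minimal sketch of the difference that runs outside the browser. The helper names and the `css/../theme.css` value are made up for illustration; in a page you would pass `el.getAttribute("href")` to the helpers instead.

```javascript
// Hypothetical helpers; only startsWith and includes are standard String methods.
// A missing href (null, as returned by getAttribute) is treated as an empty string.
const startsWithParent = href => (href ?? "").startsWith("../");
const containsParent = href => (href ?? "").includes("../");

// Two hrefs from the question, one invented href with "../" in the middle,
// and null standing in for a <link> that has no href at all.
const hrefs = [
  "../fonts.googleapis.com/earlyaccess/alefhebrew.css",
  "ad-systems368c.css?v=4701716615",
  "css/../theme.css",
  null,
];

// Prefix check: still keeps "css/../theme.css".
const keptByPrefix = hrefs.filter(h => !startsWithParent(h));

// Contains check: drops every href with "../" anywhere in it.
const keptByContains = hrefs.filter(h => !containsParent(h));

console.log(keptByPrefix);
console.log(keptByContains);
```

Which check is right depends on the data: if `../` can only ever appear at the start of the href, the two behave the same.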
VLAZ
0
const links = [...document.getElementsByTagName('link')];
const newLinks = links.filter(link => !link.outerHTML.includes('../'))
console.log(newLinks)
mrblue
  • 1
    `.includes` will catch that sequence anywhere in the string. In fact, anywhere in the entire HTML for that element. – VLAZ Apr 24 '19 at 11:18
  • Yeah, you are right. I didn't know about `getAttribute`. Your solution is much better – mrblue Apr 24 '19 at 11:22
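
The point about `.includes` on `outerHTML` can be seen with a small sketch. It assumes a hypothetical `fakeLink` stand-in for a DOM element so that it runs outside a browser, and the `title` value is invented to trigger the false positive.

```javascript
// Hypothetical stand-in for a DOM element: exposes only outerHTML and getAttribute.
function fakeLink(attrs) {
  const attrText = Object.entries(attrs)
    .map(([name, value]) => `${name}="${value}"`)
    .join(" ");
  return {
    outerHTML: `<link ${attrText}>`,
    getAttribute: name => (name in attrs ? attrs[name] : null),
  };
}

// The href is fine, but another attribute happens to contain "../".
const link = fakeLink({ href: "print.css", title: "see ../docs" });

// Checking the whole tag rejects the link because of the title attribute.
const keptByOuterHTML = !link.outerHTML.includes("../");

// Checking only the href attribute keeps it.
const keptByHref = !(link.getAttribute("href") ?? "").includes("../");

console.log({ keptByOuterHTML, keptByHref });
```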
-2

Maybe he/she is analysing files so he/she needs ragex. So here is my solution...

I am also new to ragex, but this might work for you.

<link\s*.*href\=\"[^(\.\.\/)].*\".*/>

also check this ragex builder, have fun..

buraksivrikaya
  • 1
    It's r**e**gex - for Regular Expression. And this pattern will not match `href='../somevalue'` because of the quotes. The character class inside 1. doesn't need any escapes 2. will not match *only* `../` but also `.` or `./` or `/`. This can also match something you didn't intend at all, for example with the HTML `'
    '`
    – VLAZ Apr 24 '19 at 11:17
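
If the HTML really is only available as text (for example when analysing files), a negative lookahead avoids the character-class pitfalls described in the comment above. This is a rough sketch, not a reliable HTML matcher: it assumes href values are quoted, contain no `>` and no nested quotes, and that no other attribute name ends in `href`. The names `LINK_WITHOUT_PARENT_PATH` and `findLinkTags` are made up for illustration. An HTML parser remains the safer choice.

```javascript
// Match a whole <link ...> tag unless its href value contains "../".
//   <link\b            the tag name
//   (?! ... )          negative lookahead: fail if, before the closing ">",
//                      there is href="..." (or '...') with "../" inside the value
//   [^>]*>             then consume the rest of the tag
const LINK_WITHOUT_PARENT_PATH =
  /<link\b(?![^>]*\bhref\s*=\s*["'][^"'>]*\.\.\/)[^>]*>/gi;

// Return every matching tag as a string (an empty array when nothing matches).
function findLinkTags(html) {
  return html.match(LINK_WITHOUT_PARENT_PATH) ?? [];
}

// The sample HTML from the question.
const sample = [
  '<script type="text/javascript" src="/js/IP_Master_PT_RTL.master.js"></script>',
  '<link rel="stylesheet" type="text/css" href="../global.design-editor.com/v8/main.min1024.css?v=_STAGING-Publisher_20180327.1" />',
  '<link rel="stylesheet" type="text/css" href="../fonts.googleapis.com/earlyaccess/alefhebrew.css" />',
  '<link rel="stylesheet" type="text/css" href="ad-systems368c.css?v=4701716615" />',
].join("\n");

console.log(findLinkTags(sample));
```

On the sample from the question this selects only the `ad-systems368c.css` tag. Note that a `<link>` without any href also matches, since it has no `../` to reject.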