
I need to check each line of my HTML files to see whether it includes a js/css/jpg file. If so, I will do some further processing on those lines. For example:

<img src="logo.jpg" />
<script src="head.js"></script> //double quotes
<script src='head.js'></script> //single quote
<link rel="stylesheet" type="text/css" href="mystyle.css">

These are all cases that match the checking rule. But <script src="head.json"></script> should not be matched, because its extension doesn't exactly match the keyword "js".

I am writing a Java application to scan the HTML source and want to design a regular expression for the check. Basically, I think it needs to check whether each line contains .js, .css or .jpg immediately followed by a double or single quote.

More keywords might be added in the future. How can I write the regular expression elegantly?

jm li

1 Answer


This is merely an answer to the question: how to match js but not json?

You can use word boundaries:

\b(js|jpg|css)\b # or
\.(js|jpg|css)\b

Example here.
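
For the Java side, here is a minimal sketch of how the second pattern could be built from a list of extensions, so that adding a keyword later only means adding a list entry. The class name AssetLineMatcher, the method names and the EXTENSIONS list are made up for illustration and are not from the question. It uses the dotted variant because the plain \b(js|jpg|css)\b would also match the "css" in type="text/css".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AssetLineMatcher {
    // Hypothetical keyword list; extend it when new extensions are needed.
    private static final List<String> EXTENSIONS = Arrays.asList("js", "css", "jpg");

    // Compiled once and reused for every line.
    private static final Pattern ASSET = buildPattern(EXTENSIONS);

    // Builds a pattern of the form \.(?:js|css|jpg)\b from the given extensions.
    static Pattern buildPattern(List<String> extensions) {
        StringBuilder alternation = new StringBuilder();
        for (String ext : extensions) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            // Quote each extension so regex metacharacters are taken literally.
            alternation.append(Pattern.quote(ext));
        }
        return Pattern.compile("\\.(?:" + alternation + ")\\b");
    }

    // True if the line contains a dot, one of the extensions, then a word boundary.
    static boolean referencesAsset(String line) {
        return ASSET.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "<img src=\"logo.jpg\" />",
            "<script src='head.js'></script>",
            "<script src=\"head.json\"></script>"
        };
        for (String line : lines) {
            System.out.println(referencesAsset(line) + "  " + line);
        }
    }
}
```

Note that the word boundary only requires a non-word character (or the end of the line) after the extension, so it is slightly looser than insisting on a quote. If the quote must be there, the trailing \b could be swapped for ["'].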

If you want to parse HTML with Java, use jsoup.

Eric Duminil