6

I need to remove style tags from a text file.

I tried the following code:

String text = readFile("E:/textwithstyletags.txt");
retVal = text.replaceAll("<style(.+?)</style>", "");

It works when the text file has style tags without new lines, e.g. <style> body{ color:red; } </style>

It doesn't work when there are new lines, like this:

<style> 
body{ 
color:red; 
} 
</style>
Mazdak
Mark Timothy
  • Possibly related? [Match multiline text using regular expression](http://stackoverflow.com/questions/3651725/match-multiline-text-using-regular-expression) – PakkuDon Apr 27 '15 at 06:05

4 Answers

8

You can use [\s\S] in place of . in your regex. By default . does not match line terminators, whereas [\s\S] matches any character.

i.e.:

retVal = text.replaceAll("<style([\\s\\S]+?)</style>", "");
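To make the difference concrete, here is a minimal sketch comparing the original pattern with the [\s\S] variant on a multi-line input. The class and method names (`StyleTagDemo`, `stripWithDot`, `stripWithCharClass`) are made up for illustration and are not part of the question's code.

```java
public class StyleTagDemo {
    // Original pattern from the question: '.' does not match line
    // terminators by default, so a multi-line style block is never matched.
    static String stripWithDot(String text) {
        return text.replaceAll("<style(.+?)</style>", "");
    }

    // [\s\S] matches whitespace or non-whitespace, i.e. any character,
    // including line terminators.
    static String stripWithCharClass(String text) {
        return text.replaceAll("<style([\\s\\S]+?)</style>", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for the file contents.
        String multiLine = "before\n<style>\nbody{\ncolor:red;\n}\n</style>\nafter";

        // The first call leaves the input unchanged; the second removes the block.
        System.out.println(stripWithDot(multiLine).equals(multiLine));
        System.out.println(stripWithCharClass(multiLine));
    }
}
```

The lazy quantifier `+?` matters when the file contains several style blocks: it stops at the first `</style>` instead of swallowing everything up to the last one.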
karthik manchala
7

Tested on regex101.

Pattern:

<style((.|\n|\r)*?)<\/style>    

Your code:

String text = readFile("E:/textwithstyletags.txt");
retVal = text.replaceAll("<style((.|\\n|\\r)*?)<\\/style>", "");
Jared Rummler
3

Try this regex:

retVal = text.replaceAll("(?i)<style.*?>.*?</style>", "");

On a side note, you can look at jsoup, which is a Java library made for HTML manipulation.

Rahul Tripathi
  • I like this: it handles everything non-greedily and case-insensitively, no muss, no fuss. Although I think you need to add the s flag (dot matches all) to handle linefeeds between the open and close tags, like: `retVal = text.replaceAll("(?is)<style.*?>.*?</style>", "");` – bmiller May 08 '20 at 21:08
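
The inline flags discussed in this answer and its comment can also be passed as constants to `Pattern.compile`, which some find easier to read. The following is only a sketch under that assumption; `StyleStripper` and `strip` are hypothetical names, and a regex remains a rough tool for HTML compared with a real parser.

```java
import java.util.regex.Pattern;

public class StyleStripper {
    // CASE_INSENSITIVE corresponds to (?i); DOTALL corresponds to (?s)
    // and lets '.' match line terminators as well.
    private static final Pattern STYLE_BLOCK = Pattern.compile(
            "<style.*?>.*?</style>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Removes every style block, including ones with attributes,
    // mixed-case tag names, or content spanning several lines.
    static String strip(String html) {
        return STYLE_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for the file contents.
        String html = "<STYLE type=\"text/css\">\nbody{ color:red; }\n</Style><p>hi</p>";
        System.out.println(strip(html));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which `String.replaceAll` does internally.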
2

You can use this expression (shown as a Java string literal): <style[\\w\\W]+?</style>

retVal = text.replaceAll("<style[\\w\\W]+?</style>", "");

It matches every character, whether a word character (\w: letters, digits, and the underscore) or a non-word character (\W), so it also spans newlines, up to the first </style>.

karthik manchala
Saif