0

I need to write the find string as a regex so that the html tag is matched whatever its spacing, e.g. <html > or <html> or < html>. How do I write the find string as a regex?

String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
String find = "<html>";
String replace = "";        
Pattern pattern = Pattern.compile(find);        
Matcher matcher = pattern.matcher(source);        
String output = matcher.replaceAll(replace); 
System.out.println("Source = " + source);
System.out.println("Output = " + output);
kiritsuku
user1321824

4 Answers

3

Although you could work around your problem with <\\s*html\\s*>, you should not process HTML with regex. Obligatory link.

The \\s* matches zero or more whitespace characters.
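As a rough sketch of how that pattern could be dropped into the code from the question: the class name HtmlTagRegexDemo and the method stripHtmlTags are made up for illustration, and the optional /? is an addition to the pattern above (an assumption that the closing tag should be removed as well).

```java
import java.util.regex.Pattern;

public class HtmlTagRegexDemo {
    // "<", optional whitespace, optional "/" (added here so the closing tag
    // matches too), optional whitespace, "html", optional whitespace, ">".
    private static final Pattern HTML_TAG = Pattern.compile("<\\s*/?\\s*html\\s*>");

    // Hypothetical helper: removes every opening or closing html tag.
    public static String stripHtmlTags(String source) {
        return HTML_TAG.matcher(source).replaceAll("");
    }

    public static void main(String[] args) {
        String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
        System.out.println("Source = " + source);
        // Prints: Output = The quick brown fox jumps over the brown lazy dog.
        System.out.println("Output = " + stripHtmlTags(source));
    }
}
```

This only copes with whitespace variations of a bare html tag; attributes, comments, or a ">" inside the content are exactly the cases where regex-based HTML processing tends to break.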

npinti
1

Do not attempt to parse HTML using regex! Try reading about XPath; it is very helpful. XPath will by default try to validate your document, but you can use HtmlCleaner to make it valid first.
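A minimal sketch of the XPath route, using only the XML classes that ship with the JDK. It assumes the input is already well-formed XML, which happens to hold for the string in the question; something like < html> would be rejected by the parser, and that is where a cleaner such as HtmlCleaner would come in. The class name XPathTextDemo and the method textOfHtml are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathTextDemo {
    // Hypothetical helper: parses the source as XML and returns the text
    // content of the root html element.
    public static String textOfHtml(String source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(source)));
            // Evaluating "/html" as a string yields the element's text content.
            return XPathFactory.newInstance().newXPath().evaluate("/html", doc);
        } catch (Exception e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
        // Prints: The quick brown fox jumps over the brown lazy dog.
        System.out.println(textOfHtml(source));
    }
}
```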

Andrei Sfat
0

To extract the text inside your tags, use something like:

String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";
System.out.println( source.replaceAll( "^<\\s*html\\s*>(.*)<\\s*\\/html\\s*>$", "$1" ) );
// output is:
// The quick brown fox jumps over the brown lazy dog.

But try to avoid parsing HTML with regexes. Read this topic.

Community
stemm
0

This example may be helpful to you.

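import java.util.regex.Matcher;
import java.util.regex.Pattern;

String source = "<html >The quick brown fox jumps over the brown lazy dog.</html >";

// Matches any tag, not just html: "<", then as few characters as possible, up to the next ">".
String find = "\\<.*?>";
String replace = "";
Pattern pattern = Pattern.compile(find);
Matcher matcher = pattern.matcher(source);
// Removes every match, so both the opening and the closing tag are stripped.
String output = matcher.replaceAll(replace);
System.out.println("Source = " + source);
System.out.println("Output = " + output);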
Biswajit