-5

Hello, I want to extract "Hello, World!", "and", and the paragraph "This is a minimal....." from the given string in Java. I am having trouble extracting them, so can anyone help me with it?

The input string is different every time, and I want to extract the text between 2 square-bracket tags, []......[].

String s1 = "[sh1] Hello, World! [/s11] and [pp]This is a minimal \"hello world\" HTML document. It demonstrates the basic structure of an HTML file and anchors. [/xy]";

Thanks

Avinash Raj
Arnav
  • Do you have any code you would like to share? – martin Apr 18 '15 at 07:50
  • I am fetching the full HTML source with a GET request and want to remove the tags from the string, then print only what is in the body, i.e. the text between the tags. I separated the body content using String s1=s.substring(s.indexOf("<body>")+6,s.indexOf("</body>")); Now I want to remove all the remaining tags and print just the text between them. – Arnav Apr 18 '15 at 08:15

2 Answers

1

Use Pattern and Matcher to match the square-bracket tags and capture the text between them:

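import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Match "[tag] text [tag]" and capture the text between the two tags in group 1.
Pattern pattern = Pattern.compile("\\[[^\\]]*\\]([^\\]]*)\\[[^\\]]*\\]");
Matcher matcher = pattern.matcher(s1);
// Each match consumes both tags, so text between two tag pairs (here "and") is skipped.
while (matcher.find()) {
  System.out.println("Found value: " + matcher.group(1).trim());
}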

Demo: https://ideone.com/kNKBgg
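
The pattern above pairs each opening tag with a closing tag, so text that lies between two pairs, such as "and" in the question's string, is not printed. A minimal sketch of one possible variation follows. It uses lookbehind and lookahead so that the brackets themselves are not consumed. The class name BracketTextExtractor and the method extract are made up for illustration and are not part of the original answer, and the sketch assumes the text between tags never contains a literal '[' or ']'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original answer.
class BracketTextExtractor {
    // A run of non-bracket characters that directly follows ']' and directly precedes '['.
    // The lookarounds check the brackets without consuming them.
    private static final Pattern BETWEEN_TAGS =
            Pattern.compile("(?<=\\])[^\\[\\]]+(?=\\[)");

    static List<String> extract(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = BETWEEN_TAGS.matcher(input);
        while (matcher.find()) {
            String text = matcher.group().trim();
            if (!text.isEmpty()) { // skip whitespace-only gaps between adjacent tags
                parts.add(text);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String s1 = "[sh1] Hello, World! [/s11] and [pp]This is a minimal \"hello world\" "
                + "HTML document. It demonstrates the basic structure of an HTML file "
                + "and anchors. [/xy]";
        for (String part : extract(s1)) {
            System.out.println("Found value: " + part);
        }
    }
}
```

Because the brackets are only inspected, the closing bracket of one tag can also serve as the left boundary of the next piece of text, which is how "and" gets picked up. This is still only suitable for the simple bracket format in the question, not for real HTML.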

Nagarjun
  • Thanks for the help, but it is giving 2 errors: Clientonly.java:82: error: illegal escape character Pattern pattern = Pattern.compile("\](.*?)\["); ^ Clientonly.java:82: error: illegal escape character Pattern pattern = Pattern.compile("\](.*?)\["); ^ – Arnav Apr 18 '15 at 08:00
  • Thanks for the help, Nagarjun, but it isn't working. What I actually want is the string between angle-bracket tags, <>....<>, not square brackets. When I typed <> in the question it was treated as HTML and not displayed, which is why I had to write [] instead. – Arnav Apr 18 '15 at 08:21
  • I am fetching the full HTML source with a GET request and want to remove the tags from the string, then print only what is in the body, i.e. the text between the tags. I separated the body content using String s1=s.substring(s.indexOf("<body>")+6,s.indexOf("</body>")); Now I want to remove all the remaining tags and print just the text between them. – Arnav Apr 18 '15 at 08:21
  • @Arnav As Ana mentioned in the answer below, you should use a DOM or SAX parser if you are parsing an HTML document. My solution can be modified to match HTML as well, but that is not the recommended way. – Nagarjun Apr 18 '15 at 08:39
  • Hi Nagarjun, can you help me with /\]*\>([^]*)\<\/body/ using Pattern and Matcher? It gives an error when I try to put it in Pattern.compile("/\]*\>([^]*)\<\/body/") – Arnav Apr 19 '15 at 08:26
0

Please don't use regular expressions for this (that is what Pattern and Matcher do); see here for the reasons why you shouldn't. A regex can work for the particular bracket example, but if you expect full-blown HTML, don't use one.

If you want to extract content from HTML, use a parser, for example SAXParser or DOMParser; see the Oracle documentation for examples.
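
As a rough illustration of the parser approach, here is a minimal sketch that uses the JDK's built-in DOM parser (DocumentBuilder) to pull the text out of the body element. The class name BodyTextExtractor and the method bodyText are made up for this example. It assumes the input is well-formed XML (XHTML-style markup with every tag closed and no DOCTYPE); real-world HTML often is not, in which case this parser throws and a dedicated HTML parser would be needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not taken from the Oracle documentation.
class BodyTextExtractor {
    // Returns the text inside <body> with all tags removed and whitespace collapsed.
    // Assumption: the markup is well-formed XML; otherwise parsing fails.
    static String bodyText(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
            NodeList bodies = doc.getElementsByTagName("body");
            if (bodies.getLength() == 0) {
                return "";
            }
            // getTextContent() concatenates the text of all descendant nodes.
            return bodies.item(0).getTextContent().trim().replaceAll("\\s+", " ");
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Demo</title></head><body>"
                + "<h1>Hello, World!</h1> and "
                + "<p>This is a minimal \"hello world\" HTML document.</p>"
                + "</body></html>";
        System.out.println(bodyText(page));
    }
}
```

With a parser there is no need to cut out the body using substring and indexOf first, since the document tree already gives direct access to the body element and its text.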

Community
Ana Vinatoru