Manipulate regular String in Java

Question

I have a String with a fixed, regular form, and I want to extract two parts of it. The String has the format:

`"<html><div style=\"text-align:center;\"><b>****</b><br><i>Aula: </i><b>****</b></div></html>"`

Each `****` marks a part of the string that I want to extract. How can I do this? I'm using Java, and the string contains HTML.

Both interesting parts of the String are delimited by `<b>` and `</b>`.

Use an HTML parser like [jsoup](http://jsoup.org/). – nachokk Sep 13 '13 at 18:53

Pshemo · Accepted Answer · 2013-09-13T19:24:14.833

5

If that is exactly the form of your HTML String, you can use the `substring` method with the positions of `<b>` and `</b>`. (If your HTML code can change, you should use an HTML parser.)

```java
String s = "<html><div style=\"text-align:center;\"><b>first</b><br><i>Aula: </i><b>second</b></div></html>";
int start = s.indexOf("<b>");
int end = s.indexOf("</b>");
String firstMatch = s.substring(start + "<b>".length(), end);

// now look for the next <b> after the position where we found </b>
start = s.indexOf("<b>", end);
// and look for </b> after the position of that latest <b>
end = s.indexOf("</b>", start);
String secondMatch = s.substring(start + "<b>".length(), end);

System.out.println(firstMatch);
System.out.println(secondMatch);
```

output:

first
second
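The two-step search above generalizes to any number of `<b>...</b>` pairs by looping with `indexOf(String, int)`. This is a minimal sketch, assuming plain `<b>` tags without attributes or nesting; the class and method names (`BetweenTags`, `between`) are hypothetical. Unlike the snippet, it stops cleanly when a tag is missing, whereas the snippet would pass `-1` from `indexOf` into `substring` and throw `StringIndexOutOfBoundsException`.

```java
import java.util.ArrayList;
import java.util.List;

public class BetweenTags {
    /**
     * Collects every substring found between openTag and closeTag, scanning
     * left to right with indexOf. Stops when no further complete pair exists.
     * Assumes tags are not nested (hypothetical helper, not a library API).
     */
    public static List<String> between(String s, String openTag, String closeTag) {
        List<String> matches = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = s.indexOf(openTag, from);
            if (start < 0) {
                break; // no more opening tags
            }
            int contentStart = start + openTag.length();
            int end = s.indexOf(closeTag, contentStart);
            if (end < 0) {
                break; // opening tag without a matching close
            }
            matches.add(s.substring(contentStart, end));
            from = end + closeTag.length(); // resume after the closing tag
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "<html><div style=\"text-align:center;\"><b>first</b><br><i>Aula: </i><b>second</b></div></html>";
        System.out.println(between(s, "<b>", "</b>")); // [first, second]
    }
}
```

Note that `<br>` is not mistaken for `<b>`, because the search string includes the closing `>`.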

edited Sep 13 '13 at 19:24

answered Sep 13 '13 at 18:55

Pshemo


Thanks, that will work for the first interesting word. How can I get the second? In fact, the second one also begins and ends with `<b>` and `</b>`. – Bernheart Sep 13 '13 at 19:00
@Bernheart Sorry I didn't notice that there are two parts that need to be extracted. Will edit. – Pshemo Sep 13 '13 at 19:02
Thank you for the explanation. That is what I wanted! – Bernheart Sep 13 '13 at 19:09
@Bernheart, you can also use `lastIndexOf()` for the second `<b>`. It's a matter of taste, but it's something you should read up on. You never know when it might come in handy. – Ravi K Thapliyal Sep 13 '13 at 19:34
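To illustrate the `lastIndexOf()` suggestion: searching backwards from the end finds the last `<b>...</b>` pair directly. This is a minimal sketch, assuming the value you want is in the last pair and tags are not nested; `LastBold` and `lastBold` are hypothetical names.

```java
public class LastBold {
    /**
     * Returns the text inside the last <b>...</b> pair, or null if there is none.
     * Searches backwards: first for the final "</b>", then for the "<b>" before it.
     */
    public static String lastBold(String s) {
        int end = s.lastIndexOf("</b>");
        if (end < 0) {
            return null; // no closing tag at all
        }
        // lastIndexOf(str, fromIndex) searches backwards starting at fromIndex
        int start = s.lastIndexOf("<b>", end);
        if (start < 0) {
            return null; // closing tag without an opening tag before it
        }
        return s.substring(start + "<b>".length(), end);
    }

    public static void main(String[] args) {
        String s = "<html><div style=\"text-align:center;\"><b>first</b><br><i>Aula: </i><b>second</b></div></html>";
        System.out.println(lastBold(s)); // second
    }
}
```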

Daniel Kaplan · Answer 2 · 2013-09-13T19:11:34.263

4

You have a few options. The most obvious, but probably not the best, is to use a regex. Look at String.replaceAll for that.
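For pulling values out (rather than replacing them), the usual regex API in Java is a `Pattern`/`Matcher` loop that calls `find()` and reads `group(1)`; this swaps that in for `replaceAll`. It is a sketch, assuming plain lowercase `<b>` tags with no attributes, no nesting, and no line breaks inside the bold text (`.` does not match newlines by default); `BoldRegex` and `extract` are hypothetical names. As the comments below point out, a regex like this breaks on ordinary HTML variations such as `<b class="x">` or `<B>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoldRegex {
    // Reluctant (.*?) group so each match stops at the nearest </b>.
    // Assumes plain <b> tags with no attributes and no nested <b>.
    private static final Pattern BOLD = Pattern.compile("<b>(.*?)</b>");

    public static List<String> extract(String html) {
        List<String> parts = new ArrayList<>();
        Matcher m = BOLD.matcher(html);
        while (m.find()) {
            parts.add(m.group(1)); // text captured between the tags
        }
        return parts;
    }

    public static void main(String[] args) {
        String s = "<html><div style=\"text-align:center;\"><b>first</b><br><i>Aula: </i><b>second</b></div></html>";
        System.out.println(extract(s)); // [first, second]
    }
}
```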

A better option is to use an HTML parser. An example of that is JSoup.
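jsoup is a third-party library; with it, `Jsoup.parse(html).select("b")` returns the `<b>` elements, and calling `text()` on each one gives its content. If you would rather avoid a dependency, the JDK ships a basic HTML 3.2 parser in `javax.swing.text.html.parser`. The sketch below swaps that in for JSoup. It assumes the text inside `<b>` elements is all you need, and `BoldParser`, `BoldCollector`, and `extractBold` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class BoldParser {
    /** Parser callback that collects the text inside every <b> element. */
    private static class BoldCollector extends HTMLEditorKit.ParserCallback {
        final List<String> results = new ArrayList<>();
        private final StringBuilder current = new StringBuilder();
        private int depth = 0; // > 0 while inside a <b> element

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag == HTML.Tag.B) {
                depth++;
            }
        }

        @Override
        public void handleEndTag(HTML.Tag tag, int pos) {
            if (tag == HTML.Tag.B && depth > 0) {
                depth--;
                if (depth == 0) { // outermost </b>: one value is complete
                    results.add(current.toString());
                    current.setLength(0);
                }
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            if (depth > 0) {
                current.append(data);
            }
        }
    }

    public static List<String> extractBold(String html) {
        BoldCollector collector = new BoldCollector();
        try {
            // third argument: ignore any charset declared in the document
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector.results;
    }

    public static void main(String[] args) {
        String s = "<html><div style=\"text-align:center;\"><b>first</b><br><i>Aula: </i><b>second</b></div></html>";
        System.out.println(extractBold(s)); // [first, second]
    }
}
```

Because the parser understands tags, it is not confused by `<br>` and would also handle `<b>` written with attributes or in uppercase.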

edited Sep 13 '13 at 19:11

answered Sep 13 '13 at 18:52

Daniel Kaplan


You shouldn't use a regex to parse HTML. http://stackoverflow.com/a/1732454/1864167 – Jeroen Vannevel Sep 13 '13 at 19:00
You shouldn't be suggesting `replaceAll()` when OP clearly wants to parse data out of the string. I wonder if people have stopped reading answers before voting it up. – Ravi K Thapliyal Sep 13 '13 at 19:03
@RaviThapliyal no need to be rude. You can use `replaceAll` to do that. – Daniel Kaplan Sep 13 '13 at 19:04
@tieTYT, please add an illustration. It would help me as well. – Ravi K Thapliyal Sep 13 '13 at 19:07
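@RaviThapliyal `System.out.println("<html><div style=\"text-align:center;\"><b>****</b><br><i>Aula: </i><b>****</b></div></html>".replaceAll("<html><div style=\"text-align:center;\"><b>", "").replaceAll("</b></div></html>", ""));` – Daniel Kaplan Sep 13 '13 at 19:12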

@tieTYT, first of all, there are two `<b>` values that need to be parsed. Your solution leaves `****</b><br><i>Aula: </i><b>****` as output, which is incorrect. Secondly, almost always you would parse something known out of something unknown. Your solution of passing a known header and footer is just plain hacky and impractical. – Ravi K Thapliyal Sep 13 '13 at 19:19
My mistake. Just add another `replaceAll("</b><br><i>Aula: </i><b>", "")` on the end. Yes, it's hacky. That's why I said "probably not the best", and that's why I said a **better** option is to use JSoup. – Daniel Kaplan Sep 13 '13 at 20:10
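For what it's worth, `replaceAll` can pull out both values without hard-coding a header and footer, by matching the whole string with capture groups and keeping only `$1` and `$2`. This is a sketch, assuming exactly two `<b>...</b>` pairs, no line breaks in the input, and no `|` inside the values; `ReplaceAllGroups`, `extractTwo`, and the `|` separator are hypothetical choices. A `Matcher` loop is usually clearer for extraction.

```java
public class ReplaceAllGroups {
    // Whole-string pattern: skip up to each <b>, capture its content reluctantly,
    // and swallow everything after the second </b>.
    private static final String TWO_BOLD = ".*?<b>(.*?)</b>.*?<b>(.*?)</b>.*";

    /** Returns the two bold values, or null if the input does not have two <b> pairs. */
    public static String[] extractTwo(String html) {
        if (!html.matches(TWO_BOLD)) {
            return null; // replaceAll would otherwise return the input unchanged
        }
        // Replace the whole match with "first|second", then split on the separator.
        // split takes a regex, so '|' is escaped; limit -1 keeps empty values.
        String joined = html.replaceAll(TWO_BOLD, "$1|$2");
        return joined.split("\\|", -1);
    }

    public static void main(String[] args) {
        String s = "<html><div style=\"text-align:center;\"><b>first</b><br><i>Aula: </i><b>second</b></div></html>";
        String[] parts = extractTwo(s);
        System.out.println(parts[0]); // first
        System.out.println(parts[1]); // second
    }
}
```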
