Optimizing Regex searches

Question

How can I optimize these regex searches? They currently take up to 5 seconds on my mobile phone.

Conversation: <div class="field-items">.+?sms-notregion
Place: (?<=de/ort/)[^"]+
ID: (?<=sms-share-id sms-tagline-elem">#)\d+
Single message: sms-participant sms-participant-.+?</div></div>
Participant: (?<=sms-participant-)\d
Time: (?<=sms-tag">)\d+:\d+
messagetext: (?<=sms-bubble">).+?(?=</div>)

I first search for conversations, then for the single messages in them and so on.
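For reference, the nested search described above could look roughly like the following in Java. This is only a sketch, not my actual code: the class name SmsRegexSketch and the helpers first and parseMessages are made up for illustration, and only the message-level patterns are shown. The one deliberate choice is compiling each Pattern once as a constant instead of recompiling it on every call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the patterns are the ones listed in the question.
class SmsRegexSketch {
    // Compile each pattern once and reuse it for every search.
    static final Pattern MESSAGE =
            Pattern.compile("sms-participant sms-participant-.+?</div></div>");
    static final Pattern PARTICIPANT = Pattern.compile("(?<=sms-participant-)\\d");
    static final Pattern TIME = Pattern.compile("(?<=sms-tag\">)\\d+:\\d+");
    static final Pattern TEXT = Pattern.compile("(?<=sms-bubble\">).+?(?=</div>)");

    // Returns the first match of p in input, or null if there is none.
    static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Finds every single-message block, then searches inside each block.
    // Each result entry is {participant, time, text}.
    static List<String[]> parseMessages(String conversationHtml) {
        List<String[]> result = new ArrayList<>();
        Matcher m = MESSAGE.matcher(conversationHtml);
        while (m.find()) {
            String block = m.group();
            result.add(new String[] {
                first(PARTICIPANT, block),
                first(TIME, block),
                first(TEXT, block)
            });
        }
        return result;
    }
}
```

Calling SmsRegexSketch.parseMessages(conversation) on one matched conversation string gives one array per message. The "." in these patterns does not match line breaks, so a message block that spans several lines would need Pattern.DOTALL.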

For example I have this website I am matching with: http://pastebin.com/uun0uKL1

Update: As it turned out, my regexes weren't the slow part of my code. The slow part was Html.fromHtml(), which I was using to unescape HTML special characters.
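One way to avoid the full HTML pass for plain entity unescaping is a small hand-written replacer. The following is a minimal sketch under assumptions: the class EntityUnescapeSketch and its unescape method are hypothetical (not an Android or library API), and only a handful of named entities plus numeric references are handled. Anything it does not recognize is left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; covers only the entities listed in NAMED plus numeric references.
class EntityUnescapeSketch {
    // Matches named (&amp;), decimal (&#228;) and hex (&#xE4;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX]?[0-9a-fA-F]+|[a-zA-Z]+);");
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        NAMED.put("nbsp", "\u00A0");
    }

    static String unescape(String s) {
        if (s.indexOf('&') < 0) {
            return s; // fast path: nothing to unescape
        }
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement = m.group(); // default: keep the text as it is
            if (body.charAt(0) == '#') {
                boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
                try {
                    int codePoint = Integer.parseInt(body.substring(hex ? 2 : 1), hex ? 16 : 10);
                    replacement = new String(Character.toChars(codePoint));
                } catch (IllegalArgumentException e) {
                    // Not a valid number or code point: leave the reference unchanged.
                }
            } else if (NAMED.containsKey(body)) {
                replacement = NAMED.get(body);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, EntityUnescapeSketch.unescape("Tom &amp; Jerry") would return "Tom & Jerry". Whether this is actually faster than Html.fromHtml() on a given phone would need to be measured.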

score 3 · Accepted Answer · edited May 23 '17 at 10:28

Don't parse HTML with RegExes. Just don't.

Instead, I've found a nice Java library called jsoup which can quickly parse HTML.

Here's an example of using jsoup with what you want to get:

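import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Fetch and parse the page (get() throws IOException)
Document doc = Jsoup.connect("http://example.com/").get();
// Select every div with class "sms-tag", which holds the time
Elements elements = doc.select("div.sms-tag");
// Then iterate over those elements
for (Element element : elements) {
    String time = element.text();
}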

The other fields can be selected the same way. Looking at the jsoup "cookbook" might help, too.

answered Oct 19 '14 at 16:40 · hichris123

Thanks for the idea, but as it turned out, this solution took longer than my actual regex operations (see the update in the question). – joz Oct 19 '14 at 22:17
