How could I optimize these regex searches? Currently they take up to 5 seconds on my mobile phone.

  • Conversation: <div class="field-items">.+?sms-notregion
  • Place: (?<=de/ort/)[^"]+
  • ID: (?<=sms-share-id sms-tagline-elem">#)\d+
  • Single message: sms-participant sms-participant-.+?</div></div>
  • Participant: (?<=sms-participant-)\d
  • Time: (?<=sms-tag">)\d+:\d+
  • Message text: (?<=sms-bubble">).+?(?=</div>)

I first search for conversations, then for the single messages within each conversation, and so on.
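A minimal sketch of that nested search with java.util.regex, assuming the page HTML is already in a String. The class `SmsScraper`, its `extract`/`firstMatch` methods, and the "participant|time|text" output format are hypothetical. The main idea is to compile each Pattern once and reuse it rather than recompiling inside the loops; using `Pattern.DOTALL` so that `.+?` can cross line breaks is also an assumption about the page layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SmsScraper {
    // Compile each pattern once; Pattern objects are immutable and reusable.
    // DOTALL is an assumption so that ".+?" can also match across line breaks.
    private static final Pattern CONVERSATION =
            Pattern.compile("<div class=\"field-items\">.+?sms-notregion", Pattern.DOTALL);
    private static final Pattern MESSAGE =
            Pattern.compile("sms-participant sms-participant-.+?</div></div>", Pattern.DOTALL);
    private static final Pattern PARTICIPANT = Pattern.compile("(?<=sms-participant-)\\d");
    private static final Pattern TIME = Pattern.compile("(?<=sms-tag\">)\\d+:\\d+");
    private static final Pattern TEXT =
            Pattern.compile("(?<=sms-bubble\">).+?(?=</div>)", Pattern.DOTALL);

    /** Returns one hypothetical "participant|time|text" string per message found. */
    public static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher conv = CONVERSATION.matcher(html);
        while (conv.find()) {
            // Look for single messages only inside the current conversation
            Matcher msg = MESSAGE.matcher(conv.group());
            while (msg.find()) {
                String m = msg.group();
                result.add(firstMatch(PARTICIPANT, m) + "|"
                        + firstMatch(TIME, m) + "|"
                        + firstMatch(TEXT, m));
            }
        }
        return result;
    }

    // Returns the first match of the pattern in the input, or "" if there is none.
    private static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : "";
    }
}
```

Reusing compiled patterns avoids repeated compilation cost, although, as the update below notes, the regexes turned out not to be the bottleneck in this case.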

For example, this is the page I am matching against: http://pastebin.com/uun0uKL1

Update: As it turned out, my regexes weren't the slow part of my code. The bottleneck was Html.fromHtml(), which I was using to unescape HTML special characters.
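If all that is needed is to unescape a handful of entities in the extracted message text, a small single-pass replacement may be cheaper than having Html.fromHtml() parse the text into a Spanned. This is a minimal sketch, assuming the text only uses the few named entities listed plus numeric references. The class `HtmlUnescape` is hypothetical, and any entity it does not recognize is left unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlUnescape {
    // A few named entities plus decimal (&#39;) and hex (&#x27;) references.
    private static final Pattern ENTITY = Pattern.compile(
            "&(#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|amp|lt|gt|quot|apos|nbsp);");

    /** Unescapes the entities above in a single pass; anything else is left untouched. */
    public static String unescape(String s) {
        if (s.indexOf('&') < 0) {
            return s; // fast path: nothing to unescape
        }
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer(s.length());
        while (m.find()) {
            String name = m.group(1);
            String replacement;
            switch (name) {
                case "amp":  replacement = "&"; break;
                case "lt":   replacement = "<"; break;
                case "gt":   replacement = ">"; break;
                case "quot": replacement = "\""; break;
                case "apos": replacement = "'"; break;
                case "nbsp": replacement = "\u00A0"; break;
                default: {
                    boolean hex = name.charAt(1) == 'x' || name.charAt(1) == 'X';
                    int cp = hex ? Integer.parseInt(name.substring(2), 16)
                                 : Integer.parseInt(name.substring(1));
                    // Leave invalid code points exactly as they appeared
                    replacement = Character.isValidCodePoint(cp)
                            ? new String(Character.toChars(cp)) : m.group();
                }
            }
            // quoteReplacement keeps "$" and "\" in the replacement literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Because the scan is a single pass, an input like "&amp;amp;" becomes "&amp;" rather than being unescaped twice. StringBuffer is used because the StringBuilder overload of appendReplacement only exists from Java 9 on.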

joz

Don't parse HTML with regexes. Just don't.

Instead, use an HTML parser. I've found a nice Java library called jsoup that parses HTML quickly.

Here's an example of using jsoup to get what you want, in this case the message times:

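import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// get() throws IOException, so call this from a method that handles or declares it
Document doc = Jsoup.connect("http://example.com/").get();
Elements elements = doc.select(".sms-tag"); // any element with class "sms-tag"
// Iterate over the matched elements and read each one's text
for (Element element : elements) {
    String time = element.text();
}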

The other fields can be extracted the same way with different CSS selectors. Looking at the jsoup "cookbook" might help, too.

hichris123
  • Thanks for the idea, but as it turned out, this solution took longer than my actual regex operations (see the update in the question). – joz Oct 19 '14 at 22:17