First, as comments point out, you can't parse HTML with a regex (thanks to Jeff Burka for linking to the canonical answer).
Second, since you are looking at a very limited and specific situation, you can use a regex with a capturing group to get the version.
Assuming that the div in question is not broken across lines, my strategy would be much like your posted attempt: look for the string softwareVersion and the tag-closing > character, then optional whitespace, the version string, optional whitespace, and the closing tag.
That gives a regex like softwareVersion[^>]*>\s*([0-9.]+)\s*</
On Debuggex (which needs a leading .* to match everything before softwareVersion):
.*softwareVersion[^>]*>\s*([0-9.]+)\s*</

This will give you the version in a capturing group, which is available as matcher.group(1) after a successful matcher.find().
As a Java string, that's softwareVersion[^>]*>\\s*([0-9.]+)\\s*</
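A minimal sketch of how that pattern could be used from Java. The class name VersionExtractor, the helper extractVersion, and the sample HTML line are assumptions made for illustration; only the regex itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionExtractor {
    // Regex from above, with backslashes doubled for a Java string literal.
    private static final Pattern VERSION =
            Pattern.compile("softwareVersion[^>]*>\\s*([0-9.]+)\\s*</");

    /** Returns the captured version string, or null if the pattern is not found. */
    public static String extractVersion(String html) {
        Matcher matcher = VERSION.matcher(html);
        // find() searches anywhere in the input, so no leading .* is needed here.
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input; the real markup may differ.
        String html = "<div itemprop=\"softwareVersion\"> 2.3 </div>";
        System.out.println(extractVersion(html));
    }
}
```

Using find() rather than matches() is what lets the pattern work without the .* prefix that Debuggex needed, since matches() requires the whole input to match.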
I omitted the div after </ because, while it's in a div now, maybe it'll be a span or something else in the future.
I went simple with [0-9.] so it can match 2.3 but also 3.0.1. However, it would also match ..382.1...33, so if that matters you could write a pattern that accepts only a limited or arbitrary number of n(.n)* dotted numbers.
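For example, as a Java string:
softwareVersion[^>]*>\\s*([1-9][0-9]*(\\.[0-9]+){0,3})\\s*</
matches a version number n followed by zero to three .n point releases, so 3.0.2.1 but not 1.2.3.4.5. Because of the leading [1-9], it also rejects a major version of 0, such as 0.9.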
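As a rough check of the stricter pattern, the sketch below runs it against a few sample version strings. The class name StrictVersionExtractor, the helper extractVersion, and the surrounding div markup are hypothetical; only the regex is taken from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictVersionExtractor {
    // One leading number (no leading zero) plus zero to three ".n" parts.
    private static final Pattern STRICT_VERSION = Pattern.compile(
            "softwareVersion[^>]*>\\s*([1-9][0-9]*(\\.[0-9]+){0,3})\\s*</");

    /** Returns the captured version string, or null if the pattern is not found. */
    public static String extractVersion(String html) {
        Matcher matcher = STRICT_VERSION.matcher(html);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical inputs wrapped in assumed markup.
        String[] versions = {"3.0.2.1", "1.2.3.4.5", "..382.1...33"};
        for (String v : versions) {
            String html = "<div itemprop=\"softwareVersion\">" + v + "</div>";
            System.out.println(v + " -> " + extractVersion(html));
        }
    }
}
```

With these inputs, only 3.0.2.1 should be extracted; the other two yield null, because the text between the version and the closing </ has to be whitespace only.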