regular expression for matching correct string

Question

I have a string:

Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>

It is all on a single line. How would I extract only the information about the balls, i.e. the output should be . . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . .

The closest I got was with [^(Recent overs|<b>|<tt>|</b>|</tt>|</p>)]+, but it matches the 1 and not 1b.

Balls? What balls? What does that have to do with your question? — Justin Morgan - On strike, Aug 02 '11 at 19:26
What Regex engine or language do you use? Also, within character class the alternation have no meaning... — nEAnnam, Aug 02 '11 at 19:28
Sample in ruby: `x = 'Recent overs . . . . . . | 3 . . 1b 4 .| 1 1 1 . . 4 | . . . 4 . .'; result = x.gsub(/<[^>]+>/, '').gsub('|', '').match(/\..*\./)[0]` — taro, Aug 02 '11 at 19:33

Vlad · Answer 1 · 2011-08-02T19:36:05.543

0

Try \s[\d\.][\w]* to match each digit or dot (possibly followed by word characters) that is preceded by a whitespace character.
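A quick way to see what this pattern picks up is to run it against the string from the question. The sketch below uses JavaScript, which is an assumption since the asker did not name a language; the variable names are illustrative only.

```javascript
// The string from the question.
const line = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Whitespace, then a digit or a dot, then any trailing word characters.
const spacedBall = /\s[\d\.][\w]*/g;

// Each match still carries its leading whitespace, so trim it off.
const spacedMatches = line.match(spacedBall).map((m) => m.trim());

console.log(spacedMatches.length);
console.log(spacedMatches.join(" "));
```

Every ball comes back as its own match, and 1b is kept whole. The very first . directly follows <tt> with no whitespace in front of it, so this pattern does not pick it up.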

edited Aug 02 '11 at 19:36

answered Aug 02 '11 at 19:30


The first group of `.`'s doesn't have a space before it. – Justin Morgan - On strike Aug 02 '11 at 19:35
Also, this will match `overs` in `Recent overs`. – Justin Morgan - On strike Aug 02 '11 at 19:36
But in either case, each data point will be in a separate match; i.e. each `.` will be a single match, each `1` will be a single match...each match will consist of one character, except for `1b`. He seems to want them grouped together according to which tag pair they're in. – Justin Morgan - On strike Aug 02 '11 at 19:42
@Justin: According to the regex he provided he can live with separate matches. – Vlad Aug 02 '11 at 19:45

score 0 · Answer 2 · edited May 23 '17 at 12:19

Based solely on the example you gave, you could try something like:

/(?<=>)[a-z\d\s\.]+/g

Alternatively, in case your regex engine doesn't support lookbehinds:

/>([a-z\d\s\.]+)/g     #Matches will be in the first capture group.

However, it's a little hard to infer the rules of what should/should not be allowed based on the small sample you gave, and your output sample doesn't make much sense to me as a data structure. It seems like you might be better off using an HTML parser for this, since using regex to process HTML is frequently a bad idea.
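The capture-group variant above can be tried out as follows. This is a minimal sketch in JavaScript (an assumption, since no language was given); trimming each group and dropping the empty ones are additions of this sketch, not part of the pattern itself.

```javascript
// The string from the question.
const html = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// A ">" followed by a run of lowercase letters, digits, whitespace, or dots.
const afterTag = />([a-z\d\s\.]+)/g;

const overs = [];
for (const m of html.matchAll(afterTag)) {
  const text = m[1].trim();   // m[1] is the first capture group
  if (text !== "") {          // skip runs that were whitespace only
    overs.push(text);
  }
}

console.log(overs);
console.log(overs.join(" "));
```

Here each entry of overs holds the balls found between one pair of tags, so the grouping by over is preserved; joining the entries with a space gives the flat list asked for in the question.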

score 0 · Accepted Answer · answered Aug 02 '11 at 19:33

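First, the brackets [] create what is called a "character class", which matches a single character. Inside a class, | and the parentheses are ordinary characters, not alternation or grouping. Your pattern effectively says "match any character except these": (Recntovrsbp|<>/ plus the space and ). Because b is in that set, 1b is matched only as 1.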
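To make that concrete, here is a small sketch that runs the pattern from the question in JavaScript. The slashes are escaped only because the pattern is written as a regex literal; the variable names are illustrative.

```javascript
// The string from the question.
const source = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// The question's pattern: one negated character class, repeated.
const negatedClass = /[^(Recent overs|<b>|<tt>|<\/b>|<\/tt>|<\/p>)]+/g;

const found = source.match(negatedClass);

console.log(found.length);
console.log(found.includes("1b")); // false: "b" is one of the excluded characters
console.log(found.join(" "));
```

Since the space is excluded as well, every match ends up being a single character, and the b after the 1 is simply skipped.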

You'd be better off using a regex to remove the unwanted strings, then it's easier to parse the result, like this:

JavaScript, because you didn't specify the language:

var s = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";
s = s.replace(/(Recent overs|<[^>]+>|\|)/ig, '');


The resulting 's' is much easier to parse.
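As one possible next step, the cleaned string can be split on whitespace to get one entry per ball. This is a sketch only; the names cleaned and balls are made up for illustration.

```javascript
// The string from the question.
const raw = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Same replacement as above: drop the label, every tag, and every "|".
const cleaned = raw.replace(/(Recent overs|<[^>]+>|\|)/ig, '');

// Trim the ends, then split on runs of whitespace to get one token per ball.
const balls = cleaned.trim().split(/\s+/);

console.log(balls.length);
console.log(balls.join(" "));
```

Splitting on /\s+/ rather than a single space absorbs the doubled spaces left behind where a tag was removed.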
