I have a string:

Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>

It is all on a single line. How would I extract only the information about the balls? That is, the output should be . . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . .

The closest I got was with [^(Recent overs|<b>|<tt>|</b>|</tt>|</p>)]+, but it matches the 1 and not the 1b.

Nightfirecat
ravi

3 Answers

Try \s[\d\.][\w]*: it matches a digit or a dot that is preceded by a whitespace character and optionally followed by word characters.

Vlad

Based solely on the example you gave, you could try something like:

/(?<=>)[a-z\d\s\.]+/g

Alternatively, in case your regex engine doesn't support lookbehinds:

/>([a-z\d\s\.]+)/g     #Matches will be in the first capture group.
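As a rough sketch of how the capture-group version might be used in JavaScript (the variable names html, pattern, chunks and balls are just illustrative, and this assumes the input is exactly the sample line from the question):

```javascript
// Sample line from the question.
var html = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";
var pattern = />([a-z\d\s\.]+)/g;

var chunks = [];
var m;
// With the g flag, exec() walks through the string one match at a time.
while ((m = pattern.exec(html)) !== null) {
  chunks.push(m[1]); // first capture group: the text right after a '>'
}

// Join the chunks and split on whitespace to get one entry per ball.
var balls = chunks.join(" ").trim().split(/\s+/);
console.log(balls.join(" "));
```

On a full page, any other lowercase text that directly follows a '>' would also be captured, so the input would probably need to be narrowed down to this line first.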

However, it's a little hard to infer the rules of what should/should not be allowed based on the small sample you gave, and your output sample doesn't make much sense to me as a data structure. It seems like you might be better off using an HTML parser for this, since using regex to process HTML is frequently a bad idea.

Community
Justin Morgan - On strike

First, square brackets [] create what is called a "character class", which matches a single character. Your pattern effectively says "don't match any of these characters": ( ) R e c n t o v r s b p | < > / and the space character. Since b is in that set, 1b is only matched as 1.
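A quick way to check this, sketched with the pattern from the question (the names original and classMatches are just for illustration):

```javascript
// The pattern from the question: a negated character class, not an alternation.
// The slashes are escaped only because this is a regex literal.
var original = /[^(Recent overs|<b>|<tt>|<\/b>|<\/tt>|<\/p>)]+/g;

// 'b' is one of the excluded characters, so only the '1' of "1b" is matched.
var classMatches = "1b".match(original);
console.log(classMatches); // [ '1' ]
```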

You'd be better off using a regex to remove the unwanted strings, then it's easier to parse the result, like this:

JavaScript, because you didn't specify the language:

var s = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";
s = s.replace(/(Recent overs|<[^>]+>|\|)/ig, '');

The resulting 's' is much easier to parse.
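For instance, one possible way to finish the job is to trim the cleaned string and split it on whitespace. This is only a sketch; the tokens variable is an illustrative name, and it assumes every ball is separated by at least one space, as in the sample:

```javascript
var s = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Strip the label, every tag, and the over separators.
s = s.replace(/(Recent overs|<[^>]+>|\|)/ig, '');

// Removing the tags leaves uneven spacing, so split on runs of whitespace.
var tokens = s.trim().split(/\s+/);
console.log(tokens.join(" "));
```

Splitting on \s+ rather than a single space keeps the result stable even where a removed tag leaves two spaces next to each other.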

OverZealous