0

I want to find the second <BR> tag and start the search from there. How can I do it using regular expressions?

<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>

Community
  • 1
  • 1
uzay95
  • 16,052
  • 31
  • 116
  • 182

4 Answers

1

Prepend <BR>[^<]*(?=<BR>) to your regex, or remove the lookahead part if you want to start after the second <BR>, such as: <BR>[^<]*<BR>.

Find text after the second <BR> but before the third: <BR>[^<]*<BR>([^<]*)<BR>

This finds "waldo" in <BR>404<BR>waldo<BR>.
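As a minimal sketch of how that pattern could be used from JavaScript (the question does not name a language, and the helper name `textAfterSecondBr` is made up for illustration), the capture group holds the text between the second and third <BR>:

```javascript
// Hypothetical helper: returns the text between the second and third <BR>,
// or null when the input does not have that shape.
function textAfterSecondBr(html) {
  const match = /<BR>[^<]*<BR>([^<]*)<BR>/.exec(html);
  // match[0] is the whole matched text; match[1] is the first capture group.
  return match ? match[1] : null;
}

console.log(textAfterSecondBr("<BR>404<BR>waldo<BR>"));
// waldo
console.log(
  textAfterSecondBr("<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>")
);
// Abdurrahman
```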

Note: I specifically used the above instead of the non-greedy .*? because once the above stops working for you, you should stop parsing HTML with regex, and .*? will hide when that happens. The non-greedy quantifier is also less widely supported, though you can always switch to it if you want.
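To make the difference concrete, here is a small JavaScript sketch. The input string `nested` is an assumed example, not one taken from the question: it puts a <B> element between the first two <BR> tags. The [^<]* version refuses to match, while the .*? version matches silently across the nested tag:

```javascript
// Assumed example input with a nested tag between the first two <BR> tags.
const nested = "<BR>abc<B>def</B>ghi<BR>jkl<BR>";

const strict = /<BR>[^<]*<BR>([^<]*)<BR>/; // stops at any "<"
const lazy = /<BR>.*?<BR>(.*?)<BR>/;       // happily skips over other tags

const strictMatch = strict.exec(nested);
const lazyMatch = lazy.exec(nested);

console.log(strictMatch);  // null: the nested <B> makes the strict pattern fail
console.log(lazyMatch[1]); // jkl
```

Whether the silent match is what you want depends on the input; the strict version at least tells you when the markup is no longer as simple as you assumed.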

Community
  • 1
  • 1
  • Note that `[^<]*` is not the same as `.*?`.
    – Gumbo Jan 08 '10 at 08:17
  • Very good answer, thank you, but I want to ask one more question. `>[^<]*` generates the result '>like', but I want to remove the '>' from the result so that I just get 'like'. How can I do this? – uzay95 Jan 08 '10 at 08:18
  • @Gumbo, but they give the same result. – uzay95 Jan 08 '10 at 08:19
  • uzay95: I don't understand what you mean. –  Jan 08 '10 at 08:19
  • uzay95: No, they are different, and I believe you should use what I answered, for the stated reason. –  Jan 08 '10 at 08:20
  • @Roger Pate, I've edited my first comment to express myself better: I want to get the word "like". And could you please tell me why they are different? – uzay95 Jan 08 '10 at 08:31
  • uzay95: I still don't understand what you mean. Could you give example input, actual behavior, and desired behavior? --- They are different when you try to parse HTML, such as this input: `
    abc

    def
    ghi

    jkl
    `.
    –  Jan 08 '10 at 08:35
  • Look, this is my target string: `<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>`. When I write `>[^<]*` the result is '>like'. As you can see, it includes an undesired character, ">". I don't want that. All I am asking is: where am I making a mistake? How can I get just the word "like" and nothing else?
    – uzay95 Jan 08 '10 at 08:42
  • To get "like " and nothing else from `<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>`, use: `<BR>([^<]*)<BR>`.
    –  Jan 08 '10 at 08:46
  • To get "Abdurrahman" from `<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>`, use: `<BR>[^<]*<BR>([^<]*)<BR>`.
    –  Jan 08 '10 at 08:46
  • Roger, thank you for your patient comments. I've just tried the code you suggested, and it seems to return/highlight/include the `<BR>` and `<BR>` codes as well. So, I was trying to get rid of the ">" character, but now I have even more to get rid of. Unfortunately it didn't do what I wanted. I apologize for repeating myself, but isn't there a way to highlight just the word "like"?
    – uzay95 Jan 08 '10 at 09:00
  • There are multiple levels of matching: your program is showing the complete matched text, while you're interested in the first group here (the part between the parentheses). Get your program to show you the difference between those. –  Jan 08 '10 at 09:32
0

The usual solution to this sort of problem is to use a "capturing group". Most regular expression systems allow you to extract not only the entire matching sequence, but also sub-matches within it. This is done by grouping a part of the expression within ( and ). For instance, if I use the following expression (this is in JavaScript; I'm not sure what language you want to be working in, but the basic idea works in most languages):

var string = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";
var match = string.match(/<BR>.*?<BR>([a-zA-Z]*)/);

Then I can get either everything that matched using match[0], which is "<BR>like <BR>Abdurrahman", or I can get only the part inside the parentheses using match[1], which gives me "Abdurrahman".
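If the goal is literally to continue searching from just after the second <BR>, one possible sketch (an assumption about what the question is asking, using only standard `RegExp.prototype.exec` and `String.prototype.slice`) is to match the first two tags, then slice the string at the end of that match:

```javascript
const input = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";

// Match everything up to and including the second <BR>.
const prefix = /<BR>[^<]*<BR>/.exec(input);

// Everything after the second <BR>, or null if there are not two tags.
const rest = prefix ? input.slice(prefix.index + prefix[0].length) : null;
console.log(rest);
// Abdurrahman<BR><SMALL>Fathers Name</SMALL>

// Any further search can now run against `rest` only.
const firstWord = rest ? /[a-zA-Z]+/.exec(rest) : null;
console.log(firstWord ? firstWord[0] : null);
// Abdurrahman
```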

Brian Campbell
  • 322,767
  • 57
  • 360
  • 340
  • I'm not sure exactly what you are looking for. You might want to clarify your question. This shows you how to find two `<BR>` tags, followed by whatever else you put in the parentheses. For instance, if you are looking for "Father", the search would be `<BR>.*?<BR>.*(Father)`, and the first substring match would refer to where it found `Father`. http://rubular.com/regexes/12836
    – Brian Campbell Jan 08 '10 at 08:06
0

Assuming you are using PHP, you can split your string on <BR> using explode:

$str='<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>';
$s = explode("<BR>",$str,3);
$string = end($s);
print $string;

output

$  php test.php
Abdurrahman<BR><SMALL>Fathers Name</SMALL>

You can then use the $string variable and do whatever you want with it.

The steps above can be done in other languages as well, using whatever string-splitting methods your programming language provides.
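For example, a rough JavaScript equivalent might look like the sketch below. The function name `afterNthDelimiter` is invented for illustration. Note that the limit argument of JavaScript's `split` discards the remainder rather than keeping it in the last element as PHP's `explode` does, so this version splits fully and joins the tail back together:

```javascript
// Hypothetical helper: returns everything after the n-th occurrence of
// `delimiter`, or null if the delimiter occurs fewer than n times.
function afterNthDelimiter(text, delimiter, n) {
  const parts = text.split(delimiter);
  return parts.length > n ? parts.slice(n).join(delimiter) : null;
}

const str = "<BR>like <BR>Abdurrahman<BR><SMALL>Fathers Name</SMALL>";
console.log(afterNthDelimiter(str, "<BR>", 2));
// Abdurrahman<BR><SMALL>Fathers Name</SMALL>
```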

ghostdog74
  • 327,991
  • 56
  • 259
  • 343
0

This regular expression should match the first two <br />s:

/(\s*<br\s*\/?>\s*){2}/i

so you should either replace them with nothing or use preg_match or RegExp.prototype.match to extract the arguments.

In JavaScript:

var afterReplace = str.replace( /(\s*<br\s*\/?>\s*){2}/i, '' );

In PHP

$afterReplace = preg_replace( '/(\s*<br\s*\/?>\s*){2}/i', '', $str );

I'm only sure it'll work in PHP / JavaScript, but it should work in everything...
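A quick check of the pattern's behaviour, sketched in JavaScript with made-up sample strings: it matches when the two tags are separated only by whitespace, and leaves the string untouched when other text sits between them.

```javascript
const twoBreaks = /(\s*<br\s*\/?>\s*){2}/i;

// Two tags separated only by whitespace: both are removed.
const adjacent = "<br /> <BR>rest".replace(twoBreaks, "");
console.log(adjacent);
// rest

// Text between the tags: the pattern does not match, so nothing changes.
const separated = "<BR>like <BR>Abdurrahman".replace(twoBreaks, "");
console.log(separated);
// <BR>like <BR>Abdurrahman
```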

Dan Beam
  • 3,632
  • 2
  • 23
  • 27
  • Would you please tell me the meaning of this regex: `/(\s*<br\s*\/?>\s*){2}/i`? I just want to learn.
    – uzay95 Jan 08 '10 at 08:21
  • Dan: That won't match given input text of `<BR>anything here<BR>`, because you don't allow for anything but `\s` between the tags.
    –  Jan 08 '10 at 08:27
  • To explain `/(\s*<br\s*\/?>\s*){2}/i`: `/` start regex; `(` start group; `\s` whitespace; `*` any number of the previous (including zero); `<br\s*\/?>` literal; `\s` whitespace; `*` zero or more of the previous; `)` end group; `{2}` two of the group; `/` end regex; `i` match case-insensitively.
    – ternaryOperator Jan 08 '10 at 14:55