
I'm new to regular expressions in PHP.

I have a long string of HTML, and I want to find all occurrences of:

@any_username_after_an_at_sign

Could somebody help me extract all of the usernames on a page? I think you use preg_match, but I don't know which regular expression to use.

Thanks!!

chris
  • I think your description is so vague that the answers might not be useful to you. What is the context around the string you want to match? Could there be email addresses and other false matches in the same document? Could you post an extract of the HTML showing what you want to match? – Mark Byers Feb 20 '10 at 20:09

4 Answers


You could try:

/@\w+/

But this might pick up some false matches, such as parts of email addresses. Can you tell us something about the context?

It may also be worth using an HTML parser, although without more information it is hard to be sure.
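To illustrate the e-mail false-match problem, here is a minimal sketch that adds a negative lookbehind so that an @ directly preceded by a letter, digit, underscore or dot is skipped. The lookbehind is an addition, not part of the answer above, and the sample text is made up; it is only a heuristic and will not rule out every false positive.

```php
<?php
// Made-up sample text containing two mentions and one e-mail address.
$text = 'Ping @alice and @bob_2, or mail carol@example.com';

// (?<![\w.]) is a negative lookbehind: the "@" must not follow a word
// character or a dot, so "carol@example" is skipped. Group 1 is the name.
preg_match_all('/(?<![\w.])@(\w+)/', $text, $matches);

echo implode(', ', $matches[1]), "\n"; // alice, bob_2
```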

Mark Byers
  • The context is actually a Twitter-like microblogging profile page with status updates. So it's like searching through www.twitter.com/ev/ – chris Feb 20 '10 at 20:14
  • @chris: Then it definitely sounds like you ought to be using a parser for this and not regex. Chances are that there is some markup telling you what the username is and what the message is. If you can parse that markup then you can get the username more reliably than with a regex. – Mark Byers Feb 20 '10 at 20:24
  • @chris, you should add that to the question, it's a very important piece of information – John La Rooy Feb 20 '10 at 21:07
  • @chris: From this thread: http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php you can use DOMDocument::loadHTML http://docs.php.net/manual/en/domdocument.loadhtml.php – Mark Byers Feb 20 '10 at 21:14
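A rough sketch of the DOMDocument::loadHTML route, assuming PHP's dom extension is enabled and that each status update sits in a `<p class="status">` element. That class name and the sample markup are made up for illustration; inspect the real page to find the right element. Each status is matched separately, because concatenating the whole document's text can glue a word onto a following @ and hide it.

```php
<?php
// Made-up stand-in for the fetched page HTML.
$html = '<html><body><p class="status">Hi @alice, see '
      . '<a href="mailto:x@example.com">mail</a></p>'
      . '<p class="status">@bob thanks</p></body></html>';

$doc = new DOMDocument();
// loadHTML() warns about sloppy real-world HTML; collect the warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$names = [];
// Hypothetical selector: adjust it to the page's actual markup.
foreach ($xpath->query('//p[@class="status"]') as $node) {
    // textContent contains only text, so the mailto: href is never scanned.
    preg_match_all('/(?<![\w.])@(\w+)/', $node->textContent, $m);
    $names = array_merge($names, $m[1]);
}

echo implode(', ', $names), "\n"; // alice, bob
```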

Simple:

preg_match_all('~@(\w+)\b~', '@me @you', $usernames);
print_r($usernames);

Result:

Array (
  [0] => Array(
    [0] => @me
    [1] => @you
  )
  [1] => Array (
    [0] => me
    [1] => you
  )
)

Once you get this, simply match these against your users' DB table to weed out false positives. You might also want to strip_tags() before you do this to avoid getting text from inside attributes.
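A minimal sketch of the strip_tags() plus lookup step described above. The hard-coded `$knownUsers` array is a hypothetical stand-in for the users table; in practice that list would come from a database query.

```php
<?php
$html = '<p>Thanks @me and @you! <a title="@ghost" href="/x">link</a></p>';

// strip_tags() removes each tag together with its attributes, so the
// "@ghost" inside title="..." is never matched.
$text = strip_tags($html);
preg_match_all('~@(\w+)\b~', $text, $usernames);

// Hypothetical stand-in for the users table.
$knownUsers = ['me', 'alice'];

// Keep each name once, and only if it belongs to a real user.
$valid = array_values(array_intersect(array_unique($usernames[1]), $knownUsers));

echo implode(', ', $valid), "\n"; // me
```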

Max Shawabkeh

Try this:

@\S+

and use it with preg_match_all.
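One thing to be aware of: \S matches any non-whitespace character, so it also picks up trailing punctuation and markup. This small comparison on a made-up string shows the difference from \w+:

```php
<?php
$text = 'Thanks @me, and <b>@you</b>!';

preg_match_all('/@\S+/', $text, $loose);   // any run of non-space characters
preg_match_all('/@\w+/', $text, $strict);  // letters, digits and underscore only

echo implode(' ', $loose[0]), "\n";  // @me, @you</b>!
echo implode(' ', $strict[0]), "\n"; // @me @you
```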

AntonioCS

Given the context of the Twitter page, something like this may work:

'~@<a class="tweet-url username"[^>]*>([^<]*)</a>~'

But a proper parser will always work better than a regex for this type of problem.
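A sketch of that pattern used with preg_match_all. The pattern is wrapped in ~ delimiters, because PHP's preg functions treat the first character as the delimiter, so a leading @ would otherwise be taken as one. The HTML snippet is a made-up imitation of the markup the answer assumes; the real page may differ.

```php
<?php
// Made-up snippet imitating the assumed profile-page markup.
$html = 'Hello @<a class="tweet-url username" href="/ev">ev</a> and '
      . '@<a class="tweet-url username" href="/biz">biz</a>';

// With "~" as the delimiter, the "@" and the "/" in "</a>" are literal.
$pattern = '~@<a class="tweet-url username"[^>]*>([^<]*)</a>~';
preg_match_all($pattern, $html, $m);

echo implode(', ', $m[1]), "\n"; // ev, biz
```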

John La Rooy