-1

I have a string that contains links matching this pattern:

<a href="http://randomurl.com/random_string;url=http://anotherrandomurl.com/">xxxx</a>

I want to remove the "http://xxx.xxx.xxx/random_string;url=" prefix from each href and keep the rest of the string, so that the result is:

<a href="http://anotherrandomurl.com/">xxxx</a>

Can anyone help, please?

user628736
  • Can you post what you have tried? – afuzzyllama Aug 25 '11 at 14:44
  • You have not posted any code or previous attempt, and we can't possibly entertain one question per need of a regular expression. If you can edit your question to show the regex you tried, it will gain answers that actually _help people understand regular expressions_. Please flag your question for moderator attention if you do. – Tim Post Aug 25 '11 at 15:49

4 Answers

1

Use:

$new_link = preg_replace('/<a href="[^"]*;url=([^"]+)">/', '<a href="$1">', $url);
piotrp
1

There are multiple ways to achieve your desired result. An alternative to regex would be to find the occurrence of url= using strpos and remove those characters along with everything that precedes them.
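As a rough illustration of the strpos idea, here is a minimal sketch. It assumes you have already isolated a single href value (not the whole HTML string) and that the marker is the literal ";url=" from the question; the function name strip_redirect_prefix is hypothetical.

```php
<?php
// Hypothetical helper: drops everything up to and including ";url="
// from one href value. Assumes $href is a bare URL, not a full <a> tag.
function strip_redirect_prefix(string $href): string
{
    $marker = ';url=';
    $pos = strpos($href, $marker);

    // strpos returns false when the marker is absent; compare strictly,
    // because a match at offset 0 is also "falsy".
    if ($pos === false) {
        return $href;
    }

    return substr($href, $pos + strlen($marker));
}

$href = 'http://randomurl.com/random_string;url=http://anotherrandomurl.com/';
echo strip_redirect_prefix($href), "\n"; // prints http://anotherrandomurl.com/
```

Note that strpos finds the first occurrence of the marker; if an href could contain ";url=" more than once and you want the last part, strrpos would be the function to swap in.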

Robert
1

This is trickier than you think, and I urge you to avoid using regex for it.

Instead, you should use an HTML parser to find all <a> tags in the document, then split their href attributes on ;url= and keep only the last part.
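A minimal sketch of that parser-based approach, using PHP's built-in DOMDocument. The function name rewrite_redirect_links and the wrapping <div> are assumptions made for illustration, and the sketch assumes the input is an ASCII HTML fragment (loadHTML needs extra care for other encodings).

```php
<?php
// Hypothetical helper: rewrites every <a href="...;url=TARGET"> in an
// HTML fragment so that the href is just TARGET.
function rewrite_redirect_links(string $html): string
{
    $doc = new DOMDocument();

    // Silence warnings about imperfect markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it has a single root element.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        // Split on ";url=" and keep only the last part, as described above.
        $parts = explode(';url=', $a->getAttribute('href'));
        $a->setAttribute('href', end($parts));
    }

    // Serialise the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$url = '<a href="http://randomurl.com/random_string;url=http://anotherrandomurl.com/">xxxx</a>';
echo rewrite_redirect_links($url), "\n";
```

Links without ";url=" are left as they are, because explode then returns the whole href as its only element.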

However, if you must use a regex, the following should work for most well-formed HTML:

preg_replace('/(<\s*a\s[^>]*\bhref=)(["\'])(?:(?!\2).)*;url=((?:(?!\2).)*)(\2[^>]*>)/i', '$1$2$3$4', $url)

Explanation:

(<\s*a\s[^>]*\bhref=) # <a, optionally followed by other attributes, and then href. Whitespace after < is tolerated. This will be captured in backreference $1.
(["\'])               # Either " or ' to open the href value. This will be captured in $2 so the same quote can be matched later.
(?:(?!\2).)*;url=     # Any run of characters other than that quote, ending in ";url=". This will be thrown out.
((?:(?!\2).)*)        # This is the URL you want to keep. It will keep matching until the closing quote. This will be captured into $3.
(\2[^>]*>)            # The closing quote and the remainder of the <a> tag, including any other attributes. This is captured in $4.
Justin Morgan - On strike
0
$new_link = preg_replace('~(\shref=")[^"]+?(?<=;url=)~', '$1', $url);
Geert