I am working on an automation for building landing pages.

A user copies and pastes text from a Word doc into a TinyMCE textarea, which creates the <a href> tags in the output.

So if I copy/paste something like this:

This is my Website.

from a Word doc, the output after submitting the form will look like this:

This is my <a href="http://www.google.com">Website</a>.

I want to append to every link within an <a href> tag (only within an <a href> tag!) something like this:

?utm=foo_foo_foo

so it will look like this:

This is my <a href="http://www.google.com?utm=foo_foo_foo">Website</a>.

P.S: URLs can end with "/" or without it; this shouldn't matter, and it should work both ways.

P.S2: TinyMCE adds the tags by itself (in case you missed me mentioning it). I just need to append to a string that looks like this:

$string = 'This is my <a href="http://www.google.com">Website</a>.';
Imnotapotato
  • Any code? Have you tried anything? Also, please let us know whether you are generating these `href` values yourself when building the page, or you already have a page and want to change all its `anchor` tags. – Mubin Nov 01 '15 at 12:25
  • No code. I went through regex and preg_replace tutorials, but everything is basic and doesn't fit my needs. And I'm not sure I understand your second question. – Imnotapotato Nov 01 '15 at 12:29

1 Answer

You should use a parser, not a regex for this.

$html = 'This is my <a href="http://www.google.com">Website</a>.';
$dom = new DOMDocument();
// Parse the fragment; loadHTML() wraps it in a full HTML document.
$dom->loadHTML($html);
// Append the tracking parameter to the href of every <a> element.
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    $link->setAttribute('href', $link->getAttribute('href') . '?utm=foo_foo_foo');
}
echo $dom->saveHTML();

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>This is my <a href="http://www.google.com?utm=foo_foo_foo">Website</a>.</p></body></html>

If you had to use a regex, you could do:

$html = 'This is my <a href="http://www.google.com">Website</a>.';
echo preg_replace('~href=("|\')(.+?)\1~', 'href=$1$2?utm=foo_foo_foo$1', $html);

Output:

This is my <a href="http://www.google.com?utm=foo_foo_foo">Website</a>.

Both of these approaches presume the URL never already contains a ?. If it does, the parameter would have to be appended with & instead.
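For URLs that do already contain a ?, here is a rough sketch of how the parser approach could be adapted. The function names `appendUtm` and `tagLinks` are hypothetical (they are not part of PHP or of the code above), and the sketch assumes the parameter should go before any #fragment.

```php
<?php
// Hypothetical helper: append a query parameter to a URL, using "&" when the
// URL already has a query string and "?" otherwise. A trailing "/" is left alone.
function appendUtm(string $url, string $utm = 'utm=foo_foo_foo'): string
{
    // Split off a fragment (#...) so the parameter is inserted before it.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    if (strpos($url, '?') === false) {
        $separator = '?';            // no query string yet
    } elseif (substr($url, -1) === '?' || substr($url, -1) === '&') {
        $separator = '';             // URL already ends with a separator
    } else {
        $separator = '&';            // extend the existing query string
    }

    return $url . $separator . $utm . $fragment;
}

// Hypothetical wrapper: apply appendUtm() to every <a href> in an HTML string.
function tagLinks(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    foreach ($dom->getElementsByTagName('a') as $link) {
        // Skip anchors without an href (e.g. named anchors).
        if ($link->hasAttribute('href')) {
            $link->setAttribute('href', appendUtm($link->getAttribute('href')));
        }
    }
    return $dom->saveHTML();
}
```

Calling `tagLinks($html)` with the `$html` string from above should give the same link as before, while a link such as `http://example.com/?q=1` would get `&utm=foo_foo_foo` appended instead. Note that `saveHTML()` escapes `&` inside attributes as `&amp;`, which is normal for HTML output.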

chris85
  • Oooh, OK. I'm new to PHP. Can you give me a short explanation before I try it and start exploring "parsers"? Just guessing: parsers parse texts/variables... so what makes them different from regex? From what I know, a regex searches for a pattern in a text and can change/add stuff to it. – Imnotapotato Nov 01 '15 at 12:38
  • I am free to use whatever I want in this code, so tell me what you think is better and why. Anyway, I'm going to continue researching both now. Learning something new every day B|. – Imnotapotato Nov 01 '15 at 12:40
  • There's a longer write-up on parsers here: http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php. Parsers are cleaner and have predefined functions. Also, if a parser fails it won't destroy your content, whereas a regex can go very wrong. – chris85 Nov 01 '15 at 12:42
  • Yes, the parser code looks more organized, to be honest. Thanks, I'll try this in a second and update you if it's working right. – Imnotapotato Nov 01 '15 at 12:43
  • Also removing HTML declaration with parser: http://stackoverflow.com/questions/4879946/how-to-savehtml-of-domdocument-without-html-wrapper. – chris85 Nov 01 '15 at 12:53
  • Hi! I just noticed that when posting the text to the new PHP file, I get the text surrounded with: ` abc `. Any idea how to remove this? – Imnotapotato Nov 04 '15 at 12:43
  • See my last comment above. – chris85 Nov 04 '15 at 12:46
  • Tnx, I'll take a look. – Imnotapotato Nov 04 '15 at 12:49
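
For the wrapper question raised in the comments, one possible approach (a sketch, not necessarily what the linked thread recommends) is to serialize only the children of the `<body>` element instead of the whole document. `bodyInnerHtml` is a hypothetical helper name, and the sample input assumes TinyMCE delivers the text wrapped in a `<p>` tag.

```php
<?php
// Hypothetical helper: return the markup inside <body>, without the
// doctype, <html> and <body> wrappers that loadHTML() adds.
function bodyInnerHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes only that node and its descendants.
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

// Assumed sample input, modelled on the string from the question.
$dom = new DOMDocument();
$dom->loadHTML('<p>This is my <a href="http://www.google.com">Website</a>.</p>');
foreach ($dom->getElementsByTagName('a') as $link) {
    $link->setAttribute('href', $link->getAttribute('href') . '?utm=foo_foo_foo');
}

$fragment = bodyInnerHtml($dom);
echo $fragment;
```

This avoids string-stripping the doctype after the fact and keeps the rest of the parser code unchanged.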