
I am trying to track PDFs on my site via Google Analytics. Using find and replace in Dreamweaver, I need a regex that finds all PDF URLs and appends the file name of the PDF to the end. So:

http://mysite/strategy/annual-plan-16-17.pdf

becomes

http://mysite/strategy/annual-plan-16-17.pdf?pdf=annual-plan-16-17

Unfortunately, though I am learning regex, I have not reached this level of sophistication yet, so I would be grateful for any suggestions. Thanks ever so much.

2 Answers


I think you can use

https?://\S*/([^/]+)\.pdf

and replace with $0?pdf=$1.

See the regex demo.

Details:

  • https?:// - http:// or https://
  • \S* - zero or more non-whitespace characters, as many as possible, up to the last slash that is followed by the rest of the pattern
  • / - a literal slash
  • ([^/]+) - (Group 1) one or more chars other than /
  • \.pdf - a literal .pdf.

If you only need to grab links that have no ? after .pdf, append the negative lookahead (?!\?) to the end of the pattern.

In the replacement pattern, the $0 inserts the whole match text and $1 inserts only the contents captured into Group 1.
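For anyone who wants to try the same pattern outside Dreamweaver, here is a minimal JavaScript sketch of it, including the optional (?!\?) lookahead. The helper name tagPdfLinks and the sample text are assumptions made for illustration only. One difference to be aware of: in JavaScript's String.prototype.replace the whole match is written $&, not $0.

```javascript
// Sketch only: tagPdfLinks is a hypothetical helper name, not part of the answer.
// Same pattern as above, with the optional (?!\?) lookahead appended so that
// links already followed by a query string are left alone.
const PDF_URL = /https?:\/\/\S*\/([^\/]+)\.pdf(?!\?)/g;

function tagPdfLinks(text) {
  // "$&" is the whole match in JavaScript; "$1" is the file name without ".pdf".
  return text.replace(PDF_URL, "$&?pdf=$1");
}

// Made-up sample text: one untagged link and one that is already tagged.
const sample = "http://mysite/x.pdf and http://mysite/y.pdf?pdf=y";
console.log(tagPdfLinks(sample));
```

Running the replacement twice over the same text should leave it unchanged the second time, because the lookahead rejects every link that already has a ? after .pdf.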

Wiktor Stribiżew

This works well on file names made up of letters and numbers separated by '-', including names that contain no '-' at all. It returns the URL with ?pdf= and the file name (both the letter and number parts) appended to the end.

var value = "http://mysite/strategy/annual-plan-16-17.pdf";
// Group 1 captures the file name, group 2 the ".pdf" extension (dot escaped so it is literal).
var matches = value.replace(/([^\/\s]+)(\.pdf)/g, "$1$2?pdf=$1");
console.log(matches); // http://mysite/strategy/annual-plan-16-17.pdf?pdf=annual-plan-16-17

This works by splitting the match into two groups with ():

  1. The first group captures the file name using [^\/\s]+, which matches one or more characters that are not a '/' or white-space. Combined with the group that follows, this picks up everything between the last / and the '.pdf'.

  2. The second group matches the extension using \.pdf; the dot is escaped so that it matches a literal '.' rather than any character.

The match is then replaced with the whole match + ?pdf= + the first group.
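To see how the two-group version behaves on different inputs, here is a small sketch that wraps it in a function and runs it over a few URLs. The function name appendPdfParam and every sample URL except the first are made up for illustration, and the dot before pdf is escaped so that it only matches a literal '.'.

```javascript
// Sketch only: appendPdfParam is a hypothetical helper name.
function appendPdfParam(url) {
  // Group 1: file name (no '/' or white-space); group 2: the literal ".pdf".
  return url.replace(/([^\/\s]+)(\.pdf)/g, "$1$2?pdf=$1");
}

// Sample URLs; only the first one comes from the question.
const samples = [
  "http://mysite/strategy/annual-plan-16-17.pdf", // dashes and numbers
  "http://mysite/report.pdf",                     // no dashes at all
  "http://mysite/about.html"                      // not a PDF, left untouched
];

for (const url of samples) {
  console.log(appendPdfParam(url));
}
```

Note that this pattern has no check for an existing query string, so running it over a URL that already ends in ?pdf=... would append the parameter a second time.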

If you want just the letter part, so that xxxxx-xxxxx-1111.pdf becomes xxxxx-xxxxx-1111.pdf?pdf=xxxxx-xxxxx, you can use this:

var value = "http://mysite/strategy/annual-plan-16-17.pdf";
// Group 1: letter part, group 2: the rest of the file name, group 3: the ".pdf" extension.
var matches = value.replace(/([^\/0-9]*[^-\/0-9])(-??[^\/.]*)(\.pdf)/g, "$1$2$3?pdf=$1");
console.log(matches); // http://mysite/strategy/annual-plan-16-17.pdf?pdf=annual-plan

This works by splitting the match into three groups with ():

  1. The first group captures the letter part. [^\/0-9]* matches any number of characters that are not a '/' or a digit, and [^-\/0-9] then makes sure the group does not end with a '-'. This effectively matches words separated by - that contain no numbers.

  2. The second group starts with -??, which matches as few - as possible (including none), followed by [^\/.]*, which matches any number of characters that are not a '/' or a '.'. This effectively picks up the remaining dash-separated parts that contain numbers.

  3. The third group matches .pdf using \.pdf. If you wanted to ensure it was at the end of the string you could use \.pdf$

The match is then replaced with itself + ?pdf= + the first matching group.
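To make the three groups easier to inspect, the sketch below runs the pattern once with exec and returns what each group captured. The helper name splitPdfName and the property names on the returned object are assumptions made for this illustration, and the dot before pdf is escaped so that it matches only a literal '.'.

```javascript
// Sketch only: splitPdfName and its property names are hypothetical.
function splitPdfName(url) {
  // No "g" flag: exec returns the first match along with its groups.
  const m = /([^\/0-9]*[^-\/0-9])(-??[^\/.]*)(\.pdf)/.exec(url);
  if (m === null) {
    return null; // the URL does not contain a ".pdf" file name
  }
  return { letters: m[1], rest: m[2], extension: m[3] };
}

const parts = splitPdfName("http://mysite/strategy/annual-plan-16-17.pdf");
console.log(parts.letters);   // annual-plan
console.log(parts.rest);      // -16-17
console.log(parts.extension); // .pdf
```

Seeing the captures side by side shows why only the first group is reused after ?pdf=: the numeric suffix ends up in the second group.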

milo.farrell
  • I am so grateful for how clearly you have set out your reply. It is absolutely brilliant, as it becomes a learning resource I can use in the future, in the hope that I will truly get to grips with regex like you have. I am really very grateful. – user3517217 Oct 20 '16 at 12:50
  • I'm glad I was helpful. If you don't already, you should look at regex testers; I like this one: https://regex101.com/. They can make regular expressions much easier to work out and experiment with. – milo.farrell Oct 20 '16 at 15:53