5

I'm developing an extension which will perform a certain action on all Google search URLs - but not on other websites or Google pages. In natural language the match pattern is:

  • Any protocol ('*://')
  • Any subdomain or none ('www' or '')
  • The domain string must equal 'google'
  • Any TLD including three-letter TLDs (e.g. '.com') and multi-part country TLDs (e.g. '.co.uk')
  • The first 8 letters of the path must equal '/search?'

Many people say 'to match all Google search pages use "*://*.google.com/search?*"', but this is patently untrue, as it will not match national TLDs like google.co.uk.

Thus the following code does not work at all:

chrome.webRequest.onBeforeRequest.addListener(
  function(details) {
    alert('This never happens');
  }, {
    urls: [
        "*://*.google.*/search?*",
        "*://google.*/search?*",
    ],
    types: ["main_frame"]
  },
  ["blocking"]
);

Using "*://*.google.com/search?*" as the match pattern does work, but I fear I would need a list of every single Google localisation for that to be an effective strategy.

Xan
DMCoding
  • 4
    You can get that list as a plain text file from http://www.google.com/supported_domains It currently has 192 entries. – rsanchez May 20 '14 at 06:59
  • Good answer, but there seems to be an upper limit of a few dozen on the number of match patterns you can have. 192 entries is several times more match patterns than the listener will accept... :-( – DMCoding May 21 '14 at 14:53
  • 1
    Here's an interesting idea. Split your domain list and register several (identical) listeners. That would bypass the restriction. – Xan May 21 '14 at 16:48
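
A rough sketch of the splitting idea from the comment above, assuming the limit applies per listener rather than per extension (nothing in this thread confirms that). The helper names `chunk` and `registerInBatches` and the batch size of 30 are hypothetical; the thread only says "a few dozen".

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Register one listener per batch of match patterns.
// `batchSize` is a guess, not a documented limit.
function registerInBatches(patterns, handler, batchSize = 30) {
  for (const urls of chunk(patterns, batchSize)) {
    chrome.webRequest.onBeforeRequest.addListener(
      // A fresh wrapper per batch, so every registration is a distinct function.
      (details) => handler(details),
      { urls, types: ["main_frame"] },
      ["blocking"]
    );
  }
}

// Only runs inside an extension background page, where chrome.webRequest exists.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  registerInBatches(
    ["*://*.google.com/search?*", "*://*.google.co.uk/search?*"],
    (details) => { console.log("Google search request:", details.url); }
  );
}
```

In practice `patterns` would be built from the full domain list rather than the two entries shown here.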

3 Answers

5

Unfortunately, match patterns do not allow wildcards for TLDs for security reasons.

You cannot use wildcard match patterns like http://google.*/* to match TLDs (like http://google.es and http://google.fr) due to the complexity of actually restricting such a match to only the desired domains.

For the example of http://google.*/*, the Google domains would be matched, but so would http://google.someotherdomain.com. Additionally, many sites do not own all of the TLDs for their domain. For an example, assume you want to use http://example.*/* to match http://example.com and http://example.es, but http://example.net is a hostile site. If your extension has a bug, the hostile site could potentially attack your extension in order to get access to your extension's increased privileges.

You should explicitly enumerate the TLDs that you wish to run your extension on.

A slightly unrealistic option would be to list all variants with all national TLDs.

Edit: thanks to an incredibly helpful comment by rsanchez, here's an up-to-date list of all Google domain variants (http://www.google.com/supported_domains), which makes this approach viable.

A realistic option is to inject into a larger set of pages (for instance, all pages), then analyze the URL (with a regexp, for example) and only execute if it matches the pattern you are looking for. Yes, it will be a scarier permissions warning, and you will have to explain it to your users.
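
As a sketch of that URL check, here is one way it could look if you compare the host against an explicit allowlist instead of a `google.*` regexp, so that hosts such as google.someotherdomain.com are rejected. The function name `isGoogleSearchUrl` and the four-entry sample list are assumptions for illustration; a real list would come from http://www.google.com/supported_domains. Note that it tests for a path of exactly `/search`, which is slightly looser than requiring a literal `/search?` prefix.

```javascript
// Hypothetical sample allowlist; entries start with a dot,
// as in the supported_domains file.
const GOOGLE_DOMAINS = [".google.com", ".google.co.uk", ".google.de", ".google.fr"];

// Returns true only for http(s) URLs whose host is an allowlisted Google
// domain (or a subdomain of one) and whose path is exactly /search.
function isGoogleSearchUrl(rawUrl, domains = GOOGLE_DOMAINS) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return false;
  }
  if (url.pathname !== "/search") {
    return false;
  }
  // Prefix the host with a dot so "google.de" and "www.google.de" both end
  // with ".google.de", while "notgoogle.de" does not.
  const dottedHost = "." + url.hostname;
  return domains.some((domain) => dottedHost.endsWith(domain));
}

console.log(isGoogleSearchUrl("https://www.google.co.uk/search?q=test"));
console.log(isGoogleSearchUrl("https://google.someotherdomain.com/search?q=test"));
```

The same function could be called at the top of a content script with `location.href`, or inside a request listener with `details.url`.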

Community
Xan
  • Explicitly enumerating all national TLDs is not slightly unrealistic; it's impossible as there is an upper limit of a few dozen to the number of match patterns the listener will accept. Thanks both for the good answers though! – DMCoding May 21 '14 at 14:54
  • 1
    Oops, I just noticed you're trying to block requests and not to inject scripts. I didn't really answer your question properly. For your case, you'll just have to implement filtering yourself, yes, in the listener handler. Inefficient, but you can at least salvage it with a match `"*://*/search?*"`, it will be vastly better than no filter at all. Though, you'll also need host permissions; maybe there is no such limitation there. – Xan May 21 '14 at 15:05
  • In the end I set the extension permissions and background page match pattern to "http://*/search?*" and "https://*/search?*". Then I used the URI.js library (which the extension was already using for other purposes) to manually ensure that the domain was like google.*. – DMCoding May 21 '14 at 15:14
  • Does that even work in permissions? They expect hosts. Also, you can use "*://" for protocol to avoid duplication. – Xan May 21 '14 at 15:17
  • Also, you do NOT want to test for "google.*" for the _exact same_ reasons as why it's not allowed in match patterns. Check against that domain list. – Xan May 21 '14 at 15:18
  • Yes I liked your original answer and my code specifically checks against that eventuality that you gave. I should have been more specific - using URI.js we first find the national TLD. Then we find the domain. If the domain+tld == URI host part (e.g. if google+.co.uk == google.co.uk) then we assume the site is authorised. Thus google+.com != the actual google.someattacker.com. You're right that a more elegant solution would use the domain list though. – DMCoding May 21 '14 at 15:44
4

Source: https://stackoverflow.com/a/16187588/6250024

I was wondering the same thing and found a similar question with a better solution, which uses the "include_globs" key for content scripts:

"matches":        ["http://*/*", "https://*/*"],
"include_globs":  ["http://www.google.*/*", "https://www.google.*/*"],
Community
Dididi
  • here is the link to [Match patterns and globs](https://developer.chrome.com/docs/extensions/mv3/content_scripts/#matchAndGlob) – Gangula Jul 02 '23 at 17:58
1

You can use match-pattern arrays of arbitrary length (though it slows down the browser when using more than 1000 or so). For your convenience, here is an updated list:

  "matches": [
    "*://*.google.com/*",
    "*://*.google.ad/*",
    "*://*.google.ae/*",
    "*://*.google.com.af/*",
    "*://*.google.com.ag/*",
    "*://*.google.com.ai/*",
    "*://*.google.al/*",
    "*://*.google.am/*",
    "*://*.google.co.ao/*",
    "*://*.google.com.ar/*",
    "*://*.google.as/*",
    "*://*.google.at/*",
    "*://*.google.com.au/*",
    "*://*.google.az/*",
    "*://*.google.ba/*",
    "*://*.google.com.bd/*",
    "*://*.google.be/*",
    "*://*.google.bf/*",
    "*://*.google.bg/*",
    "*://*.google.com.bh/*",
    "*://*.google.bi/*",
    "*://*.google.bj/*",
    "*://*.google.com.bn/*",
    "*://*.google.com.bo/*",
    "*://*.google.com.br/*",
    "*://*.google.bs/*",
    "*://*.google.bt/*",
    "*://*.google.co.bw/*",
    "*://*.google.by/*",
    "*://*.google.com.bz/*",
    "*://*.google.ca/*",
    "*://*.google.cd/*",
    "*://*.google.cf/*",
    "*://*.google.cg/*",
    "*://*.google.ch/*",
    "*://*.google.ci/*",
    "*://*.google.co.ck/*",
    "*://*.google.cl/*",
    "*://*.google.cm/*",
    "*://*.google.cn/*",
    "*://*.google.com.co/*",
    "*://*.google.co.cr/*",
    "*://*.google.com.cu/*",
    "*://*.google.cv/*",
    "*://*.google.com.cy/*",
    "*://*.google.cz/*",
    "*://*.google.de/*",
    "*://*.google.dj/*",
    "*://*.google.dk/*",
    "*://*.google.dm/*",
    "*://*.google.com.do/*",
    "*://*.google.dz/*",
    "*://*.google.com.ec/*",
    "*://*.google.ee/*",
    "*://*.google.com.eg/*",
    "*://*.google.es/*",
    "*://*.google.com.et/*",
    "*://*.google.fi/*",
    "*://*.google.com.fj/*",
    "*://*.google.fm/*",
    "*://*.google.fr/*",
    "*://*.google.ga/*",
    "*://*.google.ge/*",
    "*://*.google.gg/*",
    "*://*.google.com.gh/*",
    "*://*.google.com.gi/*",
    "*://*.google.gl/*",
    "*://*.google.gm/*",
    "*://*.google.gp/*",
    "*://*.google.gr/*",
    "*://*.google.com.gt/*",
    "*://*.google.gy/*",
    "*://*.google.com.hk/*",
    "*://*.google.hn/*",
    "*://*.google.hr/*",
    "*://*.google.ht/*",
    "*://*.google.hu/*",
    "*://*.google.co.id/*",
    "*://*.google.ie/*",
    "*://*.google.co.il/*",
    "*://*.google.im/*",
    "*://*.google.co.in/*",
    "*://*.google.iq/*",
    "*://*.google.is/*",
    "*://*.google.it/*",
    "*://*.google.je/*",
    "*://*.google.com.jm/*",
    "*://*.google.jo/*",
    "*://*.google.co.jp/*",
    "*://*.google.co.ke/*",
    "*://*.google.com.kh/*",
    "*://*.google.ki/*",
    "*://*.google.kg/*",
    "*://*.google.co.kr/*",
    "*://*.google.com.kw/*",
    "*://*.google.kz/*",
    "*://*.google.la/*",
    "*://*.google.com.lb/*",
    "*://*.google.li/*",
    "*://*.google.lk/*",
    "*://*.google.co.ls/*",
    "*://*.google.lt/*",
    "*://*.google.lu/*",
    "*://*.google.lv/*",
    "*://*.google.com.ly/*",
    "*://*.google.co.ma/*",
    "*://*.google.md/*",
    "*://*.google.me/*",
    "*://*.google.mg/*",
    "*://*.google.mk/*",
    "*://*.google.ml/*",
    "*://*.google.com.mm/*",
    "*://*.google.mn/*",
    "*://*.google.ms/*",
    "*://*.google.com.mt/*",
    "*://*.google.mu/*",
    "*://*.google.mv/*",
    "*://*.google.mw/*",
    "*://*.google.com.mx/*",
    "*://*.google.com.my/*",
    "*://*.google.co.mz/*",
    "*://*.google.com.na/*",
    "*://*.google.com.nf/*",
    "*://*.google.com.ng/*",
    "*://*.google.com.ni/*",
    "*://*.google.ne/*",
    "*://*.google.nl/*",
    "*://*.google.no/*",
    "*://*.google.com.np/*",
    "*://*.google.nr/*",
    "*://*.google.nu/*",
    "*://*.google.co.nz/*",
    "*://*.google.com.om/*",
    "*://*.google.com.pa/*",
    "*://*.google.com.pe/*",
    "*://*.google.com.pg/*",
    "*://*.google.com.ph/*",
    "*://*.google.com.pk/*",
    "*://*.google.pl/*",
    "*://*.google.pn/*",
    "*://*.google.com.pr/*",
    "*://*.google.ps/*",
    "*://*.google.pt/*",
    "*://*.google.com.py/*",
    "*://*.google.com.qa/*",
    "*://*.google.ro/*",
    "*://*.google.ru/*",
    "*://*.google.rw/*",
    "*://*.google.com.sa/*",
    "*://*.google.com.sb/*",
    "*://*.google.sc/*",
    "*://*.google.se/*",
    "*://*.google.com.sg/*",
    "*://*.google.sh/*",
    "*://*.google.si/*",
    "*://*.google.sk/*",
    "*://*.google.com.sl/*",
    "*://*.google.sn/*",
    "*://*.google.so/*",
    "*://*.google.sm/*",
    "*://*.google.sr/*",
    "*://*.google.st/*",
    "*://*.google.com.sv/*",
    "*://*.google.td/*",
    "*://*.google.tg/*",
    "*://*.google.co.th/*",
    "*://*.google.com.tj/*",
    "*://*.google.tk/*",
    "*://*.google.tl/*",
    "*://*.google.tm/*",
    "*://*.google.tn/*",
    "*://*.google.to/*",
    "*://*.google.com.tr/*",
    "*://*.google.tt/*",
    "*://*.google.com.tw/*",
    "*://*.google.co.tz/*",
    "*://*.google.com.ua/*",
    "*://*.google.co.ug/*",
    "*://*.google.co.uk/*",
    "*://*.google.com.uy/*",
    "*://*.google.co.uz/*",
    "*://*.google.com.vc/*",
    "*://*.google.co.ve/*",
    "*://*.google.vg/*",
    "*://*.google.co.vi/*",
    "*://*.google.com.vn/*",
    "*://*.google.vu/*",
    "*://*.google.ws/*",
    "*://*.google.rs/*",
    "*://*.google.co.za/*",
    "*://*.google.co.zm/*",
    "*://*.google.co.zw/*",
    "*://*.google.cat/*"
  ],

To regenerate this list, you can use the command:

curl https://www.google.com/supported_domains | sed 's!\(.*\)!"*://*\1/*",!g'
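
If you would rather build the list in JavaScript, the same transformation can be sketched as below. It assumes the file holds one domain per line with a leading dot (for example `.google.com`), which is what the sed expression above implies. The function name `toMatchPatterns` is hypothetical, and fetching the file is left out.

```javascript
// Turn the text of supported_domains into an array of match patterns.
function toMatchPatterns(supportedDomainsText) {
  return supportedDomainsText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines, e.g. the trailing one
    .map((domain) => `*://*${domain}/*`);
}

// Small hand-written sample standing in for the downloaded file.
const sample = ".google.com\n.google.ad\n.google.co.uk\n";
console.log(JSON.stringify(toMatchPatterns(sample), null, 2));
```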
serv-inc