I want to create a pattern for my Chrome extension that covers all Google sites (.com, .de, .fr, .en, ...) combined with a custom pattern. Here is an example:

https://www.google.com/*/exclude_all_of_the_following

http://www.google.co.ck/*/exclude_all_of_the_following

So I created a pattern, but it's not working:

*://*.google.???/*

But my pattern doesn't match URLs with a two-character ending like ".de"; I think it only works for URLs with a three-character ending like ".com".
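To illustrate the behaviour described above, here is a minimal sketch that models a glob as a regular expression. It assumes glob semantics in which `*` matches any run of characters and `?` matches exactly one character; `globToRegExp` is a hypothetical helper written for this illustration, not a Chrome API, and the sample URLs are made up.

```javascript
// Hypothetical helper: model a glob as a RegExp, assuming "*" matches any
// run of characters and "?" matches exactly one character.
function globToRegExp(glob) {
  // Escape regex metacharacters other than "*" and "?".
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  // Translate the two glob wildcards.
  const body = escaped.replace(/\*/g, '.*').replace(/\?/g, '.');
  return new RegExp('^' + body + '$');
}

const threeCharGlob = globToRegExp('*://*.google.???/*');

const globResults = {
  com: threeCharGlob.test('https://www.google.com/search'),
  de: threeCharGlob.test('https://www.google.de/search'),
  coUk: threeCharGlob.test('https://www.google.co.uk/search'),
};
console.log(globResults);
```

Under these assumptions only the ".com" URL matches, because `???` needs exactly three characters between "google." and the next "/".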

And I don't know how to exclude what I want to exclude.

I searched Google's match pattern documentation, but it doesn't contain an example of what I want to do.

Could someone help me? Just to clarify, I am not looking for a regex. I am looking for a match-pattern glob; more information is available at the link I posted above.

Timtim
  • What is your current code? – sharf May 03 '15 at 21:28
  • *and* don't forget about multiple TLDs, such as `.co.uk`... – newfurniturey May 03 '15 at 21:29
  • Here is my current pattern: http://pastebin.com/2gRFY4Cf It's huge and not optimized. – Timtim May 03 '15 at 21:30
  • And here is the list of Google's supported domains that I use: https://www.google.com/supported_domains – Timtim May 03 '15 at 21:34
  • why not just google.*? – Rodrigo López May 03 '15 at 21:36
  • `'https://www.google.co.uk/hello/there?bye=now'.match('www.google[.a-z]*')` ? –  May 03 '15 at 21:36
  • @RodrigoLópez It's not working. I got an "Invalid value for 'content_scripts[3].matches[0]': Invalid host wildcard." error. – Timtim May 03 '15 at 21:37
  • I guess you could try `://www.google.*/*` or `://google.*/*`, but this is no regex, you are using *wildcards*, and that means the question is not related to regular expressions. – Wiktor Stribiżew May 03 '15 at 21:42
  • are you aiming at using the matched TLD (.com or .co.uk part) further in your code? – Ejaz May 03 '15 at 21:45
  • He is asking for a pattern to write into a Google extension, I don't know if it uses regexes or specific patterns for white/black list. I guess it's *-not-* to apply a chrome extension to a specific range of websites. He cannot use programmation directly like `string.match`. – Vadorequest May 03 '15 at 21:52
  • I took a look at the [match patterns](https://developer.chrome.com/extensions/match_patterns) documentation and I'm not sure if this is possible :( As @stribizhev mentioned in his comment above, you could try using the wildcards. It doesn't seem regexes or any other specific patterns are supported... I updated my answer below. – aug May 03 '15 at 22:00
  • Yeah, right. I think all your answers will be useful for the next people who have a similar problem. @aug Yeah, it's a bit limited... Thanks, Google. So I have a huge array of all existing domains. It really hurts. Thank you for your (wasted) time :( `://google.*/*` isn't correct; I always get the same error with ".*" after the word "google". – Timtim May 03 '15 at 22:00
  • possible duplicate of [Match pattern for all google search pages](http://stackoverflow.com/questions/23747781/match-pattern-for-all-google-search-pages) – Xan May 03 '15 at 23:04

2 Answers

After looking at the documentation for match patterns, I'm not entirely sure this is possible. The patterns you are allowed to use seem very limited. :( If anyone finds out more, please post.
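Since a match pattern only accepts a `*` wildcard at the start of the host (which fits the "Invalid host wildcard" error reported in the comments), one workaround is to generate one pattern per domain instead of typing them by hand. The following is only a sketch: `buildMatchPatterns` is a hypothetical helper, and it assumes the text of google.com/supported_domains is a whitespace-separated list of entries of the form `.google.tld`.

```javascript
// Hypothetical helper: turn a whitespace-separated list of entries such as
// ".google.com" into match patterns of the form "*://*.google.com/*".
// The input format is an assumption about google.com/supported_domains.
function buildMatchPatterns(supportedDomainsText) {
  return supportedDomainsText
    .split(/\s+/)                              // one entry per line or space
    .filter(Boolean)                           // drop empty strings
    .map((domain) => '*://*' + domain + '/*'); // any scheme, any subdomain
}

// A short made-up sample standing in for the full list.
const sampleDomains = '.google.com\n.google.de\n.google.co.uk';
const matches = buildMatchPatterns(sampleDomains);

// The result can be pasted into the "matches" array of the manifest.
console.log(JSON.stringify(matches, null, 2));
```

The manifest still ends up with one entry per domain, but the list no longer has to be maintained by hand.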


Answer with Regex (not what OP is looking for)

Unfortunately, you are just going to have to account for each of the possible domain endings. You could make a generic regex, but endings that Google doesn't support would also match. If someone has a better solution for this, please post! Here is what I have just whipped up.

/http(s?):\/\/(www?).google.(com|ad|ae|com.af|com.ag|com.ai|al|am|co.ao|com.ar|as|at|com.au|az|ba|com.bd|be|bf|bg|com.bh|bi|bj|com.bn|com.bo|com.br|bs|bt|co.bw|by|com.bz|ca|cd|cf|cg|ch|ci|co.ck|cl|cm|cn|com.co|co.cr|com.cu|cv|com.cy|cz|de|dj|dk|dm|com.do|dz|com.ec|ee|com.eg|es|com.et|fi|com.fj|fm|fr|ga|ge|gg|com.gh|com.gi|gl|gm|gp|gr|com.gt|gy|com.hk|hn|hr|ht|hu|co.id|ie|co.il|im|co.in|iq|is|it|je|com.jm|jo|co.jp|co.ke|com.kh|ki|kg|co.kr|com.kw|kz|la|com.lb|li|lk|co.ls|lt|lu|lv|com.ly|co.ma|md|me|mg|mk|ml|com.mm|mn|ms|com.mt|mu|mv|mw|com.mx|com.my|co.mz|com.na|com.nf|com.ng|com.ni|ne|nl|no|com.np|nr|nu|co.nz|com.om|com.pa|com.pe|com.pg|com.ph|com.pk|pl|pn|com.pr|ps|pt|com.py|com.qa|ro|ru|rw|com.sa|com.sb|sc|se|com.sg|sh|si|sk|com.sl|sn|so|sm|sr|st|com.sv|td|tg|co.th|com.tj|tk|tl|tm|tn|to|com.tr|tt|com.tw|co.tz|com.ua|co.ug|co.uk|com.uy|co.uz|com.vc|co.ve|vg|co.vi|com.vn|vu|ws|rs|co.za|co.zm|co.zw|cat)\/*/

In case you are wondering how I got all of them: I took the list at the link you posted (google.com/supported_domains), pasted it into the console as a string, and simply did .split(' .google.'), which returned all of the domain endings in an array.

I then took that array and did a reduce:

splitLanguages.reduce(function(a,b) { return a + '|' + b; });

I put the resulting string into that regex. Feel free to use .test to make sure it's working. If anyone has a better solution, please comment.
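The split and reduce steps above can be sketched end to end as follows. The input string is a short made-up sample in the space-separated shape described above, not the real list, and the wrapping expression (`googleUrl`) is a simplified stand-in of my own rather than the exact regex from this answer. The sketch also escapes the dot inside each entry before joining, which a plain reduce does not do.

```javascript
// Made-up sample in the space-separated shape described above.
const raw = '.google.com .google.de .google.co.uk .google.com.au';

// Strip the leading ".google." once, then split on the separator.
const splitLanguages = raw.replace(/^\.google\./, '').split(' .google.');

// The reduce from the answer and a plain join give the same string.
const viaReduce = splitLanguages.reduce(function (a, b) { return a + '|' + b; });
const viaJoin = splitLanguages.join('|');

// Escape the dots in each entry so "co.uk" only matches a literal dot.
const escapedAlternation = splitLanguages
  .map((ending) => ending.replace(/\./g, '\\.'))
  .join('|');

// Simplified wrapper (an assumption, not the answer's exact regex).
const googleUrl = new RegExp('^https?:\\/\\/www\\.google\\.(' + escapedAlternation + ')\\/');

console.log(viaReduce === viaJoin);
console.log(googleUrl.test('https://www.google.co.uk/hello'));
console.log(googleUrl.test('https://www.google.xx/hello'));
```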

If you want a more generic regex, @keune has the right idea, but like I said, domain endings that do not exist would also match, and that may or may not be what you're after.

aug
  • Thank you for your answer, but I can't create functions in my manifest; I can only put in a string with the correct pattern. I see you got a downvote, but I want to thank each person who tries to help me, so I upvoted. – Timtim May 03 '15 at 21:48
  • I see now. I'll leave this here but yeah you should not tag [tag:regex] unless you are looking for one. It seems someone has removed it for you. I'll leave this answer here anyways. – aug May 03 '15 at 21:54
  • Yeah, I know. I didn't use the regex tag; someone else added it. – Timtim May 03 '15 at 21:55

aug's answer is quite good, but it has some significant mistakes: the periods (.) haven't been escaped, the '?' should be outside the capturing group, and instead of the asterisk at the end there should be '.*'. I tweaked the regular expression to this:

/http(s)?:\/\/(www\.)?google\.(com|ad|ae|com\.af|com\.ag|com\.ai|al|am|co\.ao|com\.ar|as|at|com\.au|az|ba|com\.bd|be|bf|bg|com\.bh|bi|bj|com\.bn|com\.bo|com\.br|bs|bt|co\.bw|by|com\.bz|ca|cd|cf|cg|ch|ci|co\.ck|cl|cm|cn|com\.co|co\.cr|com\.cu|cv|com\.cy|cz|de|dj|dk|dm|com\.do|dz|com\.ec|ee|com\.eg|es|com\.et|fi|com\.fj|fm|fr|ga|ge|gg|com\.gh|com\.gi|gl|gm|gp|gr|com\.gt|gy|com\.hk|hn|hr|ht|hu|co\.id|ie|co\.il|im|co\.in|iq|is|it|je|com\.jm|jo|co\.jp|co\.ke|com\.kh|ki|kg|co\.kr|com\.kw|kz|la|com\.lb|li|lk|co\.ls|lt|lu|lv|com\.ly|co\.ma|md|me|mg|mk|ml|com\.mm|mn|ms|com\.mt|mu|mv|mw|com\.mx|com\.my|co\.mz|com\.na|com\.nf|com\.ng|com\.ni|ne|nl|no|com\.np|nr|nu|co\.nz|com\.om|com\.pa|com\.pe|com\.pg|com\.ph|com\.pk|pl|pn|com\.pr|ps|pt|com\.py|com\.qa|ro|ru|rw|com\.sa|com\.sb|sc|se|com\.sg|sh|si|sk|com\.sl|sn|so|sm|sr|st|com\.sv|td|tg|co\.th|com\.tj|tk|tl|tm|tn|to|com\.tr|tt|com\.tw|co\.tz|com\.ua|co\.ug|co\.uk|com\.uy|co\.uz|com\.vc|co\.ve|vg|co\.vi|com\.vn|vu|ws|rs|co\.za|co\.zm|co\.zw|cat)\/.*/

This regex works for all Google URLs as far as I know. I could not post this as a comment on the answer because my reputation is below 50.
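To make the three points above concrete, here is a small sketch comparing the two shapes. Both expressions use a shortened alternation `(com|de|fr)` purely for readability (an assumption made for this illustration; the real lists are the long ones above), and the test URLs are made up.

```javascript
// Shape of the original: "(www?)" is required, the dots match any
// character, and the trailing "\/*" allows zero slashes.
const originalShape = /http(s?):\/\/(www?).google.(com|de|fr)\/*/;

// Shape of the tweaked version: optional "www.", literal dots,
// and a required slash after the domain ending.
const tweakedShape = /http(s)?:\/\/(www\.)?google\.(com|de|fr)\/.*/;

const urls = [
  'https://www.google.com/search?q=1',     // ordinary URL
  'https://google.com/',                   // no "www."
  'https://wwwXgoogle.com/',               // "X" where a dot should be
  'https://www.google.community.example/', // "com" is only a prefix
];

const comparison = urls.map((url) => ({
  url: url,
  original: originalShape.test(url),
  tweaked: tweakedShape.test(url),
}));
console.table(comparison);
```

Only the first URL is accepted by both. The second is rejected by the original shape because `(www?)` is mandatory there, while the last two are accepted by the original shape but rejected by the tweaked one.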

Community
iamMG