Multilingual website and bot detection

Question

I have a website with multilingual support.

I split the languages across subdomains:

fr-fr.mywebsite.com
es-es.mywebsite.com
www.mywebsite.com // root domain => neutral language for bots

On the subdomains, if no language cookie is set, I use the subdomain as the language code.

On the primary domain (www), if no language cookie is set, then:

if it's a bot, I use the neutral language
if it's not a bot, I detect the user's language from the Accept-Language header.
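
To make the rules above concrete, here is a minimal sketch of the selection logic, not my actual code. It assumes PHP 7.1+; the function names, the supported codes and the 'en' neutral default are hypothetical, and the bot flag is passed in because detecting it is exactly the open question.

```php
<?php
// Hypothetical helpers: names, language codes and the 'en' default are assumptions.

// Map a header tag such as 'fr' or 'fr-fr' to a supported code, or null.
function match_supported(string $tag, array $supported): ?string
{
    foreach ($supported as $code) {
        if ($code === $tag || strncmp($code, $tag . '-', strlen($tag) + 1) === 0) {
            return $code;
        }
    }
    return null;
}

// Return the supported language with the highest q-value in Accept-Language, or null.
function language_from_header(string $header, array $supported): ?string
{
    $best = null;
    $bestQ = 0.0;
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        $q = 1.0; // a tag without an explicit q parameter has weight 1
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $code = $tag === '' ? null : match_supported($tag, $supported);
        if ($code !== null && $q > $bestQ) {
            $best = $code;
            $bestQ = $q;
        }
    }
    return $best;
}

// Apply the rules: cookie first, then subdomain, then bot check / Accept-Language on www.
function choose_language(string $host, ?string $cookie, ?string $acceptLanguage,
                         bool $isBot, array $supported, string $neutral = 'en'): string
{
    if ($cookie !== null && ($cookie === $neutral || in_array($cookie, $supported, true))) {
        return $cookie;
    }
    $sub = strtolower(explode('.', $host)[0]);
    if ($sub !== 'www') {
        return in_array($sub, $supported, true) ? $sub : $neutral;
    }
    if ($isBot || $acceptLanguage === null) {
        return $neutral;
    }
    return language_from_header($acceptLanguage, $supported) ?? $neutral;
}
```

For example, choose_language('www.mywebsite.com', null, 'es-ES,es;q=0.9', false, ['fr-fr', 'es-es']) picks 'es-es', while the same call with the bot flag set to true falls back to the neutral language.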

How can I reliably detect whether a visitor is a bot? I read older threads on the subject, but people simply relied on the Accept-Language header because bots didn't send it. Today, however, Google does send this header.

Is it safer to detect that the visitor is a bot, or the reverse, to detect that it is a web browser? If a bot goes undetected, the website will be indexed in the wrong language.

Any ideas?

Why not use language annotations? That way the bot will find the alternate-language pages. – Cesar, Sep 22 '16 at 16:55
I use them too, but the primary domain has to auto-detect the user's language :) – Ndrou, Sep 23 '16 at 07:07
Hi @Ndrou, I still don't understand why you need to know whether the user is a bot. If the request has a valid Accept-Language header, you can send it to the matching language site; if not, send it to your main or default language site. A bot will still be able to find all the alternate languages through the language annotations and index them too. – Cesar, Sep 23 '16 at 16:10

score 1 · Answer 1 · edited May 23 '17 at 12:33

Assuming you're using PHP, you can read `$_SERVER['HTTP_USER_AGENT']` and check whether the user agent contains 'googlebot'.

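// The User-Agent header may be missing, so fall back to an empty string.
$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// stripos() is case-insensitive; compare with false because a match at offset 0 is falsy.
if (stripos($userAgent, 'googlebot') !== false)
{
    // what to do
}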

Here's the link to a question (and the example which I pulled from it).

how to detect search engine bots with php?

edited May 23 '17 at 12:33 by Community
answered Sep 22 '16 at 16:52 by Matthew

Yes, but Googlebot is not the only one; there are many other bots, such as Yahoo, Bing, Yandex, etc. How can I be sure not to forget one? – Ndrou Sep 23 '16 at 07:06
You can add all of those bots' names; just search for each engine and look up its bot name. You can also log the `HTTP_USER_AGENT` value and sort through the list to see whether a bot-like name has popped up. All the well-known, legitimate search engines name their bots. – Matthew Sep 26 '16 at 01:44
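
The suggestion in the last comment could be sketched roughly as follows. This is only an illustration: the function name is_known_bot and the constant BOT_SIGNATURES are hypothetical, and the signature list is a deliberately short, incomplete assumption that you would extend from your own logs.

```php
<?php
// Illustrative, incomplete list of crawler User-Agent substrings (an assumption).
// Googlebot (Google), bingbot (Bing), Slurp (Yahoo), YandexBot (Yandex),
// DuckDuckBot (DuckDuckGo), Baiduspider (Baidu).
const BOT_SIGNATURES = ['googlebot', 'bingbot', 'slurp', 'yandexbot', 'duckduckbot', 'baiduspider'];

// Return true when the User-Agent contains one of the listed signatures.
function is_known_bot(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        // No User-Agent at all: not a *known* bot; the caller decides how to treat it.
        return false;
    }
    foreach (BOT_SIGNATURES as $signature) {
        // stripos() is case-insensitive and returns false when there is no match.
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Usage sketch: log anything that did not match so new bot names can be spotted later.
// $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : null;
// if (!is_known_bot($ua)) {
//     error_log('Unmatched User-Agent: ' . ($ua === null ? '(none)' : $ua));
// }
```

Keeping the signatures in a single list means a newly spotted bot name from the log only has to be added in one place.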
