I'm collecting some website metadata. Some websites serve a localized version based on my location. Can I avoid this?

Here's what I'm currently doing:

import requests
from bs4 import BeautifulSoup

source = requests.get('http://www.youtube.com').text
source_soup = BeautifulSoup(source, 'lxml')
current_description = source_soup.find_all('meta', attrs={'name': 'description'})
print(current_description)

The result I get is:

[<meta content="Auf YouTube findest du großartige Videos und erstklassige Musik. Außerdem kannst du eigene Inhalte hochladen und mit Freunden oder mit der ganzen Welt teilen." name="description"/>]

This is the data I want, but it comes from the German version of the website. I'd like the English version so that I don't have to deal with different languages, if at all possible. Since I want to scrape many different websites, manually changing each URL to force English is not practical.

Is there a solution with the requests module? My only other idea is to use a VPN, but that seems cumbersome.

ahanf
  • Solved using the proxies keyword argument of the requests.get function, see https://stackoverflow.com/questions/8287628/proxies-with-python-requests-module – ahanf Jun 13 '19 at 10:51

1 Answer

You can request a language by passing an Accept-Language header in the headers parameter:

import requests
from bs4 import BeautifulSoup

headers = {'accept-language': 'en-US,en;q=0.9,en-GB;q=0.8'}

source = requests.get('http://www.youtube.com', headers=headers).text
source_soup = BeautifulSoup(source, 'lxml')
current_description = source_soup.find_all('meta', attrs={'name': 'description'})
print(current_description)
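
The header value above is a ranked preference list: each language tag may carry a q weight between 0 and 1, and a tag without one counts as 1. As a rough sketch, the value could be generated from an ordered list and attached to a requests.Session so that every request to every site sends it. The helper names accept_language and english_session are made up for illustration and are not part of requests. Servers are free to ignore the header, so this is a hint rather than a guarantee.

```python
def accept_language(preferences):
    """Build an Accept-Language value from tags ordered by preference.

    The first tag gets the implicit weight 1; each later tag gets a
    weight 0.1 lower, never dropping below 0.1.
    """
    parts = []
    for rank, tag in enumerate(preferences):
        if rank == 0:
            parts.append(tag)
        else:
            weight = max(1.0 - 0.1 * rank, 0.1)
            parts.append(f"{tag};q={weight:.1f}")
    return ",".join(parts)


def english_session():
    """Return a requests.Session that asks for English on every request."""
    # Imported here so accept_language() stays usable without requests.
    import requests

    session = requests.Session()
    session.headers.update(
        {"Accept-Language": accept_language(["en-US", "en", "en-GB"])}
    )
    return session
```

With this in place, english_session().get(url).text could replace requests.get(url, headers=headers).text when looping over many sites.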

Notice that if I change the value to de, I get German:

import requests
from bs4 import BeautifulSoup

headers = {'accept-language': 'de'}

source = requests.get('http://www.youtube.com', headers=headers).text
source_soup = BeautifulSoup(source, 'lxml')
current_description = source_soup.find_all('meta', attrs={'name': 'description'})
print(current_description)

Output:

[<meta content="Auf YouTube findest du großartige Videos und erstklassige Musik. Außerdem kannst du eigene Inhalte hochladen und mit Freunden oder mit der ganzen Welt teilen." name="description"/>]

fr gives me French:

import requests
from bs4 import BeautifulSoup

headers = {'accept-language': 'fr'}

source = requests.get('http://www.youtube.com', headers=headers).text
source_soup = BeautifulSoup(source, 'lxml')
current_description = source_soup.find_all('meta', attrs={'name': 'description'})
print(current_description)

Output:

[<meta content="Profitez des vidéos et de la musique que vous aimez, mettez en ligne des contenus originaux, et partagez-les avec vos amis, vos proches et le monde entier." name="description"/>]
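
The three snippets differ only in the language code, so they could be folded into one function that takes the language as a parameter. The sketch below is one possible way to do that. The names extract_descriptions and fetch_descriptions are hypothetical, and the parsing step swaps BeautifulSoup for the standard library's html.parser purely to keep the example dependency-free; the BeautifulSoup call from above would work just as well in its place.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content of every <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "description":
            self.descriptions.append(attributes.get("content", ""))


def extract_descriptions(html):
    """Return the description strings found in an HTML document."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.descriptions


def fetch_descriptions(url, language):
    """Request url with the given Accept-Language value and parse the result."""
    # Imported here so the parsing helpers above work without requests.
    import requests

    response = requests.get(
        url, headers={"Accept-Language": language}, timeout=10
    )
    return extract_descriptions(response.text)
```

Calling fetch_descriptions('http://www.youtube.com', 'fr') would then correspond to the French example, and a loop over ['en', 'de', 'fr'] would cover all three.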
chitown88