
I'm developing a scraper to get some data from YouTube videos. I am from Spain and I am extracting the songs that appear in a video. My code starts like this:

import requests

# vid holds the ID of the video being scraped
url = f'https://www.youtube.com/watch?v={vid}'
page = requests.get(url).text

The main problem is that later I compare the page text with some Spanish strings, like "Con licencia cedida a YouTube por" ("Licensed to YouTube by"). But now I'm getting this text in Italian, like "Concesso in licenza a YouTube da". Why? I realized that my YouTube location was set to Italy. I changed it to Spain and deleted all the __pycache__ folders from the project and from the requests module, but it still returns the Italian version. Any clue?
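A small check can make the mismatch visible instead of letting the Spanish comparison fail silently: look for each localized marker and report which language the page came back in. This is a minimal sketch, assuming the markers appear verbatim in the downloaded HTML; LICENSE_MARKERS and detect_page_language are hypothetical names, and only the two strings come from the question.

```python
from typing import Optional

# Hypothetical mapping of language code -> licence marker; the two strings
# are the ones quoted in the question, extend it for other languages.
LICENSE_MARKERS = {
    "es": "Con licencia cedida a YouTube por",
    "it": "Concesso in licenza a YouTube da",
}


def detect_page_language(html: str) -> Optional[str]:
    """Return the code of the first licence marker found in the page, or None."""
    for lang, marker in LICENSE_MARKERS.items():
        if marker in html:
            return lang
    return None
```

Calling detect_page_language(page) right after the request tells you whether YouTube served the Spanish or the Italian version before any further parsing.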

fullfine

2 Answers


There are 2 possible ways to deal with this problem:

  • Remove the package and reinstall it:

    pip install --upgrade --force-reinstall requests
    
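  • Route the request through a proxy (IP:port) located in Spain, so YouTube sees a Spanish IP address:

    import requests
    
    
    vid = "xRqqOK3IWcE"
    # Example public proxy in Spain (IP:port); free proxies like this often stop working
    ip_port = "80.59.199.213:8080"
    headers = {
      "User-Agent": (
          'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
          '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')
    }
    
    # Most public proxies speak plain HTTP, so both entries use the http:// scheme;
    # requests tunnels HTTPS traffic through the proxy with CONNECT
    proxy = {
       'http': f"http://{ip_port}",
       'https': f"http://{ip_port}"
    }
    
    url = f'https://www.youtube.com/watch?v={vid}'
    
    # verify=False disables TLS certificate checks (requests emits an InsecureRequestWarning)
    page = requests.get(url, headers=headers, proxies=proxy, verify=False).text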
    
    
Hassan
  • I have already tried reinstalling the package and it still behaves the same. I noticed that if I open a YouTube window in incognito mode, regardless of the browser I use, it always says Italy. I think this is related to my problem. BTW, where does this ip_port come from? – fullfine Nov 17 '21 at 12:27
  • @fullfine The IP comes from googling proxy IP addresses in Spain – Hassan Nov 21 '21 at 14:05

I had a similar problem and solved it thanks to this answer.

If you're using the requests library, one option is to specify the language in the Accept-Language header, as follows:

import requests

# "es-MX" is Mexican Spanish; "es-ES" is the tag for Spain
headers = {"Accept-Language": "es-MX"}
url = f'https://www.youtube.com/watch?v={vid}'
page = requests.get(url, headers=headers).text
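The header can be combined with query parameters that ask for a language and region explicitly. This is a minimal sketch, assuming YouTube honours the hl (interface language) and gl (country) query parameters on the watch page; the helper name youtube_watch_request is hypothetical, and only the Accept-Language header comes from the answer above. The pieces are built with the standard library so they can be passed straight to requests.

```python
from urllib.parse import urlencode


def youtube_watch_request(vid: str, lang: str = "es", region: str = "ES"):
    """Build the URL and headers for a YouTube watch page in a given language.

    Assumption: YouTube reads `hl` (language) and `gl` (country) from the query
    string; Accept-Language is the standard HTTP content-negotiation header.
    """
    query = urlencode({"v": vid, "hl": lang, "gl": region})
    url = f"https://www.youtube.com/watch?{query}"
    # Prefer the regional variant (es-ES), fall back to generic Spanish
    headers = {"Accept-Language": f"{lang}-{region},{lang};q=0.9"}
    return url, headers
```

Usage: url, headers = youtube_watch_request(vid), then page = requests.get(url, headers=headers).text. If the page still comes back in Italian, the IP-based location discussed in the first answer is the more likely cause.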