
I've written a script in Python to scrape the content (which is in tabular format) of a site. When I execute my script, it parses that content successfully. The only thing I can't change is the language.

The content of that site is in Arabic. However, I want to parse it in such a way that the output is in English. This is where I'm stuck. I tried `headers={"Accept-Language":"en-US,en;q=0.9"}` according to this answer, but it doesn't seem to work in this case. How can I change the language option to achieve this?

This is my script:

```python
import requests
from bs4 import BeautifulSoup

URL = "http://www.awm.gov.jo/dotnet/default.aspx"

# Ask the server to prefer US English, then generic English
req = requests.get(URL, headers={"Accept-Language": "en-US,en;q=0.9"})
soup = BeautifulSoup(req.text, "lxml")
# Print each row of the prices table as a list of cell texts
for items in soup.select("#GV_prices tr"):
    data = [item.get_text(strip=True) for item in items.select("th,td")]
    print(data)
```

FYI, my first try was `headers={"Accept-Language": "en-US,en;q=0.5"}`, but it didn't work either.
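A quick way to see whether a server does language negotiation at all is to inspect the response headers. Under HTTP content negotiation, a server that varies its output by language typically sends a `Content-Language` header and lists `Accept-Language` in its `Vary` header. The sketch below assumes a plain mapping of header names to values (for example `req.headers` from the script above); `language_hints` is a hypothetical helper, and many servers omit these headers, so a negative result is a hint rather than proof.

```python
from typing import Mapping, Optional, Tuple


def language_hints(headers: Mapping[str, str]) -> Tuple[Optional[str], bool]:
    """Hypothetical helper: report what the response headers say about language.

    Returns (Content-Language value or None, whether Vary covers Accept-Language).
    """
    # HTTP header names are case-insensitive, so normalise them first
    lowered = {name.lower(): value for name, value in headers.items()}
    content_language = lowered.get("content-language")
    # Vary is a comma-separated list of request header names;
    # "*" means the response may vary on any request header
    vary = {field.strip().lower() for field in lowered.get("vary", "").split(",")}
    varies = "accept-language" in vary or "*" in vary
    return content_language, varies
```

For example, `print(language_hints(req.headers))` right after the `requests.get` call shows whether the server even claims to look at `Accept-Language`.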

SIM
  • If I understood you right, you want the site's English content? If so, you could start with the English version of the website: http://www.awm.gov.jo/dotnet/company1_en.aspx. You still need to find the right page, though. – b00r00x0 Jun 28 '18 at 13:20
  • The site is just a placeholder; I want to know how to handle the language in such cases. Thanks for the link, though. – SIM Jun 28 '18 at 13:22
  • Did you check in a browser whether the site you want to scrape supports an English version? I'm speaking of i18n. You could check this with a browser plugin such as `Quick Language Switcher` in Chrome or `Quick Accept-Language Switcher` in Firefox. – b00r00x0 Jun 28 '18 at 13:30
  • Not all sites allow automatic language selection. You should first try it from a standard browser like Chrome, Firefox or IE, and configure the preferred language. If it works from a browser, open the *developer tools* to see exactly how the request was built and sent. If it does not work from a browser, there is little hope of making it work from Python requests... – Serge Ballesta Jun 28 '18 at 13:30
  • Dear downvoter, at least try to leave a reason for pressing that button. Either you know too much or know nothing at all as to how a question should be asked. – SIM Jul 01 '18 at 04:31
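Whichever route turns out to work in the browser (a header, or a separate English URL such as the `company1_en.aspx` page mentioned above), it helps to check the scraped rows themselves, since a request can succeed while still returning Arabic. A minimal sketch, assuming the rows are lists of strings as produced by the question's loop; `looks_arabic` is a hypothetical helper that flags any cell containing characters from the main Arabic Unicode blocks.

```python
import re
from typing import Iterable

# Arabic (U+0600-06FF), Arabic Supplement (U+0750-077F) and the two
# Arabic Presentation Forms blocks (U+FB50-FDFF, U+FE70-FEFF).
# Arabic-Indic digits (U+0660-0669) fall inside the first block and also count.
ARABIC_RE = re.compile(r"[\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF]")


def looks_arabic(rows: Iterable[Iterable[str]]) -> bool:
    """Hypothetical check: True if any scraped cell contains Arabic-script characters."""
    return any(ARABIC_RE.search(cell) for row in rows for cell in row)
```

Collecting each `data` list into a `rows` list and calling `looks_arabic(rows)` gives a simple pass/fail signal when trying different headers or URLs.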

1 Answer


Your header asks for the page in American English; to get Arabic, you could try setting Accept-Language to "ar". However, this would not translate the page into English for you.
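`Accept-Language` is only a weighted preference list: each language tag may carry a `q` weight between 0 and 1 (default 1), and the server picks a variant it actually has, or ignores the header entirely. A small sketch for building such a value; `accept_language` is a hypothetical helper that gives the tags descending weights in the order listed.

```python
from typing import Sequence


def accept_language(tags: Sequence[str]) -> str:
    """Hypothetical helper: build an Accept-Language value from tags in preference order.

    The first tag gets the implicit weight q=1; each later tag is 0.1 lower,
    never going below 0.1.
    """
    parts = []
    for i, tag in enumerate(tags):
        if i == 0:
            parts.append(tag)  # no ;q= means q=1
        else:
            q = max(1.0 - 0.1 * i, 0.1)
            parts.append(f"{tag};q={q:.1f}")
    return ",".join(parts)
```

For example, `requests.get(URL, headers={"Accept-Language": accept_language(["ar"])})` requests Arabic explicitly; no header value makes the server translate a page it only has in Arabic.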

jc1850
  • If this is what you meant, `headers={"Accept-Language":"ar"}`, then it didn't work either. – SIM Jun 28 '18 at 13:19