4

I would like to subscribe to an RSS/XML feed from Google News that captures the following query:

Articles mentioning "studie" (German for "study"), written in German, emanating from any country.

I'm using https://news.google.com/rss/search, but for this example, it's easier to see the UI output at https://news.google.com/search, so I'll use the latter URL base in this example.

Now, in the XML API reference, Google mentions four different parameters that influence either language or country:

  • hl (host language): the language that the end user is assumed to be typing in. I.e., an English-language speaker types "study," and Google assumes that term is in English and then machine-translates the results back to English. For me, navigating to https://news.google.com redirects to a URL with hl=en-US (full URL is https://news.google.com/?hl=en-US&gl=US&ceid=US:en).

  • gl: boosts search results whose country of origin matches the parameter value. The default in my web browser is gl=US.

  • lr (language restrict): restricts search results to documents written in a particular language

  • cr (country restrict): restricts search results to documents originating in a particular country

Based on all of the above, that would imply a URL of*:

https://news.google.com/search?q=study&hl=en-US&lr=lang_de

That attempt, however, fails miserably; it shows English-language results from the U.S., and it 302 redirects to:

https://news.google.com/search?q=study&lr=lang_de&hl=en-US&gl=US&ceid=US:en
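To make the redirect easier to reason about, here is a minimal sketch that diffs the query parameters of the two URLs above. It only parses the strings already shown (no network request is made), and the helper name `query_params` is my own, not part of any Google API.

```python
from urllib.parse import urlsplit, parse_qs

# The URL I requested and the URL the 302 redirect landed on (both from above).
requested = "https://news.google.com/search?q=study&hl=en-US&lr=lang_de"
redirected = "https://news.google.com/search?q=study&lr=lang_de&hl=en-US&gl=US&ceid=US:en"


def query_params(url):
    """Return the query string of `url` as a flat {name: value} dict."""
    return {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}


before = query_params(requested)
after = query_params(redirected)

# Parameters the redirect introduced, altered, or removed.
added = {k: v for k, v in after.items() if k not in before}
changed = {k: (before[k], after[k]) for k in before if k in after and before[k] != after[k]}
dropped = [k for k in before if k not in after]

print("added:  ", added)    # {'gl': 'US', 'ceid': 'US:en'}
print("changed:", changed)  # {}
print("dropped:", dropped)  # []
```

So the redirect keeps `lr=lang_de` in the URL and only appends `gl` and `ceid`; whatever causes the English-language results, it is not the parameter being stripped.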

So, to that end:

  • How can I properly structure URL parameters to capture 'Articles mentioning "studie" (German for "study"), written in German, from any country.'?
  • What the heck is ceid and why is it documented absolutely nowhere by Google?

* I.e.:

>>> import urllib.parse
>>> urllib.parse.parse_qs('q=study&hl=en-US&lr=lang_de')
{'q': ['study'], 'hl': ['en-US'], 'lr': ['lang_de']}

Related but not resolving any of this:

Brad Solomon
  • Have you tried this in postman or curl ? – Edward Aung Sep 10 '19 at 05:10
  • Just Python and in a browser @EdwardAung. (Which both allow redirects by default.) Would you suspect curl would produce different behavior? – Brad Solomon Sep 10 '19 at 12:00
  • The linked documentation mentions that the `client`, `output`, and `cx` parameters are all required – Ezphares Sep 10 '19 at 19:16
  • Yes, but that's for Google Custom Search Engine @Ezphares. news.google.com doesn't seem to require those – Brad Solomon Sep 10 '19 at 23:13
  • If the documentation is only valid for Custom Search then I would expect any information on `hl` and `lr` to also be valid only in that context – Ezphares Sep 11 '19 at 07:10
  • I think the disappointing fact is that what you want to do is no longer supported. Have you considered an alternative such as https://gnews.io? I haven't tried it but it bills itself as an unofficial Google News API. – Matt L. Sep 13 '19 at 12:40
  • Currently the API is still working and there is one deleted answer providing the query-string parameters, which deliver the desired results. I've requested support on [meta.stackoverflow.com](https://meta.stackoverflow.com/questions/389368/possible-moderator-mistake), because I cannot vote to un-delete that. Since the API still works, that answer can be validated. – Martin Zeitler Sep 13 '19 at 22:59

3 Answers

0

I'm using the following URL, and it works for me:

https://news.google.com/rss?q=studie&hl=de-DE&gl=DE&ceid=DE:de

You can also search by topic; please refer to this answer: URL format for Google News RSS feed

Eugene G
  • Thank you, but this unfortunately doesn't fully answer the question. Try using this format to find, for instance, Arabic-language news (ar) published in the UK. You'll see that redirects back to `&hl=en-US&gl=US&ceid=US:en`, or whatever your defaults are – Brad Solomon Sep 16 '19 at 12:14
  • Maybe this link will help to clarify APIs: https://developers.google.com/custom-search/docs/xml_results – Eugene G Sep 16 '19 at 23:31
0

I know nothing about the RSS interface, but as for the standard news UI, maybe this can be of use:

ceid (country:language) is Google's news filter, so lr (which Google News seems to ignore) and cr are restricted even further by only sifting through the news defined by the news filter. For US news in English it's ceid=US:en, and for news in Great Britain it's ceid=GB:en. Source: https://rapidapi.com/apigeek/api/google-search3/details

NOTE: If you don't specify a ceid, one will be applied based on your current location. Also, Google News doesn't seem to care at all about the lr parameter: it sticks to the language of ceid and that's it. Based on your query, 'Articles mentioning "studie" (German for "study"), written in German, emanating from any country', I would suggest a value of DE:de. You may find the ceid parameter somewhat constricting regarding "emanating from any country", but there's nothing you can do about that.

Google News is based on the concept that every place has its own news feed. "Emanating from any country" sounds an awful lot like "all the news from all places on Earth", and there's no such Google News edition. "World" news is, as you know, not quite the same thing. If you need no restrictions at all regarding country of production or publication, you'll be better off looking for another outlet. In the Google universe, a regular advanced Google search, restricted by publication date for freshness, is probably impossible to beat.

The four other parameters involved in your search are:

hl, host (interface) language: hl=de
gl, boost country of origin: gl=DE
lr, restrict results to language: lr=de
cr, restrict results to country: none

There are two mistakes in the suggested search string:

https://news.google.com/search?q=study&hl=en-US&lr=lang_de

q=studie, not study, and
lr=de, not lang_de.

However, Google News doesn't care about the lr parameter: it sticks to the language of ceid. Also, hl is always set to the language part of ceid and gl to the country part, so I recommend a ceid of DE:de for your query.

So the search string for DE:de becomes:

https://news.google.com/search?q=studie&hl=de&gl=DE&ceid=DE:de
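As a minimal sketch of the rule described above (hl taken from the language part of ceid, gl from the country part), the search string can be assembled like this. The function name `news_url` is hypothetical, and the hl/gl derivation reflects the behavior I observed, not anything documented by Google.

```python
from urllib.parse import urlencode


def news_url(query, ceid, base="https://news.google.com/search"):
    """Build a Google News URL whose hl and gl are derived from ceid.

    Assumption: ceid has the form COUNTRY:language (e.g. "DE:de"), and
    hl/gl simply mirror its two halves, as observed in this answer.
    """
    country, language = ceid.split(":", 1)
    params = {"q": query, "hl": language, "gl": country, "ceid": ceid}
    # safe=":" keeps the colon in ceid readable instead of encoding it as %3A.
    return base + "?" + urlencode(params, safe=":")


print(news_url("studie", "DE:de"))
# https://news.google.com/search?q=studie&hl=de&gl=DE&ceid=DE:de
```

Passing `base="https://news.google.com/rss/search"` (the RSS base mentioned in the question) should give the corresponding feed URL, though I have only looked at the standard UI.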

To add to the Library of Congress link given by Sreeram Nair: no country codes are given there. You can find country codes here:

• the ISO 3166-1 alpha-2 (2-letter country) standard, https://en.m.wikipedia.org/wiki/ISO_3166-1_alpha-2

You may also find this list of language codes easier to read on a mobile device:

• List of ISO 639-1 (language) codes https://en.m.wikipedia.org/wiki/List_of_ISO_639-1_codes

Sources: Wikipedia articles

• the software term Locale, https://en.m.wikipedia.org/wiki/Locale_(computer_software)

• the ISO 639 (language) standard, https://en.m.wikipedia.org/wiki/ISO_639

-1

The URL for the Google News RSS feed has changed. You can use the following format for fetching. Examples can also be seen here.

usage: gnrss2opml.py [-h] [-o OUTPUT] [-c COUNTRY] [-l LANGUAGE] [-s]
                     [-t [TOPIC [TOPIC ...]]] [-g [LOCATION [LOCATION ...]]]
                     [-q [QUERY [QUERY ...]]]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output file name (default: print to stdout)
  -c COUNTRY, --country COUNTRY
                        country / Google News edition (default: us)
  -l LANGUAGE, --language LANGUAGE
                        language (default: en)
  -s, --stories         include Top Stories
  -t [TOPIC [TOPIC ...]], --topics [TOPIC [TOPIC ...]]
                        list of topics, will be converted to uppercase
                        (default: WORLD NATION BUSINESS TECHNOLOGY
                        ENTERTAINMENT SPORTS SCIENCE HEALTH)
  -g [LOCATION [LOCATION ...]], --locations [LOCATION [LOCATION ...]]
                        list of geographic locations (default: None)
  -q [QUERY [QUERY ...]], --queries [QUERY [QUERY ...]]
                        list of search queries (default: None)

EDIT1:

The two-letter language code and country code can be specified in the arguments.

Get the codes from here

Sreeram Nair
  • `https://news.google.com/news/rss/search/section/q/economy` now redirects to `https://news.google.com/rss/search?q=economy&hl=en-US&gl=US&ceid=US:en`, which brings things right back to the original question here – Brad Solomon Sep 13 '19 at 12:27