It is important that the full URL is not displayed. Instead, only the text "To the article" should be shown, with the URL linked behind it. How can I do this?

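import pandas as pd
from GoogleNews import GoogleNews

googlenews = GoogleNews()
googlenews.set_encode('utf_8')

nachrichten = []  # one DataFrame per city

for ort in orte:  # orte: list of city names, defined elsewhere
    googlenews.clear()
    googlenews.get_news(ort)
    table_new = []

    for row in googlenews.results():
        table_new.append({
            'City': ort,
            'Title': row['title'],
            'Date': row['date']})

    # Build the DataFrame once per city, after all rows are collected
    df = pd.DataFrame(table_new)
    nachrichten.append(df)

dfges = pd.concat(nachrichten, axis='index')
print(dfges)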

1 Answer

Your raw URLs are not valid. To turn them into valid Google News URLs, apply row['link'].replace('/./article', '/article') and prepend the https:// prefix. Options for obtaining the real link to the original article have been discussed here.

This will turn:

news.google.com/./articles/CBMigQFodHRwczovL3d3dy5lc3BuLmNvbS9zb2NjZXIvZ2VybWFuLWJ1bmRlc2xpZ2Evc3RvcnkvNDYzNDMxOC91bmNvbWZvcnRhYmxlLWZyZWlidXJnLWFwcGVhbC1hZnRlci1iYXllcm4tbXVuaWNoLXN1YnN0aXR1dGlvbi1taXgtdXDSAY4BaHR0cHM6Ly93d3cuZXNwbi5jb20vc29jY2VyL2dlcm1hbi1idW5kZXNsaWdhL3N0b3J5LzQ2MzQzMTgvdW5jb21mb3J0YWJsZS1mcmVpYnVyZy1hcHBlYWwtYWZ0ZXItYmF5ZXJuLW11bmljaC1zdWJzdGl0dXRpb24tbWl4LXVwP3BsYXRmb3JtPWFtcA?hl=en-US&gl=US&ceid=US%3Aen

into:

https://news.google.com/articles/CBMigQFodHRwczovL3d3dy5lc3BuLmNvbS9zb2NjZXIvZ2VybWFuLWJ1bmRlc2xpZ2Evc3RvcnkvNDYzNDMxOC91bmNvbWZvcnRhYmxlLWZyZWlidXJnLWFwcGVhbC1hZnRlci1iYXllcm4tbXVuaWNoLXN1YnN0aXR1dGlvbi1taXgtdXDSAY4BaHR0cHM6Ly93d3cuZXNwbi5jb20vc29jY2VyL2dlcm1hbi1idW5kZXNsaWdhL3N0b3J5LzQ2MzQzMTgvdW5jb21mb3J0YWJsZS1mcmVpYnVyZy1hcHBlYWwtYWZ0ZXItYmF5ZXJuLW11bmljaC1zdWJzdGl0dXRpb24tbWl4LXVwP3BsYXRmb3JtPWFtcA?hl=en-US&gl=US&ceid=US%3Aen
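The transformation above can be sketched as a small helper. This is only an illustration: the function name `to_google_news_url` and the shortened article ID `EXAMPLE_ID` are made up, and the check for an existing scheme is an extra assumption that is not part of the one-liner in this answer.

```python
def to_google_news_url(raw_link: str) -> str:
    """Turn a raw link from the GoogleNews results into a loadable URL."""
    # Drop the stray "/./" segment that makes the raw link invalid.
    cleaned = raw_link.replace('/./article', '/article')
    # Assumption: only prepend the scheme when it is missing.
    if not cleaned.startswith(('http://', 'https://')):
        cleaned = f'https://{cleaned}'
    return cleaned


# Shortened, made-up article ID for illustration only.
raw = 'news.google.com/./articles/EXAMPLE_ID?hl=en-US&gl=US&ceid=US%3Aen'
print(to_google_news_url(raw))
# prints: https://news.google.com/articles/EXAMPLE_ID?hl=en-US&gl=US&ceid=US%3Aen
```

Keeping the cleanup in one function makes it easy to reuse when building the 'URL' column.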

To make URLs clickable, you can add the following code, as suggested here:

def make_clickable(val):
    return '<a href="{}">{}</a>'.format(val,'To the article')

dfges.style.format({'URL': make_clickable})

Full code:


import pandas as pd
from GoogleNews import GoogleNews

googlenews = GoogleNews()
googlenews.set_encode('utf_8')
googlenews.set_lang('en')
googlenews.set_period('7d')

orte = ["Munich"]
nachrichten = []

for ort in orte:
    googlenews.clear()
    googlenews.get_news(ort)
    table_new = []

    for row in googlenews.results():
        table_new.append({
            'City': ort,
            'Title': row['title'],
            'Date': row['date'],
            # Repair the raw link and add the missing scheme
            'URL': f"https://{row['link'].replace('/./article', '/article')}",
            'Source': row['site']})

    # Build the DataFrame once per city, after all rows are collected
    df = pd.DataFrame(table_new)
    nachrichten.append(df)

dfges = pd.concat(nachrichten, axis='index')
dfges.drop_duplicates(subset=['Title'], keep='last', inplace=True)
print(dfges)

def make_clickable(val):
    # Show a fixed label instead of the full URL
    return '<a href="{}">{}</a>'.format(val, 'To the article')

# Returns a Styler; it only changes how the DataFrame is displayed
dfges.style.format({'URL': make_clickable})

Opening a link in a new tab leads to a page that redirects to the original article.

KarelZe
  • Thanks a lot. I have tested the code and the links are clickable but when I tried one of them I got `400. That’s an error.`. Are those links valid? – YasserKhalil Apr 06 '22 at 20:30
  • @YasserKhalil Try to do a right click and open link in a new tab. Otherwise the redirect won't happen. Don't know why Google doesn't allow them straight away. – KarelZe Apr 06 '22 at 20:33
  • Thanks but the same problem `The server cannot process the request because it is malformed` – YasserKhalil Apr 06 '22 at 20:34
  • Could you share the title of the article or the URL please? Don't get response. – KarelZe Apr 06 '22 at 20:35
  • My bad. I only looked at the dataframe from `print(dfges)` and didn't notice the final `dfges.style.format({'URL': make_clickable})`. Now I can see the clickable, working links. – YasserKhalil Apr 06 '22 at 20:39
  • unfortunately it does not work for me.. – David_Python Apr 07 '22 at 10:51
  • @David_Python Could you please elaborate? It's only for formatting, won't be part of a csv. Is this, what you mean? – KarelZe Apr 07 '22 at 13:47
  • I am sending dfges by mail, and in the mail I see the whole URL, which is also not clickable ... – David_Python Apr 07 '22 at 14:10
  • @David_Python Yes, that's because it's only a formatter for display purposes. What is the type of attachment? You could try to write an `apply()` function that does the conversion, but it will still not work for CSV. – KarelZe Apr 07 '22 at 14:38
  • Thanks a lot, it works now :) Can you please check this question? You probably know what to do: https://stackoverflow.com/questions/71795148/python-html-table-should-be-striped – David_Python Apr 08 '22 at 10:12
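
As the comments point out, the Styler formatter only changes how the DataFrame is displayed. For an HTML e-mail body, one possible approach is to convert the URL column with `apply()` and render the table with `to_html(escape=False)`. The following is a minimal sketch under assumptions: the helper `to_anchor`, the sample row, and the subject line are made up for illustration, and it is not necessarily what was used in the thread.

```python
import html
from email.message import EmailMessage

import pandas as pd


def to_anchor(url, label="To the article"):
    """Wrap a URL in an HTML anchor that shows only the label text."""
    return f'<a href="{html.escape(url, quote=True)}">{html.escape(label)}</a>'


# Made-up sample row; with the answer's code this would be dfges.
df = pd.DataFrame({
    "Title": ["Example headline"],
    "URL": ["https://news.google.com/articles/EXAMPLE_ID?hl=en-US&gl=US"],
})

# Work on a copy so the original URLs stay available, e.g. for a CSV export.
df_mail = df.copy()
df_mail["URL"] = df_mail["URL"].apply(to_anchor)

# max_colwidth=None guards against long cells being cut off in pandas
# versions where to_html honors this display option.
with pd.option_context("display.max_colwidth", None):
    # escape=False keeps the <a> tags instead of escaping them as text.
    html_table = df_mail.to_html(escape=False, index=False)

# Build (but do not send) a message with a plain-text fallback and an HTML part.
msg = EmailMessage()
msg["Subject"] = "News overview"
msg.set_content("This message contains an HTML table.")
msg.add_alternative(html_table, subtype="html")
```

Note that `escape=False` disables escaping for every cell, so other text columns should be escaped separately if they may contain HTML. As mentioned above, none of this helps for CSV, which has no notion of clickable links.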