I have some code that pulls links from a few news sites. I only want the links that contain the name of the city, Gdańsk. However, URLs don't always use the correct spelling, so I had to add variants such as Gdańsk, gdańsk, Gdansk and gdansk. I also want to pull links from several sites. I managed to add more spellings and sites, but each one meant another list and another for loop. Could you point me toward a way to make the code more efficient and shorter?
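A minimal sketch of one way to collapse the repeated selectors and loops, assuming a substring match on `href` is still what you want. Instead of one selector per spelling, it normalizes each `href` (percent-decoding, lowercasing, stripping diacritics) so the single keyword `gdansk` covers Gdańsk, gdańsk, Gdansk and gdansk. The percent-decoding also catches URLs written as `Gda%C5%84sk`, which the original selectors would miss; that is an extra assumption about what you want to match. `SITES`, `CITY`, `normalize`, `city_links` and `collect` are hypothetical names, and it uses the built-in `"html.parser"` instead of `lxml` (swap it back if you prefer).

```python
import unicodedata
from urllib.parse import unquote

import requests
from bs4 import BeautifulSoup

# Hypothetical configuration: add a site by adding a URL, not a new variable.
SITES = [
    "http://www.dziennikbaltycki.pl",
    "http://www.trojmiasto.pl",
]
CITY = "gdansk"  # keyword in normalized form (lowercase, no diacritics)


def normalize(text):
    """Percent-decode, lowercase and strip diacritics ("Gdańsk" -> "gdansk")."""
    decomposed = unicodedata.normalize("NFKD", unquote(text).lower())
    # NFKD splits "ń" into "n" plus a combining accent; drop the accent.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def city_links(html, keyword=CITY):
    """Return the set of href values whose normalized form contains keyword."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(href=True) matches every tag that has an href attribute.
    return {tag["href"] for tag in soup.find_all(href=True)
            if keyword in normalize(tag["href"])}


def collect(sites=SITES):
    """Fetch every site and merge the matching links into one set."""
    links = set()
    for url in sites:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        links |= city_links(response.content)  # set union removes duplicates
    return links
```

Usage: `for link in sorted(collect()): print(link)`. Because the set union already deduplicates, there is no need for one list and one loop per spelling and site.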
Second question: I'm exporting the links I collect to a CSV file so I can gather them there and analyze them later. I read that replacing "w" with "a" in `csv = open(plik,"a")` should make it append to the file. Instead, nothing happens. With "w" it overwrites the file, but that's not what I need.
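A minimal sketch of appending one row per link, assuming a (date, link) row layout is acceptable; `append_links` and its parameters are hypothetical. It differs from the code in the question in two ways: the file is opened in a `with` block, so it is closed (and its buffered writes flushed) when the block ends, whereas the question's code never calls `csv.close()`; and the name `csv` is left for the standard-library module rather than reused for the file object. Whether the unclosed file is what makes "nothing happens" here is a guess. Note also that the relative path `"dupa.csv"` is resolved against the current working directory, which is not necessarily the script's folder.

```python
import csv


def append_links(path, links, entry_date):
    """Append one (date, link) row per link to the CSV file at path.

    Mode "a" writes at the end of the file (creating it if it does not
    exist); leaving the with block closes the file, which flushes it.
    """
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for link in sorted(links):
            writer.writerow([entry_date, link])
```

Usage with the names from the question: `append_links("dupa.csv", s, data('dzmscrok'))`. Each run adds rows below the previous ones, and the `csv` module quotes any link that happens to contain a comma.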

    import requests
    from bs4 import BeautifulSoup as bs
    from datetime import datetime

    def data(timedateformat='complete'):
        formatdaty = timedateformat.lower()
        # Year-month-day, e.g. 2024-03-05
        if formatdaty == 'rokmscdz':
            return str(datetime.now()).split(' ')[0]
        # Day-month-year, e.g. 05-03-2024
        elif formatdaty == 'dzmscrok':
            rok, msc, dz = str(datetime.now()).split(' ')[0].split('-')
            return dz + '-' + msc + '-' + rok

    a = requests.get('http://www.dziennikbaltycki.pl')
    b = requests.get('http://www.trojmiasto.pl')
    zupa = bs(a.content, 'lxml')
    zupka = bs(b.content, 'lxml')

    # One selector per spelling, repeated for each site
    rezultaty1 = [item['href'] for item in zupa.select("[href*='Gdansk']")]
    rezultaty2 = [item['href'] for item in zupa.select("[href*='gdansk']")]
    rezultaty3 = [item['href'] for item in zupa.select("[href*='Gdańsk']")]
    rezultaty4 = [item['href'] for item in zupa.select("[href*='gdańsk']")]
    rezultaty5 = [item['href'] for item in zupka.select("[href*='Gdansk']")]
    rezultaty6 = [item['href'] for item in zupka.select("[href*='gdansk']")]
    rezultaty7 = [item['href'] for item in zupka.select("[href*='Gdańsk']")]
    rezultaty8 = [item['href'] for item in zupka.select("[href*='gdańsk']")]

    s = set()
    plik = "dupa.csv"
    csv = open(plik, "a")

    # One loop per result list, only to merge them into a set
    for item in rezultaty1:
        s.add(item)
    for item in rezultaty2:
        s.add(item)
    for item in rezultaty3:
        s.add(item)
    for item in rezultaty4:
        s.add(item)
    for item in rezultaty5:
        s.add(item)
    for item in rezultaty6:
        s.add(item)
    for item in rezultaty7:
        s.add(item)
    for item in rezultaty8:
        s.add(item)

    # 'Data wpisu' means "entry date"
    for item in s:
        print('Data wpisu: ' + data('dzmscrok'))
        print('Link: ' + item)
        print('\n')
        csv.write('Data wpisu: ' + data('dzmscrok') + '\n')
        csv.write(item + '\n' + '\n')
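The `data()` helper can also be written with `datetime.strftime`, which formats the date directly instead of splitting the string form of `datetime.now()`. A sketch, assuming only the two formats used above are needed: the `FORMATS` dict and the optional `now` parameter (handy for testing) are additions, and an unknown format raises `KeyError` instead of silently returning `None`, which is what the original does for its default `'complete'`.

```python
from datetime import datetime

# strftime codes for the two styles used in the question.
FORMATS = {
    "rokmscdz": "%Y-%m-%d",  # year-month-day, e.g. 2024-03-05
    "dzmscrok": "%d-%m-%Y",  # day-month-year, e.g. 05-03-2024
}


def data(timedateformat, now=None):
    """Return `now` (default: the current time) formatted in the chosen style."""
    if now is None:
        now = datetime.now()
    # .lower() keeps the original's case-insensitive format names.
    return now.strftime(FORMATS[timedateformat.lower()])
```

It is a drop-in for the calls above, e.g. `print('Data wpisu: ' + data('dzmscrok'))`.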