I have a folder of near-identical scripts that scrape Google Alerts from their RSS feeds.

The files are exactly the same except for the uniqueurl value at the end of url:

import pandas as pd
import requests
from bs4 import BeautifulSoup

# *uniqueurl* is the only part that differs between the scripts
url = 'https://www.google.co.in/alerts/feeds/*uniqueurl*'
resp = requests.get(url)
soup = BeautifulSoup(resp.text, 'html.parser')

# Collect one dict per feed entry
output = []
for entry in soup.find_all('entry'):

    item = {
        'Title': entry.find('title', {'type': 'html'}).text,
        'Pubdate': entry.find('published').text,
        'Content': entry.find('content').text,
        'Link': entry.find('link')['href']
    }

    output.append(item)

# Write all entries to a single CSV file
df = pd.DataFrame(output)
df.to_csv('google_alert.csv', index=False)

How can I run a command like python create.py uniqueurl that generates the above file, with only the url variable updated to whatever is passed on the command line?

Parzival

1 Answer

Use sys.argv to capture any values you want to supply at runtime, each time the script is executed.

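import sys

import requests

# sys.argv[0] is the script name; the passed arguments start at index 1
uniqueUrl = sys.argv[1]
destination = sys.argv[2]
print(uniqueUrl)
print(destination)

# Build the feed URL from the first argument
url = f'https://www.google.co.in/alerts/feeds/{uniqueUrl}'
resp = requests.get(url)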

Then, when running the script, you can pass the values to assign to those variables, like so: python script.py uniqueUrl Category (here Category ends up in destination). That way you don't need to keep multiple scripts and regenerate them every time you want to make a small change in the code.
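If you would rather get a usage message and a default output file for free, the standard-library argparse module is an alternative to indexing sys.argv directly. The following is only a minimal sketch of the argument handling, not the full scraper: the names parse_args, build_feed_url and FEED_BASE, the -o/--output option, and the feed ID in the demonstration are all made up for illustration.

```python
import argparse

# Base URL taken from the question; the feed ID is appended to it.
FEED_BASE = "https://www.google.co.in/alerts/feeds/"


def parse_args(argv=None):
    """Parse the feed ID and an optional output path.

    Passing argv=None makes argparse read sys.argv[1:], which is what a
    real command-line run wants.
    """
    parser = argparse.ArgumentParser(
        description="Export one Google Alerts feed to CSV."
    )
    parser.add_argument("uniqueurl", help="feed ID appended to the base URL")
    parser.add_argument(
        "-o", "--output",
        default="google_alert.csv",
        help="CSV file to write (hypothetical option, not in the original)",
    )
    return parser.parse_args(argv)


def build_feed_url(uniqueurl):
    """Join the base URL and the feed ID, tolerating a leading slash."""
    return FEED_BASE + uniqueurl.lstrip("/")


# Demonstration with an explicit, made-up argument list.
args = parse_args(["1234567890/0987654321", "--output", "python_news.csv"])
url = build_feed_url(args.uniqueurl)
print(url)
print(args.output)
```

In the real script you would call parse_args() with no argument, pass url to requests.get, and pass args.output to df.to_csv. Running python create.py with no arguments would then print a usage error instead of raising an IndexError.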

matszwecja