I have a Python script that works for a single URL. I need to run it for multiple URLs listed in url.txt and collect the output in a single text file.

Here is the Python script (abridged):

import urllib2
from bs4 import BeautifulSoup
quote_page = 'https://www.example.com/page/1024'
#Rest of the script here
print var1
print var2
print var3

Here is an example output for one URL:

Name: John Doe
DOB: 01-Jan-1980
Gender: Male

For URL 1, my script gives exactly the output I want. I want to repeat this for URL 2, URL 3 and so on, for every URL in url.txt.

Any ideas how?

P.S. I've kept the question simple but if you need more details, lemme know and I'll do so.

mumer91
  • What are var1, var2, var3? what is the output even going to be? you want the webpage in HTML format? – thatNLPguy Mar 28 '19 at 19:42
  • Sorry I didn't explain it well enough. These are text prints with info I want like name, age etc. I want these outputs for URL1, then URL2 and so on.. – mumer91 Mar 28 '19 at 19:46

2 Answers

0

Open a file in append mode and write each of the three values to it.

import urllib2
from bs4 import BeautifulSoup
quote_page = 'https://www.example.com/page/1024'
#Rest of the script here
output = open("output.txt", 'a') # 'a' means open in append mode so the file is not overwritten
# change print to output.write()
output.write(str(var1) + '\n') # separate each var by a new line
output.write(str(var2) + '\n')
output.write(str(var3) + '\n')

output.close()

This writes var1, then var2, then var3, each on its own line, and then closes the file.
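
A small sketch of what append mode does, assuming Python 3. The file name `output.txt` matches the answer; the helper `append_record` and the second sample record are made up for illustration. Each value ends up on its own line, and a second call adds to the file instead of replacing it.

```python
import os
import tempfile


def append_record(path, values):
    """Append each value on its own line (hypothetical helper)."""
    # 'a' creates the file if needed and keeps whatever is already in it
    with open(path, "a", encoding="utf-8") as output:
        for value in values:
            output.write(str(value) + "\n")


# Demo in a temporary directory so no real file is touched
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "output.txt")

# Two calls stand in for two runs of the script on two URLs
append_record(demo_path, ["Name: John Doe", "DOB: 01-Jan-1980", "Gender: Male"])
append_record(demo_path, ["Name: Jane Roe", "DOB: 02-Feb-1990", "Gender: Female"])

with open(demo_path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(len(lines))
```

Opening with `'w'` instead of `'a'` would truncate the file on every call, so only the last record would survive.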

To make the script accept the URL from the command line:

import sys
import urllib2
from bs4 import BeautifulSoup
quote_page = sys.argv[1] # this should be the first argument on the command line
#Rest of the script here
output = open("output.txt", 'a') # 'a' means open in append mode so the file is not overwritten
# change print to output.write()
output.write(str(var1) + '\n') # separate each var by a new line
output.write(str(var2) + '\n')
output.write(str(var3) + '\n')

output.close()

An example command line using your URL (the script imports urllib2, so it needs a Python 2 interpreter):

$ python2 myurl.py https://www.example.com/page/1024
d_kennetz
  • Thanks but I want the outputs var1, var2, var3 for URL1 then all three for URL 2 and so on. Can you plz edit answer? – mumer91 Mar 28 '19 at 19:45
  • This will work if you run it like: `python3.6 myurl.py "www.url1.com"` then run it a second time on your second url like: `python3.6 myurl.py "www.url2.com"` etc until you are done. – d_kennetz Mar 28 '19 at 19:48
  • I tried and get this error: `UnicodeEncodeError: 'ascii' codec can't encode character u'\ue800' in position 46: ordinal not in range(128)` Any ideas why? – mumer91 Mar 28 '19 at 20:07
  • It means you are trying to write unicode to an output that only handles ascii. Visit [this page](https://stackoverflow.com/questions/2365411/convert-unicode-to-ascii-without-errors-in-python) – d_kennetz Mar 28 '19 at 20:13
  • Sorry but I'm confused. If I run my script (as in example in my post), I get my desired output. Why would encoding error come in loop? – mumer91 Mar 28 '19 at 20:22
  • Because you are now writing to a file, which has different implications. Your command prompt can probably interpret both encodings, but things are handled differently when writing to a file. You'll want the encoding to be consistent when writing to a file. – d_kennetz Mar 28 '19 at 20:27
  • I don't really get that but I used your code without the output parts and then used cat to write to file and repeated for URLs in loop. This is very inefficient but got the job done. Thanks a lot for your help. Appreciated. – mumer91 Mar 28 '19 at 20:33
  • That works too! You can pass it like: `python3.6 myurl.py "www.url1.com" >> output.txt`, I duct tape develop myself (it's in my bio!) whatever gets the job done. – d_kennetz Mar 28 '19 at 20:41
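
The encoding error discussed in the comments above can usually be sidestepped by naming the encoding when the file is opened. A minimal sketch, assuming UTF-8 is acceptable for the output file; `io.open` accepts an `encoding` argument on both Python 2 and Python 3, and the sample value is made up (it only reuses the character from the error message).

```python
import io
import os
import tempfile

# Made-up value containing the character from the error message
value = u"Name: John Doe \ue800"

# Temporary location so the demo does not touch a real output.txt
demo_path = os.path.join(tempfile.mkdtemp(), "output.txt")

# io.open encodes the text as UTF-8 on write, so there is no
# implicit conversion to ASCII
with io.open(demo_path, "a", encoding="utf-8") as output:
    output.write(value + u"\n")

# Read the file back with the same encoding
with io.open(demo_path, encoding="utf-8") as handle:
    round_trip = handle.read()
```

The key difference from `str(var1)` on Python 2 is that the text is never forced through the ASCII codec.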
0

To get the URLs from your file, open it and run your script for each line, assuming there is one URL per line. To write the output, open an output file and write var1, var2 and var3 into it for each URL.

import urllib2
from bs4 import BeautifulSoup

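# Keep the output file open while looping so every URL's results are written
with open('url.txt') as input_file, open("output_file.txt", "w") as output:
    for url in input_file:
        quote_page = url.strip()  # drop the trailing newline
        #Rest of the script here
        # str.format instead of f-strings, because urllib2 implies Python 2
        output.write('{}\n'.format(var1))
        output.write('{}\n'.format(var2))
        output.write('{}\n'.format(var3))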
Louis Saglio