This script uses a `while True:` loop to scrape data from several pages by clicking the Next button at the bottom of each page, but I cannot figure out how to structure the code so that it keeps writing to the HTML file as it paginates. Instead, each pass overwrites the HTML results written previously. Your help is appreciated. Thanks!

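import time

from selenium.common.exceptions import NoSuchElementException

# `driver` is a Selenium WebDriver created earlier in the script
while True: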
    time.sleep(10)

    golds = driver.find_elements_by_css_selector(".widgetContainer #widgetContent > div.singleCell")
    print("found %d golds" % len(golds))  

    template = """\
        <tr class="border">
            <td class="image"><img src="{0}"></td>\
            <td class="title"><a href="{1}" target="_new">{2}</a></td>\
            <td class="price">{3}</td>
        </tr>"""

    lines = []

    for gold in golds:
        goldInfo = {}

        goldInfo['title'] = gold.find_element_by_css_selector('#dealTitle > span').text
        goldInfo['link'] = gold.find_element_by_css_selector('#dealTitle').get_attribute('href')
        goldInfo['image'] = gold.find_element_by_css_selector('#dealImage img').get_attribute('src')

        try:
            goldInfo['price'] = gold.find_element_by_css_selector('.priceBlock > span').text
        except NoSuchElementException:
            goldInfo['price'] = 'No price display'

        line = template.format(goldInfo['image'], goldInfo['link'], goldInfo['title'], goldInfo['price'])
        lines.append(line)

    try:
        # click the Next button
        driver.find_element_by_link_text("Next→").click()
    except NoSuchElementException:
        break

    time.sleep(10)

    html = """\
        <html>
            <body>
                <table>
                    <tr class='headers'>
                        <td class='image'></td>
                        <td class='title'>Product</td>
                        <td class='price'>Price / Deal</td>
                    </tr>
                </table>
                <table class='data'>
                    {0}
                </table>
            </body>
        </html>\
    """

    f = open('./result.html', 'w')
    f.write(html.format('\n'.join(lines)))
f.close()
Bronson77
  • Try to replace `f = open('./result.html', 'w')` with `f = open('./result.html', 'a')` – Andersson May 29 '18 at 15:48
  • Thanks Andersson, perfect. Do you want to post that as an answer so I can accept it? – Bronson77 May 29 '18 at 16:06
  • I think [this answer](https://stackoverflow.com/questions/50588292/python-loop-overwriting-last-html-write/50588402#50588402) is more informative – Andersson May 29 '18 at 16:07

2 Answers

You should open the file in append mode:

f = open('./result.html', 'a')
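A minimal sketch of the difference, in plain Python with no Selenium involved. The temporary file and the `write_pages` helper are hypothetical and exist only for illustration. The file is reopened once per page, as in the question's loop, first with 'w' and then with 'a':

```python
import os
import tempfile


def write_pages(path, pages, mode):
    """Hypothetical helper: reopen the file once per page, like the question's loop."""
    for page in pages:
        with open(path, mode) as f:
            f.write(page)
    with open(path) as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "demo.txt")
pages = ["page 1\n", "page 2\n", "page 3\n"]

# 'w' truncates on every open, so only the last page survives
print(repr(write_pages(path, pages, "w")))  # 'page 3\n'

os.remove(path)  # start from an empty file for the 'a' run

# 'a' keeps the existing content and writes at the end
print(repr(write_pages(path, pages, "a")))  # 'page 1\npage 2\npage 3\n'
```

One caveat: with 'a', output from a previous run of the script also stays in the file, so you may want to delete or truncate `result.html` once before the loop starts.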
Shivam Singh

Take a look at the different modes for opening a file, which your script does at the end of each loop iteration: https://docs.python.org/2/library/functions.html#open

The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending.

The docs go on:

Modes 'r+', 'w+' and 'a+' open the file for updating (reading and writing); note that 'w+' truncates the file. Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don’t have this distinction, adding the 'b' has no effect.
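A short sketch of the '+' modes, again on a throwaway temporary file whose name is arbitrary. 'a+' lets you read back what you have appended, but because writing leaves the position at the end of the file you need `seek(0)` before reading; 'w+' empties the file as soon as it is opened:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:
    f.write("first\n")

# 'a+': append, then read back; seek(0) because the write left us at the end
with open(path, "a+") as f:
    f.write("second\n")
    f.seek(0)
    contents_after_append = f.read()

# 'w+': the file is truncated on open, so there is nothing left to read
with open(path, "w+") as f:
    contents_after_truncate = f.read()

print(repr(contents_after_append))    # 'first\nsecond\n'
print(repr(contents_after_truncate))  # ''
```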

So you have a couple of options. You might use 'a', since you want to append data to the file.

Or, depending on your needs, you could move the `open()` call outside the loop so that you are not re-opening the file on every iteration.

with open('./result.html', 'w') as f:
    while True:
        # do stuff: scrape the current page into `lines`
        f.write('\n'.join(lines))
        # break out of the loop once there is no next page
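Opening the file once also avoids a second problem in the question's script: `html.format(...)` wraps each page's rows in a complete `<html>...</html>` document, so switching to 'a' alone would leave one full document per page in the file. A sketch under stated assumptions: the scraping is replaced by a hypothetical `pages` iterable of row dicts standing in for the Selenium loop, and the header and footer markup from the question are written once around all rows:

```python
import os
import tempfile

# Markup copied from the question; the header and footer are written only once.
HEADER = """<html>
<body>
<table>
<tr class='headers'>
<td class='image'></td>
<td class='title'>Product</td>
<td class='price'>Price / Deal</td>
</tr>
</table>
<table class='data'>
"""

ROW = """<tr class="border">
<td class="image"><img src="{image}"></td>
<td class="title"><a href="{link}" target="_new">{title}</a></td>
<td class="price">{price}</td>
</tr>
"""

FOOTER = """</table>
</body>
</html>
"""


def write_results(pages, path):
    """Write the rows of every page into a single HTML document.

    `pages` is a hypothetical stand-in for the Selenium loop: an iterable that
    yields, for each page, a list of dicts with image/link/title/price keys.
    """
    with open(path, "w") as f:  # opened (and truncated) once, before paginating
        f.write(HEADER)
        for rows in pages:
            for row in rows:
                f.write(ROW.format(**row))
        f.write(FOOTER)  # closing tags written once, after the last page


# Usage with made-up data in place of scraped results
out = os.path.join(tempfile.mkdtemp(), "result.html")
demo_pages = [
    [{"image": "a.png", "link": "https://example.com/a", "title": "A", "price": "$1"}],
    [{"image": "b.png", "link": "https://example.com/b", "title": "B",
      "price": "No price display"}],
]
write_results(demo_pages, out)
```

In the real script, `pages` could be a generator that scrapes the current page, yields its rows, and only then tries to click Next, returning on `NoSuchElementException`. That ordering matters: in the question's loop the `break` on the last page runs before the write, so that page's rows are never saved.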
MxLDevs