
I am using Python/Selenium to extract some text from a website to further sort it in Google Sheets.

There are 15 headers whose text I need to extract. In each header row, the text sits inside an h5 tag.

Here's one extract of a header:

<tr class="dayHeader">
  <td colspan="7" style="padding:10px 0;">
    <hr>
    <h5>&nbsp;&nbsp;Thursday - 28 January 2021</h5>
  </td>
</tr>

What I have done is the following:

headers = driver.find_elements_by_tag_name('h5')
results = []

for header in headers:
    result = header.text
    results.append(result)

When I print results, I get the following output:

['Result 1']
['Result 1', 'Result 2']
['Result 1', 'Result 2', 'Result 3']

Instead, how can I get it to output:

['Result 1', 'Result 2', 'Result 3']
userX
  • The above for loop shouldn't be outputting anything... did you miss a `print` statement when you copied your code over? – Sumner Evans Jan 29 '21 at 07:13

1 Answer


Most likely your print call is indented so that it sits inside the loop and runs on every iteration. With print moved outside the loop, this code:

headers = driver.find_elements_by_tag_name('h5')
results = []

for header in headers:
    result = header.text
    results.append(result)

print(results)

prints the list only once, after the loop has finished:

['Result 1', 'Result 2', 'Result 3']

whereas this version, with print inside the loop:

headers = driver.find_elements_by_tag_name('h5')
results = []

for header in headers:
    result = header.text
    results.append(result)
    print(results)

prints the growing list on every iteration:

['Result 1']
['Result 1', 'Result 2']
['Result 1', 'Result 2', 'Result 3']
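
As a minimal sketch of the same idea that runs without a browser, the loop can also be written as a list comprehension and the list printed once at the end. `FakeHeader` and `collect_texts` are hypothetical names used only for illustration (they are not from the question). They stand in for the elements Selenium returns, and the only thing assumed about those elements is that they have a `.text` attribute.

```python
from dataclasses import dataclass


@dataclass
class FakeHeader:
    """Hypothetical stand-in for a Selenium element; only .text is modelled."""
    text: str


def collect_texts(headers):
    """Return the text of every header as a single list."""
    return [header.text for header in headers]


# Stand-in data; with Selenium this would be the list of h5 elements.
headers = [FakeHeader("Result 1"), FakeHeader("Result 2"), FakeHeader("Result 3")]
results = collect_texts(headers)

# Printed once, after the whole list has been built.
print(results)
```

With real Selenium elements, the `headers` list from the question could be passed to `collect_texts` in the same way.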
HedgeHog