I want to scrape the value matched by an XPath expression from each URL in the fourth column of a Google Sheet, and write that value to the cell immediately to the left of the URL (column C).
I have the code below so far, but when I run it, it writes the last scraped adGroupStatus value next to every URL, rather than the correct value for each corresponding URL.
Can anyone offer a solution?
import requests
import gspread
from oauth2client.service_account import ServiceAccountCredentials
from lxml import html

scope = ['https://spreadsheets.google.com/feeds',
         'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)
sh = client.open('example_sheet_name')
worksheet = sh.get_worksheet(0)

# the column (4th) with our URLs
url_list = worksheet.col_values(4)
# where we want our xpath values to print to
cell_list = worksheet.range('C1:C5')

def grab_xpathtext(urls, cell_range):
    # do the below for each url in the spreadsheet column 4:
    for url in urls:
        r = requests.get(url)
        tree = html.fromstring(r.content)
        adGroupStatus = tree.xpath('//*[@id="m_count"]/text()')
        # below prints each value to the cmd line on a new line as expected
        print(adGroupStatus[0])
        for cell in cell_range:
            # below prints the last value instead of each corresponding value
            cell.value = adGroupStatus[0]
    worksheet.update_cells(cell_range)

grab_xpathtext(url_list, cell_list)
I expect the output to resemble this:
| location 1 | description | 1 | url 1 |
| location 2 | description | 2 | url 2 |
| location 3 | description | 3 | url 3 |
| location 4 | description | 4 | url 4 |
| location 5 | description | 5 | url 5 |
... but instead I get this:
| location 1 | description | 5 | url 1 |
| location 2 | description | 5 | url 2 |
| location 3 | description | 5 | url 3 |
| location 4 | description | 5 | url 4 |
| location 5 | description | 5 | url 5 |
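
The behavior above can be reproduced without the network or the Sheets API, which may help narrow it down. The following is a minimal sketch, not the real script: `FakeCell` is a hypothetical stand-in for a gspread cell (only its `value` attribute is modeled), and the `scraped` list is an assumed substitute for the `adGroupStatus[0]` values fetched per URL. It contrasts the nested loop from the code above with one possible alternative that pairs each cell with its own value using `zip()`.

```python
class FakeCell:
    """Hypothetical stand-in for a gspread cell; only `value` is modeled."""

    def __init__(self):
        self.value = None


# Assumed scraped values, one per URL (stand-ins for adGroupStatus[0]).
scraped = ['1', '2', '3', '4', '5']

# Pattern from the question: the inner loop runs once per URL and
# overwrites every cell, so only the last scraped value survives.
nested_cells = [FakeCell() for _ in scraped]
for value in scraped:
    for cell in nested_cells:
        cell.value = value
nested_result = [cell.value for cell in nested_cells]

# Possible alternative: walk cells and values together with zip(),
# so each cell receives only the value scraped for its own row.
paired_cells = [FakeCell() for _ in scraped]
for cell, value in zip(paired_cells, scraped):
    cell.value = value
paired_result = [cell.value for cell in paired_cells]

print(nested_result)  # ['5', '5', '5', '5', '5']
print(paired_result)  # ['1', '2', '3', '4', '5']
```

In the real function, the same idea would presumably mean iterating over `zip(cell_range, urls)`, doing the request inside that single loop, and calling `worksheet.update_cells(cell_range)` once afterwards. This assumes the URLs in column 4 line up row for row with `C1:C5`; `zip()` stops at the shorter of the two sequences, so any extra URLs beyond the cell range would be skipped.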