I'm writing a script to do the following:

  1. Ingest a csv file
  2. Loop through values in a url column
  3. Return status codes for each url field

My data is coming from a csv file that I've written. The url field contains a string with 1 or 2 urls to check.

The CSV file is structured as follows:

id,site_id,url_check,js_pixel_json
12187,333304,"[""http://www.google.com"", ""http://www.facebook.com""]",[]
12187,333304,"[""http://www.google.com""]",[]

I have a function that loops through every row correctly; however, when I attempt to pull the status code, I get the following error:

Traceback (most recent call last):
  File "help.py", line 29, in <module>
    loopUrl(inputReader)
  File "help.py", line 26, in loopUrl
    urlStatus = requests.get(url)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 498, in request
    prep = self.prepare_request(req)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/sessions.py", line 441, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/models.py", line 309, in prepare
    self.prepare_url(url, params)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests/models.py", line 375, in prepare_url
    scheme, auth, host, port, path, query, fragment = parse_url(url)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/urllib3/util/url.py", line 185, in parse_url
    host, url = url.split(']', 1)
ValueError: not enough values to unpack (expected 2, got 1)

Here is my code:

import requests 
import csv 

input = open('stackoverflow_help.csv')
inputReader = csv.reader(input)


def loopUrl(inputReader):
    pixelCheck = []
    for row in inputReader:
        checkUrl = row[2]
        if inputReader.line_num == 1:
            continue #skip first row
        elif checkUrl == '[]':
            continue
        elif checkUrl == 'NULL':
            continue
        urlList = str(checkUrl)
        for url in urlList:
            urlStatus = requests.get(url)
        print(urlStatus.response_code)

loopUrl(inputReader)

The traceback ends inside the requests and urllib3 modules, but I believe something in my loop is causing the error.

bglaze
  • post the traceback of the error as well – Tobey Jan 14 '19 at 21:17
  • What's `response_code`? Please post your actual code. – blhsing Jan 14 '19 at 21:21
  • I take it that he means `.status_code`? (Like he uses in `getStatus()`). – Niels Henkens Jan 14 '19 at 21:26
  • Updated with the full error I'm getting. I'm having trouble figuring out what the other value it's expecting... – bglaze Jan 14 '19 at 21:31
  • updated the post with error details – bglaze Jan 14 '19 at 21:35
  • You are trying to iterate a string, not a list – chitown88 Jan 14 '19 at 21:40
  • You're converting the string `"[""http://www.google.com"", ""http://www.facebook.com""]"` to another string via `urlList = str(checkUrl)`, then you proceed to iterate over that string. `requests.get` then tries to fetch the url `'['`, `'"'`, etc. – cade Jan 14 '19 at 21:44

2 Answers

["http://www.google.com", "http://www.facebook.com"] is a string, not a list. You are iterating over it character by character, so the first "URL" passed to requests.get is '[', which produces the error above. You need to safely evaluate the string to get a list of URLs instead of individual characters.

Example:

>>> import ast
>>> x = u'[ "A","B","C" , " D"]'
>>> x = ast.literal_eval(x)
>>> x
['A', 'B', 'C', ' D']
>>> x = [n.strip() for n in x]
>>> x
['A', 'B', 'C', 'D']

Reference: Convert string representation of list to list

In your code it would be:

    urlList = ast.literal_eval(checkUrl) # not str(checkUrl)
    for url in urlList:
        urlStatus = requests.get(url)
        print(urlStatus.status_code) # status_code, not response_code
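
To see the difference without making any requests, here is a minimal sketch that inlines the two data rows from the question (the `SAMPLE` name and the use of `io.StringIO` in place of the real file are my assumptions, only there to keep the snippet self-contained). It shows what the loop receives when it iterates over the raw cell versus the evaluated list:

```python
import ast
import csv
import io

# The header and two data rows from the question, inlined so no file is needed.
SAMPLE = '''id,site_id,url_check,js_pixel_json
12187,333304,"[""http://www.google.com"", ""http://www.facebook.com""]",[]
12187,333304,"[""http://www.google.com""]",[]
'''

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)            # skip the header row
cell = next(reader)[2]  # the url_check column of the first data row

# csv.reader always returns strings, so str(cell) changes nothing.
assert str(cell) == cell

# Iterating over a string yields single characters.
chars = list(cell)[:3]
print(chars)  # ['[', '"', 'h']

# literal_eval turns the text into a real list of URL strings.
urls = ast.literal_eval(cell)
print(urls)   # ['http://www.google.com', 'http://www.facebook.com']
```

The first character is `'['`, which is consistent with the traceback ending at `url.split(']', 1)`: the URL parser sees an opening bracket and looks for a closing one that is not there.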
James T

Need to clean this up a bit, but should get you going:

import requests 
import csv 
import ast


input = open('stackoverflow_help.csv')
inputReader = csv.reader(input)


def loopUrl(inputReader):
    pixelCheck = []
    for row in inputReader:
        if inputReader.line_num == 1:
            continue #skip first row

        checkUrl = row[2]
        try:
            checkUrl = ast.literal_eval(checkUrl)
        except (ValueError, SyntaxError):
            continue  # cell is not a valid Python literal, e.g. NULL or empty

        if checkUrl == []:
            continue
        elif checkUrl == 'NULL':
            continue

        for url in checkUrl:
            urlStatus = requests.get(url)
            print(urlStatus.status_code)

loopUrl(inputReader)

Output:

200
200
200
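
If you do clean it up, one possible direction is sketched below. This is not the only way to do it, and the helper names (`parse_url_cell`, `fetch_status`, `check_rows`) and the 10-second timeout are my own choices, not anything from the question. It assumes the cells are valid JSON, which holds for the sample rows, so `json.loads` can stand in for `ast.literal_eval`. It also lets `csv.DictReader` handle the header row and returns `None` instead of crashing when a request fails:

```python
import csv
import json


def parse_url_cell(cell):
    """Return the list of URLs in a url_check cell, or [] if it cannot be parsed."""
    try:
        urls = json.loads(cell)
    except (ValueError, TypeError):
        # json.JSONDecodeError subclasses ValueError (covers NULL and empty cells);
        # TypeError covers a missing column, where DictReader yields None.
        return []
    # A bare string such as "NULL" parses fine but must not be iterated per character.
    return urls if isinstance(urls, list) else []


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    import requests  # imported lazily so the parsing helpers work without requests
    try:
        return requests.get(url, timeout=timeout).status_code
    except requests.RequestException:
        return None


def check_rows(csv_file, fetch=fetch_status):
    """Yield (id, url, status) for every URL in the url_check column."""
    for row in csv.DictReader(csv_file):  # DictReader consumes the header itself
        for url in parse_url_cell(row["url_check"]):
            yield row["id"], url, fetch(url)
```

You would call it with something like `with open('stackoverflow_help.csv', newline='') as f:` and then loop over `check_rows(f)`. Passing the fetch function as a parameter is only there so the parsing can be tried out with a stand-in function instead of real network calls.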
chitown88