I have a CSV that I read with pandas, and it looks like this:

  | URL               | Status Code |
--|-------------------|-------------|
0 | www.example.com   | 404         |
1 | www.example.com/2 | 404         |

I want to check whether the URLs in the URL column still respond with 404. I have this code:

url = df['URL']
urlData = requests.get(url).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
print(rawData)

I get the following error:

InvalidSchema: No connection adapters were found for '0    http://www.example.com
1    http://www.example.com/2
Name: URL, dtype: object'

I searched several questions but could not find the answer. Any help is appreciated.

Robert Padgett

2 Answers

requests.get is not vectorized: it expects a single URL string, not a whole Series. You'll have to call it once per URL, either with pandas.Series.apply:

>>> df['New Status Code'] = df.URL.apply(lambda url: requests.get(url).status_code)
>>> df
   Status Code                URL  New Status Code
0          404    www.example.com              404
1          404  www.example.com/2              404

or use numpy.vectorize (a convenience wrapper that still calls the function once per URL):

>>> vectorized_get = numpy.vectorize(lambda url: requests.get(url).status_code)
>>> df['New Status Code'] = vectorized_get(df.URL)
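A slightly more defensive variant of the per-URL approach, as a sketch only. The helper names `with_scheme`, `status_of` and `recheck`, the 10-second timeout, and the choice to return `None` when a request fails are all assumptions for illustration, not part of the approach above. The scheme is prepended because requests raises `MissingSchema` for URLs such as `www.example.com` that have none.

```python
import pandas as pd
import requests


def with_scheme(url, default="http://"):
    """Return url unchanged if it already has a scheme, else prepend one."""
    url = url.strip()
    if url.startswith(("http://", "https://")):
        return url
    return default + url


def status_of(url, get=requests.get, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        return get(with_scheme(url), timeout=timeout).status_code
    except requests.RequestException:
        # DNS failure, refused connection, timeout, and similar errors.
        return None


def recheck(df, column="URL", get=requests.get):
    """Return a copy of df with a 'New Status Code' column added."""
    out = df.copy()
    out["New Status Code"] = out[column].apply(lambda u: status_of(u, get=get))
    return out
```

With the frame from the question, `recheck(df)` would add the new column without modifying `df` itself; a missing value in that column would mean the URL could not be reached at all, as opposed to returning an error status. The `get` parameter exists only so the helpers can be exercised without network access.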
randomir

df['URL'] returns a Series, not a single value, so requests.get(url) receives the whole column at once. I suspect your code is failing on the requests.get(url).content line.

Can you post more of the code?

You may want to look at the apply function: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html.
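To illustrate the point about `df['URL']`, here is a small sketch. The two-row frame is rebuilt by hand from the table in the question, so it is an assumption about what the real CSV contains. Selecting the column yields a Series, while each element of it is a plain string, which is the kind of argument a per-URL call expects.

```python
import pandas as pd

# Hand-built stand-in for the CSV in the question.
df = pd.DataFrame(
    {"URL": ["www.example.com", "www.example.com/2"], "Status Code": [404, 404]}
)

column = df["URL"]          # the whole column
first = df["URL"].iloc[0]   # a single value taken from it

print(type(column).__name__)  # Series
print(type(first).__name__)   # str

# Iterating over the Series yields one string per row.
for url in column:
    print(url)
```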

David