Hi guys, this is my code:

#!/usr/bin/env python
import os
import lxml
from bs4 import BeautifulSoup
string = os.system("curl -i https://it.wikipedia.org/wiki/Coldplay")
soup = BeautifulSoup(string, features="xml")
tag = soup.find_all("tbody")

When I run it, I get this error:

Traceback (most recent call last):
  File "script_wiki.py", line 6, in <module>
    soup = BeautifulSoup(string, features="xml")
  File "build/bdist.macosx-10.12-intel/egg/bs4/__init__.py", line 192, in __init__
TypeError: object of type 'int' has no len()

I'm a beginner and I have no idea what the problem is, sorry.

DMine

3 Answers

os.system returns the exit status of the command, not the data the command prints. If the command runs successfully, string will be 0, and BeautifulSoup fails because it cannot take the length of an integer.
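To make the failure concrete, here is a small sketch that reproduces it without curl. The command "exit 0" is only a stand-in for any command that succeeds, and the variable names are illustrative:

```python
import os

# "exit 0" stands in for any shell command that finishes successfully.
status = os.system("exit 0")
print(type(status), status)

try:
    # This is essentially what BeautifulSoup ends up doing with its input.
    len(status)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The message printed at the end is the same TypeError that appears in the traceback from the question.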

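You should instead use subprocess.check_output, which returns what the command writes to standard output (subprocess.call, like os.system, only returns the exit status):

import subprocess
# -s hides the progress meter; -i is dropped because it would
# prepend the HTTP response headers to the page markup.
string = subprocess.check_output([
    'curl',
    '-s',
    "https://it.wikipedia.org/wiki/Coldplay"
    ])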

But there is no need to shell out to curl from Python. Use urllib instead:

import urllib.request
from bs4 import BeautifulSoup

url = "https://it.wikipedia.org/wiki/Coldplay"
soup = BeautifulSoup(urllib.request.urlopen(url), features="xml")
tag = soup.find_all("tbody")

If you're using Python 2, import urllib2 and use urllib2.urlopen instead.
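If you would rather hand BeautifulSoup a decoded string than a response object, the urllib call can be wrapped in a small helper. This is only a sketch for Python 3: fetch_text is a hypothetical name, the 10-second timeout and the UTF-8 fallback are assumptions, and the data: URL is there purely so the example runs without a network connection.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10):
    """Return the body of url decoded to str (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Offline demonstration: urlopen also understands data: URLs.
sample = fetch_text("data:text/plain;charset=utf-8,hello")
print(sample)
```

For the page in the question you would call fetch_text("https://it.wikipedia.org/wiki/Coldplay") and pass the result to BeautifulSoup.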

A more popular alternative to urllib is the requests library, which you need to install first with pip install requests:

import requests
r = requests.get('https://it.wikipedia.org/wiki/Coldplay')
soup = BeautifulSoup(r.content, features="xml")
tag = soup.find_all("tbody")
Taku

You were correctly told in the comments that os.system does not return the program's output. But there is no need to use curl at all. Python's urllib.request module works just fine:

from urllib.request import urlopen
URL = "https://it.wikipedia.org/wiki/Coldplay"
soup = BeautifulSoup(urlopen(URL), features="xml")
DYZ

To use curl, you need to import subprocess and call subprocess.check_output or subprocess.run (Python 3.5+ only) to capture the output of a system command. As stated already, os.system returns only the exit status of the command you run, not its output, which goes to the console.

See this discussion:

Running shell command from Python and capturing the output

There may be better ways to achieve your goal, but to understand how to make your existing code work, try subprocess.
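A minimal sketch of the subprocess approach, assuming Python 3.5 or later. The names run_and_capture and fetch_with_curl are hypothetical helpers, and the -s flag (hide curl's progress meter) is my choice rather than part of the original command. The demonstration at the bottom uses the Python interpreter as a stand-in for curl so that it runs offline:

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return its standard output as bytes."""
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return result.stdout


def fetch_with_curl(url):
    """Hypothetical helper: return the body of url as fetched by curl."""
    # -s hides the progress meter; -i is left out so that the HTTP
    # headers are not mixed into the markup handed to the parser.
    return run_and_capture(["curl", "-s", url])


# Offline demonstration with the Python interpreter standing in for curl.
demo_output = run_and_capture([sys.executable, "-c", "print('hello')"])
print(demo_output)
```

With this in place, BeautifulSoup(fetch_with_curl("https://it.wikipedia.org/wiki/Coldplay"), features="xml") receives the page as bytes rather than an integer.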

Kurt