I am trying to get the list of files with their timestamps (dates) from my JFrog Artifactory through Python code. I am using the BeautifulSoup package to extract these details, but it fails to read the data from the curl output. Please suggest a fix.

#!/usr/bin/python

import os
from bs4 import BeautifulSoup

output = os.system('curl -X GET "http://myjfrog.cricinfo.com:8082/artifactory/generic-local/"')
soup = BeautifulSoup(output, 'html.parser')
print (soap.title)

But I am getting the error below:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head><meta name="robots" content="noindex" />
<title>Index of generic-local/</title>
</head>
<body>
<h1>Index of generic-local/</h1>
<pre>Name        Last modified      Size</pre><hr/>
<pre><a href="1.sh">1.sh</a>         24-Jul-2020 06:51    -
<a href="ec2.py">ec2.py</a>       24-Jul-2020 06:46  3.11 KB
<a href="passwd">passwd</a>       21-Jul-2020 13:47  1.29 KB
<a href="s3_test.py">s3_test.py</a>   21-Jul-2020 13:08  94 bytes
<a href="zoo.sh">zoo.sh</a>       24-Jul-2020 06:52    -
</pre>
<hr/><address style="font-size:small;">Artifactory/7.6.3 Server at localhost Port 8081</address></body></html>Traceback (most recent call last):
  File "./2_repo.py", line 7, in <module>
    soup = BeautifulSoup(output, 'html.parser')
  File "/usr/local/lib/python3.6/site-packages/bs4/__init__.py", line 307, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'int' has no len()
  • [`os.system`](https://docs.python.org/3/library/os.html#os.system) returns the exit status, not the command output. You can either use subprocess or, preferably, the `requests` module as shown in the answer below. – sushanth Jul 24 '20 at 11:35

1 Answer

The problem is how you use os.system: its return value is the exit status of the process, not the command's output. BeautifulSoup is therefore handed an int instead of the HTML, hence the TypeError.

You can check that in the Python console:

>>> import os
>>> output = os.system('curl google.com')
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     
100   219  100   219    0     0   2737      0 --:--:-- --:--:-- --:--:--  2737
>>> print(output)
0

If you want to stick to executing a command in a subshell, the subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using os.system().
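As a rough illustration of the subprocess route, the sketch below captures a command's standard output as a string, which is what BeautifulSoup expects. The helper name `run_command` is hypothetical, and a harmless Python one-liner stands in for the curl call so the example runs without network access.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return its standard output as text (hypothetical helper)."""
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,      # capture stdout instead of printing it
        stderr=subprocess.PIPE,      # keep curl-style progress output off the terminal
        universal_newlines=True,     # decode bytes to str; also available on Python 3.6
        check=True,                  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Stand-in command (assumption); for the question it would be something like
# ["curl", "-s", "http://myjfrog.cricinfo.com:8082/artifactory/generic-local/"].
output = run_command([sys.executable, "-c", "print('<title>demo</title>')"])
print(type(output).__name__, output.strip())
```

Unlike os.system, the value returned here is the text the command wrote, so it can be passed straight to BeautifulSoup(output, 'html.parser').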

Or, just go with requests:

import requests
from bs4 import BeautifulSoup
print(BeautifulSoup(requests.get("https://google.com").text, "html.parser").title.text)

This prints Google.
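Once the page is available as text, the file names and dates still have to be pulled out of it. Here is a minimal sketch, assuming the index page keeps the layout shown in the question (one link per line, followed by a date like 24-Jul-2020 06:51). It uses a plain regular expression from the standard library rather than BeautifulSoup, and `parse_listing` is a made-up helper name. The sample text is copied from the question's output; in practice it would be the response text.

```python
import re
from datetime import datetime

# Sample copied from the listing in the question (assumption: the real page
# keeps this layout).
LISTING = '''<pre><a href="1.sh">1.sh</a>         24-Jul-2020 06:51    -
<a href="ec2.py">ec2.py</a>       24-Jul-2020 06:46  3.11 KB
<a href="passwd">passwd</a>       21-Jul-2020 13:47  1.29 KB
<a href="s3_test.py">s3_test.py</a>   21-Jul-2020 13:08  94 bytes
<a href="zoo.sh">zoo.sh</a>       24-Jul-2020 06:52    -
</pre>'''

# One entry per line: <a href="NAME">...</a>   DD-Mon-YYYY HH:MM   SIZE
ENTRY = re.compile(
    r'<a href="([^"]+)">[^<]*</a>\s+(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2})'
)


def parse_listing(html):
    """Return (name, datetime) pairs found in an index page (hypothetical helper)."""
    # %b matches abbreviated month names and depends on the locale;
    # English month names are assumed here.
    return [
        (name, datetime.strptime(stamp, "%d-%b-%Y %H:%M"))
        for name, stamp in ENTRY.findall(html)
    ]


for name, modified in parse_listing(LISTING):
    print(name, modified.isoformat())
```

If the server changes the page layout, the pattern will silently match nothing, so it is worth checking that the returned list is not empty.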

By the way, you have a typo in your code: you assign the result to soup but then print soap.title, which would raise a NameError once the TypeError is fixed.

baduker