I suppose you want to store prices of a financial instrument together with a timestamp, so that you can then sort the time series and work with it. I tried your code (it's one year old, I know!) but it does not work correctly; there is a basic problem: if you want to look for one specific value, e.g. the last price of a stock, using bs4 as the scraping tool, you not only have to use the "find_all" method, but also a "find" inside each of the records returned by "find_all", to get that one specific value.
Let's say the HTML page contains several 'div's that share the same class, call it 'magic-class', and only one of those divs contains the value you need, the last price. You then need to find all the divs with that class and, e.g. with a for loop, extract the value contained in each of them.
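For illustration, here is a minimal sketch of that pattern (the class name 'magic-class' and the <a> tag holding the value are just assumptions about your page's structure):

from bs4 import BeautifulSoup

html = """
<div class="magic-class"><a>Volume</a></div>
<div class="magic-class"><a>102.35</a></div>
"""

soup = BeautifulSoup(html, 'html.parser')
# find_all() returns every div with that class ...
for div in soup.find_all('div', class_='magic-class'):
    # ... and find() digs into each one for the tag that holds the value
    link = div.find('a')
    if link is not None:
        print(link.get_text())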
So, apart from this problem, which by nature depends on the specific structure of the page you intend to scrape, if you want to store the found values inside a Pandas DataFrame, here is an example you could use as a starting point:
import urllib.request
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup
import pandas as pd
import random
from datetime import datetime
import time
from http.cookiejar import CookieJar

price_all = pd.DataFrame()

def checkprice():
    url = "https://www.yourlink.com"

    # Rotate the User-Agent header to reduce the chance of being blocked
    user_agents = [
        'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.142 Safari/535.19',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)',
        'Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)',
    ]

    cj = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    # Set all headers in a single list: assigning to addheaders twice
    # would overwrite the first assignment
    opener.addheaders = [
        ('User-agent', random.choice(user_agents)),
        ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
        # ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'),
        # ('Accept-Language', 'it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4'),
        # ('Accept-Encoding', 'gzip, deflate, sdch'),  # only if you also decompress the response yourself
    ]

    try:
        response = opener.open(url, timeout=5)
    except (HTTPError, URLError) as e:
        print('Request failed for ' + url + ': ' + str(e))
        return None

    # Choose one parser (html.parser ships with Python;
    # html5lib and lxml must be installed separately)
    soup = BeautifulSoup(response, 'html.parser')
    # soup = BeautifulSoup(response, 'html5lib')
    # soup = BeautifulSoup(response, 'lxml')

    found_values = soup.find_all('div', class_='magic-class')
    if not found_values:
        print('No value ' + url)
        return None

    list_values = []
    list_timestamps = []
    for div in found_values:
        # Get the text of the <a> tag inside each matching div
        link = div.find('a')
        if link is None:
            continue
        list_values.append(link.get_text())
        # Optional: append a timestamp
        list_timestamps.append(datetime.fromtimestamp(time.time()))

    df_show_info = pd.DataFrame(
        {'Value': list_values,
         'Time': list_timestamps})
    return df_show_info

while True:
    new_prices = checkprice()
    if new_prices is not None:
        # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
        price_all = pd.concat([price_all, new_prices])
    time.sleep(5)
This will build a general DataFrame called 'price_all' that contains all the prices and timestamps, checked approximately every 5 seconds. There are more elegant ways to repeat an action every 'x' seconds; this is the most basic one.
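As one slightly more elegant alternative (a sketch, keeping the 5-second cadence from above), you can subtract the time the request itself takes so the checks don't drift:

INTERVAL = 5  # seconds between checks

while True:
    started = time.monotonic()
    new_prices = checkprice()
    if new_prices is not None:
        price_all = pd.concat([price_all, new_prices])
    # sleep only for whatever is left of the interval, so the loop
    # does not drift by the duration of the request itself
    elapsed = time.monotonic() - started
    time.sleep(max(0, INTERVAL - elapsed))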
Using web-scraping tools to obtain prices of financial instruments is a rather obsolete technique, superseded by other methods. One of the best known is pandas-datareader, a simple library that provides easy access to a number of online sources of financial data. It fits the Pandas logic perfectly and is really easy to use.
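A minimal sketch of that approach (assuming pandas-datareader is installed and the free 'stooq' source still serves the ticker you need):

import pandas_datareader.data as web
from datetime import datetime

# Daily quotes for a ticker from the Stooq source, returned as a regular DataFrame
df = web.DataReader('AAPL', 'stooq',
                    start=datetime(2023, 1, 1),
                    end=datetime(2023, 6, 30))
print(df.head())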
Does this solve your problem?