I am trying to build a CSV on a daily basis from a table on a specific website: https://lunarcrush.com/exchanges
I've tried to follow every piece of advice in the related topics here (e.g. How to extract tables from websites in Python, Python Extract Table from URL to csv, extract a html table data to csv file, and many more).
I thought my initial problem was that I didn't have a table id (as in the other examples); I only found the table's class name, MuiTable-root. But after a little more digging I found out that whenever I read the URL, the HTML I got back was completely different from the one I see when I use Inspect in my browser.
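For reference, a quick way to confirm that mismatch is to fetch the page the same way and search the raw response for the class name seen in the inspector (this is just a sanity-check sketch, not part of my actual script):

import urllib.request

# Sanity check: does the class name from the browser inspector appear
# anywhere in the HTML that urllib actually receives?
raw = urllib.request.urlopen("https://lunarcrush.com/exchanges").read().decode("utf-8", errors="replace")
print("MuiTable-root" in raw)  # False would mean the table is not in the static HTML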
I've tried almost everything I found here, so I am not sure it helps to quote every single piece of code. As an example, here is one attempt I was trying to get working. The idea is simple: find the tr rows of the table, get the th (header) and td (data) cells, and then write them out to a CSV.
from lxml import etree
import urllib.request

web = urllib.request.urlopen("https://lunarcrush.com/exchanges")
s = web.read()
html = etree.HTML(s)

# Get all 'tr' rows of the table (MuiTable-root is the class I saw in the inspector)
tr_nodes = html.xpath('//table[@class="MuiTable-root"]//tr')

# The 'th' headers are inside the first 'tr'
header = [th.text for th in tr_nodes[0].xpath('th')]

# Get the text of the 'td' cells from all remaining 'tr' rows
td_content = [[td.text for td in tr.xpath('td')] for tr in tr_nodes[1:]]
print(td_content)
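For completeness, the CSV-writing step I had in mind would be roughly the following, assuming header and td_content come out of the snippet above (the output file name is just a placeholder):

import csv

# Write the header row followed by the data rows
# ("exchanges.csv" is just a placeholder file name).
with open("exchanges.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(td_content)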
Any ideas? I'm sorry for the long (and maybe silly) question; I am just starting out with Python and there is still a lot to learn!