
I am trying to download the data on this website https://coinmunity.co/ in order to manipulate it later in Python or pandas. I tried to load it directly into pandas via requests, but it did not work, using this code:

import requests
import pandas as pd
from bs4 import BeautifulSoup

res = requests.get("https://coinmunity.co/")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]
dfm = pd.read_html(str(table), header=0)
dfm = dfm[0].dropna(axis=0, thresh=4)
dfm.head()

In most of my attempts I could only get at the header row, which seems to be the only part of the table this code can see on the page.
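To illustrate, parsing a header-only table gives exactly this behaviour. The markup below is invented for demonstration and only mimics what the raw response appears to contain:

```python
from bs4 import BeautifulSoup

# Invented markup: a table whose body is filled in by JavaScript ships
# with an empty <tbody>, so the header row is all the raw HTML contains.
static_html = """
<table>
  <thead><tr><th>Name</th><th>Followers</th><th>Subscribers</th></tr></thead>
  <tbody></tbody>
</table>
"""

soup = BeautifulSoup(static_html, 'html.parser')
rows = soup.find_all('tr')
headers = [th.text for th in soup.find_all('th')]
print(len(rows), headers)  # 1 ['Name', 'Followers', 'Subscribers']
```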

Since that did not work, I tried to do the same scraping with requests and BeautifulSoup directly, but it did not work either. This is my code:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://coinmunity.co/")
soup = BeautifulSoup(res.content, 'lxml')
#table = soup.find_all('table')[0]
#table = soup.find_all('div', {'class':'inner-container'})
#table = soup.find_all('tbody', {'class':'_ngcontent-c0'})
#table = soup.find_all('table')[0].findAll('tr')
#table = soup.find_all('table')[0].find('tbody')#.find_all('tbody _ngcontent-c3=""')
table = soup.find_all('p', {'class':'stats change positiveSubscribers'})

You can see in the commented lines all the things I have tried, but nothing worked. Is there any way to easily download that table for use in pandas/Python, in the tidiest and quickest possible way? Thank you.

skeitel

1 Answer


Since the content is loaded dynamically after the initial request is made, you won't be able to scrape this data with requests alone. Here's what I would do instead:

from selenium import webdriver
import pandas as pd
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.implicitly_wait(10)  # wait up to 10s for elements to appear
driver.get("https://coinmunity.co/")

html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, 'lxml')

results = []
# Skip the two header rows, then pull the name and value out of each coin row.
for row in soup.find_all('tr')[2:]:
    data = row.find_all('td')
    name = data[1].find('a').text
    value = data[2].find('p').text
    # Get the rest of the data you need about each coin here,
    # then add it to the dictionary that you append to results.
    results.append({'name': name, 'value': value})

df = pd.DataFrame(results)

df.head()

name    value
0   NULS    14,005
1   VEN 84,486
2   EDO 20,052
3   CLUB    1,996
4   HSR 8,433

You will need to make sure that geckodriver is installed and that it is on your PATH. I just scraped the name and value of each coin, but getting the rest of the information should be easy.
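Once scraped, the value column holds strings with thousands separators. A small pandas cleanup (a sketch, reusing the shape of the results list above) converts it to integers:

```python
import pandas as pd

# Example rows shaped like the scraper's results list above.
results = [{'name': 'NULS', 'value': '14,005'},
           {'name': 'VEN', 'value': '84,486'}]
df = pd.DataFrame(results)

# Drop the thousands separators and convert to a numeric dtype so the
# column can be sorted, plotted, and aggregated.
df['value'] = pd.to_numeric(df['value'].str.replace(',', '', regex=False))

print(df['value'].sum())  # 98491
```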

briancaffey
  • I **strongly** suggest the use of [explicit wait](http://selenium-python.readthedocs.io/waits.html#explicit-waits) over any one of the implicit waits. – Keyur Potdar Jan 22 '18 at 03:47
  • For 2 reasons - 1. `time.sleep(5)` is slow and a waste of time if the page loads faster. 2. It is unreliable if the internet connection or the site is slow. – Keyur Potdar Jan 22 '18 at 03:49
  • @KeyurPotdar thanks for pointing that out. I updated my answer by adding an implicit wait and removing the sleep. – briancaffey Jan 22 '18 at 03:57
  • Actually I was talking about [explicit wait](http://selenium-python.readthedocs.io/waits.html#explicit-waits), not the [implicitly_wait](http://selenium-python.readthedocs.io/waits.html#implicit-waits). You can read this for the reason behind that - [When to use explicit wait vs implicit wait in Selenium Webdriver](https://stackoverflow.com/a/28067495/7832176). – Keyur Potdar Jan 22 '18 at 04:03
  • Oh, sorry, I read that incorrectly. I'll try to add an explicit wait instead. Thanks again @KeyurPotdar – briancaffey Jan 22 '18 at 04:08
  • Thank you, but unfortunately it does not work on my end. I think `soup` is not able to see any `tr` below the header ones. For example, if I try `for row in soup.find_all('tr'): print(row)` I only obtain the header cells as a result: Followers, Subscribers, Price. – skeitel Jan 22 '18 at 11:54
  • @skeitel did you try it using selenium and webdriver? – briancaffey Jan 22 '18 at 13:22
  • I am trying to do that right now, but it is not very clear to me how to add each row to the dictionary, since the rows are not easily detected. – skeitel Jan 22 '18 at 13:29
  • I am trying the Selenium solution as suggested and posted a new question, so I don't know the protocol on what to do with this question. I hesitate to mark it as correct since the code has not worked for me. Should I close it or do something else? Thanks for the help. – skeitel Jan 22 '18 at 15:59