My goal is to build a scraper that extracts data from a table on this site.

Initially I followed the Scrapy tutorial, where I succeeded in extracting data from the test site. When I try to replicate it for Bitinfocharts, the first issue is that I need to use XPath, which the tutorial doesn't cover in detail (it uses CSS selectors only). I have been able to scrape the specific data I want through the Scrapy shell.

  • My current issue is understanding how I can scrape them all from my code and at the same time write the results to a .csv / .json file?

I'm probably missing something completely obvious. If you could have a look at my code and let me know what I'm doing wrong, I would deeply appreciate it.

Thanks!

First attempt:

import scrapy

class RichlistTestItem(scrapy.Item):
    # overview details
    wallet = scrapy.Field()
    balance = scrapy.Field()
    percentage_of_coins = scrapy.Field()

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    allowed_domain = ['https://bitinfocharts.com/']
    start_urls = [
        'https://bitinfocharts.com/top-100-richest-vertcoin-addresses.html'
    ]

    def parse(self, response):
        for sel in response.xpath("//*[@id='tblOne']/tbody/tr/"):            
            scrapy.Item in RichlistTestItem()
            scrapy.Item['wallet'] = sel.xpath('td[2]/a/text()').extract()[0]
            scrapy.Item['balance'] = sel.xpath('td[3]/a/text').extract()[0]
            scrapy.Item['percentage_of_coins'] = sel.xpath('/td[4]/a/text').extract()[0]

            yield('wallet', 'balance', 'percentage_of_coins')

Second attempt: (probably closer to 50th attempt)

import scrapy

class RichlistTestItem(scrapy.Item):
    # overview details
    wallet = scrapy.Field()
    balance = scrapy.Field()
    percentage_of_coins = scrapy.Field()

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    allowed_domain = ['https://bitinfocharts.com/']
    start_urls = [
        'https://bitinfocharts.com/top-100-richest-vertcoin-addresses.html'
    ]

    def parse(self, response):
        for sel in response.xpath("//*[@id='tblOne']/tbody/tr/"):            
            wallet = sel.xpath('td[2]/a/text()').extract()
            balance = sel.xpath('td[3]/a/text').extract()
            percentage_of_coins = sel.xpath('/td[4]/a/text').extract()

            print(wallet, balance, percentage_of_coins)
Muggsie

1 Answer

I have fixed your second attempt, specifically the loop shown in the snippet below:

for sel in response.xpath("//*[@id=\"tblOne\"]/tbody/tr"):
    wallet = sel.xpath('td[2]/a/text()').extract()
    balance = sel.xpath('td[3]/text()').extract()
    percentage_of_coins = sel.xpath('td[4]/text()').extract()

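The problems I found are:

  • There was a trailing "/" in the table row selector.
  • For balance, the value is directly inside the td, not inside a link within the td, and the text node is selected with text(), not text.
  • For percentage_of_coins, the value is again directly inside the td, and the leading "/" made the path absolute instead of relative to the row.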

Also, each td has a data-val attribute. Scraping those might be a little easier than getting the value from the text inside the td.

Biswanath
  • Thank you! How do I call the function / class to see the result? Sorry for the simple question, complete beginner here – Muggsie Nov 03 '18 at 07:38
  • scrapy runspider . I assume there are other ways to kick off. Also see https://stackoverflow.com/questions/21662689/scrapy-run-spider-from-script – Biswanath Nov 03 '18 at 07:52
  • Well, it runs, but there is no result. I also tried this: cd path scrapy crawl quotes -o quotes.json What happens is that it runs and writes an empty .json file. How can I get this code to write the result to a .json or .csv file? – Muggsie Nov 03 '18 at 08:02
  • I think you should open specific questions for your problems. That is, if your problem is writing to a JSON file, ask people for help writing a JSON file, or look at other answers on how to write to a JSON file. – Biswanath Nov 03 '18 at 08:04
  • Ok I thought it would be obvious. I'll make sure my next questions are more specific – Muggsie Nov 03 '18 at 08:11
  • How can it be obvious that you want to write to a file, specifically a JSON file? It is also better to ask one question per issue, for example how to write content to a JSON file, or a problem with a scraper. As it stands, this question feels like you are dumping your work on the community: you have an issue with how to run a scraper, issues with your XPaths, and issues with writing to a JSON file. Maybe somebody will write the code for you, but at the end of the day it is pretty low effort on your part. Sorry if my comment comes across as preachy. Good luck. – Biswanath Nov 03 '18 at 08:17