Cannot find table content (hidden table) when scraping a web page with Scrapy

Question

I am trying to scrape the following URL (http://cmegroup.com/clearing/operations-and-deliveries/accepted-trade-types/block-data.html/#contractTypes=FUT&exchanges=XNYM&assetClassId=0). The table content is what I'm interested in, but it looks like the table is hidden somewhere:

When I right-click the table and inspect it, I get ==$0 (followed by ). But in the Scrapy shell, if I run response.xpath('//*[@table]'), it returns nothing, which means I can't scrape the content this way. Please help with this issue, thanks.
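The selector in the question has a separate problem worth ruling out: in XPath, `[@table]` is an attribute test, so `//*[@table]` matches elements that carry an attribute named `table`, not `<table>` elements. In the Scrapy shell the element selector would be `response.xpath('//table')`, although that can still come back empty if, as the accepted answer indicates, the rows are loaded from a different URL. The sketch below uses the standard library's `xml.etree.ElementTree` (which supports this XPath subset) on a made-up, well-formed snippet, so the difference can be checked without Scrapy.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed stand-in for a page; not the real CME markup.
SNIPPET = """
<div>
  <table id="blocks"><tr><td>FUT</td></tr></table>
  <span table="yes">not a table</span>
</div>
"""

root = ET.fromstring(SNIPPET)

# [@table] is an attribute test: it matches elements that have an attribute named "table".
with_table_attr = root.findall(".//*[@table]")

# A bare "table" step matches <table> elements, which is what the question is after.
table_elements = root.findall(".//table")

print([el.tag for el in with_table_attr])  # ['span']
print([el.tag for el in table_elements])   # ['table']
```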

UPDATE: The final solution was to use Selenium (a great tool) for this scraping task. Selenium is especially useful when page content such as tables is generated by JavaScript rather than present in the raw HTML. There are plenty of Selenium tutorials in the community; here is one example.
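A minimal sketch of the Selenium approach from the update, assuming Selenium 4 with a local Chrome it can drive. The wait condition (`table tr`) is an assumption about the rendered markup, and `fetch_rendered_html`, `extract_rows` and `TableRowParser` are hypothetical names. Rows are pulled out with the standard library's `html.parser` so that part can be tried without a browser; inside a Scrapy project, `scrapy.Selector(text=html)` would do the same job.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>.

    Simplification: assumes rows and cells have explicit closing tags.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table rows in an HTML string as lists of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rendered_html(url, timeout=20):
    """Load url in Chrome and return the HTML after JavaScript has built the table."""
    # Imported here so extract_rows works even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes a local Chrome that Selenium can drive
    try:
        driver.get(url)
        # Wait until at least one table row exists in the live DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "table tr"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Usage would be along the lines of `rows = extract_rows(fetch_rendered_html(page_url))` with `page_url` set to the block-data URL from the question. Waiting explicitly for a row, rather than sleeping for a fixed time, returns as soon as the table appears.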

score 0 · Accepted Answer · answered May 17 '18 at 01:43


The table is empty because the URL you are scraping is not the one that serves the table's data. The correct URL is:

http://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do?xlstDoc=/XSLT/md/blocks-records.xsl&url=/da/BlockTradeQuotes/V1/Block/BlockTrades?exchange=XCBT,XCME,XCEC,DUMX,XNYM&foi=FUT,OPT,SPD&assetClassId=0&tradeDate=05172018&sortCol=time&sortBy=desc

The "05172018" part of the URL above looks like a date filter in MMDDYYYY format.
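If only the date changes, the data URL can be generated for any trade date. This sketch assumes the answer's reading of `tradeDate` as MMDDYYYY and keeps every other parameter exactly as in the URL above; `block_trades_url` is a hypothetical helper name.

```python
from datetime import date

# Everything except the trade date is copied verbatim from the URL in the answer.
BLOCK_TRADES_URL = (
    "http://www.cmegroup.com/CmeWS/mvc/xsltTransformer.do"
    "?xlstDoc=/XSLT/md/blocks-records.xsl"
    "&url=/da/BlockTradeQuotes/V1/Block/BlockTrades"
    "?exchange=XCBT,XCME,XCEC,DUMX,XNYM&foi=FUT,OPT,SPD&assetClassId=0"
    "&tradeDate={trade_date}&sortCol=time&sortBy=desc"
)


def block_trades_url(trade_date: date) -> str:
    """Return the block-trade data URL for one trade date (formatted as MMDDYYYY)."""
    return BLOCK_TRADES_URL.format(trade_date=trade_date.strftime("%m%d%Y"))
```

For example, `block_trades_url(date(2018, 5, 17))` reproduces the URL above character for character. The string is filled in rather than passed through `urllib.parse.urlencode`, because the inner `url=` value is unescaped in the original and encoding it would change the request.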


Laerte


Thanks a lot! Did you find this URL through the site's back-end server? I am really curious about it. I worked my butt off and finally managed to scrape the URL I originally provided using Selenium (I guess the table on the original page is generated with JavaScript). – Gin May 17 '18 at 19:37
Thanks a lot, man, you definitely saved my day. I am also curious whether you used Selenium as well. Before I saw your link, I got the scraping working with WebDriver, but I ran into a Python scripting issue, which I posted here: https://stackoverflow.com/questions/50415245/web-scrapy-with-selenium-error-while-obtaining-start-requests Would you mind taking a look? Maybe I used the Request clause incorrectly. – Gin May 18 '18 at 17:13
I will take a look, but from what I see you could use just Scrapy; you don't need Selenium in this case (I use Selenium to crawl ASP.NET sites). – Laerte May 18 '18 at 17:33

Hi Laerte, I actually just resolved the Selenium part, and the question in the post I sent you was a bit trivial, so I deleted it. Thanks again for your help; I will eventually rewrite my code to use the real table URL. – Gin May 18 '18 at 18:54

