Newbie in Scrapy: how to scrape only the text part with response.css?

Question

While practicing, I want to extract only the text part (1, 2, 3, 4, 5...), without the surrounding markup. How should I change response.css("td[class='c1']") to do that?

scrapy shell "https://tw.movies.yahoo.com/chart.html"
response.css("td[class='c1']")

score 5 · Accepted Answer · answered Jul 22 '14 at 06:51

Here are two options, one using css(), another one using xpath():

>>> response.css("td.c1 > span::text").extract()
[u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10', u'11', u'12', u'13', u'14', u'15', u'16', u'17', u'18', u'19', u'20']
>>> response.xpath("//td[@class='c1']/span/text()").extract()
[u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10', u'11', u'12', u'13', u'14', u'15', u'16', u'17', u'18', u'19', u'20']

alecxe

Wow, it's amazing! But is there a way to remove the 'u'? – user2492364 Jul 22 '14 at 06:55

@user2492364 it is just a [unicode literal](http://stackoverflow.com/questions/2464959/whats-the-u-prefix-in-a-python-string), don't worry about it. – alecxe Jul 22 '14 at 06:56
You can remove the u by calling .encode('utf8') on each extracted string afterwards. – Eefret Sep 09 '15 at 20:10
In your code, `response.css("td[class='c1']")`, you can add `::text` just before the closing quote: `response.css("td[class='c1']::text")` – Aakash Saxena Dec 26 '18 at 18:26
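
On the `u` prefix discussed in the comments: it only appears when a list of unicode strings is shown in its `repr` form and is not part of the extracted data. A small sketch, assuming Python 3 (where every `str` is unicode and the prefix no longer shows up at all); the `ranks` list stands in for the output of `.extract()`.

```python
# Stand-in for the list returned by .extract(); the u"" literal syntax
# is still accepted in Python 3 and produces an ordinary str.
ranks = [u"1", u"2", u"3"]

# The prefix is notation only: the values compare equal to plain strings.
same_as_plain = ranks[0] == "1"

# Joining or converting the items never shows a prefix.
joined = ", ".join(ranks)
as_ints = [int(r) for r in ranks]

# .encode() works per string, not on the list. Note that in Python 3 it
# returns bytes objects, so it is rarely what you want just for display.
encoded = [r.encode("utf8") for r in ranks]

print(joined)
print(as_ints)
```

If the goal is numeric ranks, converting with `int()` as above is usually simpler than encoding.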
