
I have been following this tutorial to learn how to use Scrapy, and I am using greenbook as my sample site to test the web scraping. One of the functions, SgmlLinkExtractor, takes a parameter that is the href of the "next" page button. The problem is that on greenbook, the href of the "next" page button is "#" when you inspect the element in Firefox.

These are my questions:

1) What does "#" mean when used in this way: href="#"?

2) How do I solve this issue?

Thanks

Darren
user2284926

2 Answers


You can use # to point to an ID on the page rather than redirect to a URL.

When you see something like "Click here to scroll to the bottom of the page", the href of the "here" link will be #bottomOfPage.

http://jsfiddle.net/2q3NJ/
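To illustrate how a fragment-only href relates to the page it appears on, here is a minimal sketch using Python's standard `urllib.parse`. The page URL is a made-up example, not the site from the question; the point is only that `#bottomOfPage` and a bare `#` both resolve to the same document rather than to a new page.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical page URL, used only for illustration.
base = "http://example.com/page.html"

# A fragment-only href resolves to the same document plus the fragment.
resolved = urljoin(base, "#bottomOfPage")
page_url, fragment = urldefrag(resolved)

# A bare "#" also resolves to the current document; once any fragment
# is dropped, only the original page URL is left.
bare_page_url, bare_fragment = urldefrag(urljoin(base, "#"))

print(resolved)
print(page_url, fragment)
print(bare_page_url, repr(bare_fragment))
```

Because the resolved URL never leaves the current page, a crawler that follows such an href has no new page to fetch.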

Darren

The attribute href="#" means the same as href="", i.e. a reference to the start of the current document. It is seldom used with the intention of linking to the start, however. Instead, it is used as a placeholder that makes the a element formally a link, and also a link from a styling point of view, but in a context where the element is expected to have an onclick event handler or to have its href value overwritten.

Cf. "Is an empty href valid?" and "Which "href" value should I use for JavaScript links, "#" or "javascript:void(0)"?"

In your case, it sounds like the software you are using generates next page “links” that are not real links but driven by JavaScript and carrying href="#" as a placeholder only. This does not work with other software that expects href attributes to be real. It depends on both pieces of software whether and how you can make them work together.
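As a rough way to check whether a page's "next" link is such a placeholder, the sketch below separates real links from placeholder ones using only Python's standard `html.parser`. The sample markup, the `goToPage` handler name, and the `PlaceholderLinkFinder` class are all hypothetical; the actual markup of the site in question may differ.

```python
from html.parser import HTMLParser


class PlaceholderLinkFinder(HTMLParser):
    """Sorts <a> elements into real links and placeholder links."""

    def __init__(self):
        super().__init__()
        self.real_links = []         # href values that point somewhere
        self.placeholder_links = []  # attribute dicts of "#"-style links

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        value = href.strip()
        # Treat "", "#" and javascript: URLs as placeholders.
        if value in ("", "#") or value.lower().startswith("javascript:"):
            self.placeholder_links.append(attr_map)
        else:
            self.real_links.append(href)


# Hypothetical markup: one real link and one JavaScript-driven "next" link.
SAMPLE_HTML = """
<a href="/listing?page=1">1</a>
<a href="#" onclick="goToPage(2)">next</a>
"""

finder = PlaceholderLinkFinder()
finder.feed(SAMPLE_HTML)

print(finder.real_links)
print(finder.placeholder_links)
```

If the "next" link lands in `placeholder_links`, a link extractor that only reads href values has nothing to follow; its other attributes, such as onclick, may hint at what the page's JavaScript does instead.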

Jukka K. Korpela