-1

I'm working on web crawling using Python. I had issues while using Python version 3. So I wanted to know which version of Python is suitable for web crawling.

halfer
ash1234

2 Answers

1

I think there is a strong argument against moving to Python 2.7. I can't think of any reason, particularly for web crawling, that you would need to revert to it.

BeautifulSoup 4 and lxml are both fully ported to Python 3.5.

urllib is fully functional in Python 3.5. You should be aware that there are differences in the implementation of urllib in Python 2.7 and Python 3.5.
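To make the difference concrete: what Python 2.7 exposed as `urllib2.urlopen` and `urlparse.urljoin` lives in `urllib.request` and `urllib.parse` in Python 3. Here is a minimal sketch of fetching a page with the Python 3 standard library. The `USER_AGENT` value and the helper names `build_request` and `fetch` are made up for illustration, not part of any library.

```python
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

# Hypothetical identifier for this example crawler.
USER_AGENT = "example-crawler/0.1"


def build_request(base_url, link):
    """Resolve a (possibly relative) link against base_url and wrap it in a Request."""
    absolute = urljoin(base_url, link)
    # Skip links a crawler cannot fetch over HTTP, such as mailto: or javascript:.
    if urlparse(absolute).scheme not in ("http", "https"):
        raise ValueError("not an HTTP(S) URL: %s" % absolute)
    return Request(absolute, headers={"User-Agent": USER_AGENT})


def fetch(request, timeout=10.0):
    """Download the page and decode it using the charset the server declares."""
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Something like `fetch(build_request("https://example.com/", "about.html"))` would then return the page body as a `str`. Note that `read()` returns `bytes` in Python 3, which is why the explicit decode step is there.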

However, I would suggest you use the Requests package instead of urllib. Here is a post highlighting some of their differences.
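For comparison, a rough sketch of the same kind of fetch with Requests on Python 3, assuming the package is installed (`pip install requests`). The `USER_AGENT` value and the `fetch_html` helper are made up for illustration.

```python
import requests

# Hypothetical identifier for this example crawler.
USER_AGENT = "example-crawler/0.1"


def fetch_html(url, session=None, timeout=10.0):
    """Fetch a page and return its body as text.

    Pass a requests.Session to reuse connections across many pages;
    without one, the module-level requests.get is used.
    """
    http = session if session is not None else requests
    response = http.get(url, headers={"User-Agent": USER_AGENT}, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    # Requests decodes the body using the detected encoding.
    return response.text
```

When crawling many pages from one site, creating a single `requests.Session()` and passing it in on every call lets Requests reuse the underlying connection.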

If you need to load pages that require JavaScript, Selenium also works in Python 3.5. Selenium also supports headless browsing (e.g., with PhantomJS).

Also, here is an official post from Python that can help guide you to your decision.

Community
-2

If you do opt to install one of the standard Python distributions, make sure you have Python 2.7.3 or later, but do not use Python 3.0 or later; these versions are, of course, the cutting edge versions, but many of the packages we will be using do not yet have Python 3.X support, and until they do, 3.X is not that appealing. For a good discussion of what is and is not available in Python 3.X, see Choosing Python versions.

I got this from a website that introduces and discusses web crawling with Python. I would suggest you take their advice. In my experience, Python 2.7.* is also the best choice at the moment if you depend on additional packages.

Anna Jeanine
  • If this answers your question please mark the question so that it can help others too! – Anna Jeanine Nov 16 '16 at 13:51
  • Anna, your answer is well intentioned, but I'd (politely!) argue that now in 2017, and even in late 2016 when you posted, the paragraph you quote from that site is out of date, both in general and with regard to web scraping. All the packages the site lists (1-7, from numpy to ipython) are available in Python 3 and have been for some time. There are occasionally some edge cases, but in general holding back with 2.7 is no longer necessary. – Neil Jul 07 '17 at 17:45