Hello, I am trying to scrape www.allocine.fr for the latest movies.
I wrote the following script:
# -*- coding: utf-8 -*-
import urllib
import re

page = ["?page=1", "?page=2", "?page=3"]
i = 0
while i < len(page):
    url = "http://www.allocine.fr/film/aucinema/" + page[i]
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<a class="no_underline" href="/film/fichefilm_gen_cfilm=[^.]*.html">\n(.+?)\n</a>'
    pattern = re.compile(regex)
    movie = re.findall(pattern, htmltext)
    i += 1
    movielist = '\n '.join(movie)
    print movielist
The problem is that the first and last items in each list don't have a space in front of them. What I mean is that, in the output, the last item in the 1st list and the first item in the 2nd list are not delimited by a space.
It looks like this:
Something in 1st list
 something2 in 1st list
 something3 in 1st list
Otherthing in 2nd list
 otherthing2 in 2nd list
 otherthing3 in 2nd list
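This can be reproduced without hitting the site. The following is a minimal sketch, assuming two hypothetical lists of titles (`pages`) standing in for what the regex returns for each page; `join_per_page` is a made-up helper name. It shows that `join` only places the separator between items, so with `'\n '` every title except the first one on each page starts with a space.

```python
# Hypothetical stand-ins for the titles scraped from two pages.
pages = [
    ["Something", "something2", "something3"],
    ["Otherthing", "otherthing2", "otherthing3"],
]


def join_per_page(pages):
    """Join each page's titles separately, the way the script does."""
    # The separator goes between items only, never before the first one.
    return ['\n '.join(movies) for movies in pages]


for chunk in join_per_page(pages):
    print(chunk)
```

Because each page is joined and printed on its own, nothing is ever inserted between the last title of one page and the first title of the next, apart from the newline that `print` adds.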
====
I want it to be like: something something something otherthing otherthing
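One possible way to get there, sketched under the assumption that the goal is a single sequence in which every title is separated from the next by the same delimiter: collect the titles from all pages into one list first, then join once at the end. The names `format_movies`, `sep`, and the sample `pages` data are hypothetical and only for illustration.

```python
def format_movies(pages, sep=' '):
    """Flatten the per-page title lists and join them once with `sep`."""
    all_movies = []
    for movies in pages:
        # Accumulate instead of joining and printing page by page.
        all_movies.extend(movies)
    return sep.join(all_movies)


# Hypothetical stand-ins for the titles scraped from two pages.
pages = [
    ["something", "something2", "something3"],
    ["otherthing", "otherthing2"],
]

print(format_movies(pages))
```

In the script itself this would presumably mean extending one list inside the `while` loop and moving the `join` and the `print` after it. Passing `sep='\n'` instead would put one title per line with no stray leading spaces.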