
Hello, I am trying to scrape www.allocine.fr for the latest movies.

I made the following script:

# -*- coding: utf-8 -*-
import urllib
import re

page = ["?page=1", "?page=2", "?page=3"]

i=0
while i<len(page):
    url = "http://www.allocine.fr/film/aucinema/" +page[i]
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()

    regex = '<a class="no_underline" href="/film/fichefilm_gen_cfilm=[^.]*.html">\n(.+?)\n</a>'

    pattern = re.compile(regex)

    movie = re.findall(pattern,htmltext)
    i+=1
    movielist = '\n '.join(movie)

    print movielist

The problem is that the first item of each list doesn't have a space in front of it. In other words, in the output, the last item of the 1st list and the first item of the 2nd list are not delimited by a space.

It looks like this:

Something in 1st list
 something2 in 1st list
 something3 in 1st list
Otherthing in 2nd list
 otherthing2 in 2nd list
 otherthing3 in 2nd list

====

I want it to be like: something something something otherthing otherthing

Alex TheWebGroup

2 Answers


You could:

prepend a space to the joined string:

movielist = ' ' + '\n '.join(movie)

or prepend the space to each item:

movielist = '\n'.join([' ' +i for i in movie])

Example:

>>> print '\n '.join(movie)
something
 something
 something
 otherthing
 otherthing
>>> print ' '+'\n '.join(movie)
 something
 something
 something
 otherthing
 otherthing
>>> print '\n'.join([' ' +i for i in movie])
 something
 something
 something
 otherthing
 otherthing
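
Since the script prints once per page, another option is to collect the titles from every page first and format them once after the loop. Below is a minimal sketch of that idea, not a drop-in replacement: `format_titles` is a hypothetical helper, and `pages` is made-up sample data standing in for the `re.findall` result of each page. It uses single-argument `print(...)`, which should behave the same in Python 2 and Python 3.

```python
def format_titles(pages, prefix=' '):
    """Flatten per-page title lists and put `prefix` before every title."""
    all_titles = []
    for titles in pages:
        all_titles.extend(titles)  # accumulate instead of printing per page
    return '\n'.join(prefix + title for title in all_titles)


# Made-up sample data standing in for the re.findall() result of each page.
pages = [
    ['something', 'something2', 'something3'],
    ['otherthing', 'otherthing2', 'otherthing3'],
]

movielist = format_titles(pages)
print(movielist)
```

Because the join happens once over the combined list, every title gets the same prefix regardless of which page it came from.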
valjeanval42

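If you just want the items to be listed side by side, end your print statement with a trailing comma, something like print "%s" % bar, (in Python 2 the trailing comma suppresses the newline, so the next item is printed on the same line after a space).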

Reference: python print end=' '
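
To illustrate the side-by-side idea without depending on the print syntax of one Python version, here is a small sketch. `side_by_side` is a hypothetical helper name and `movie` is sample data, not the real scrape result. It joins the items with a single space, which gives roughly the same visible result as suppressing the newline between prints.

```python
def side_by_side(items, sep=' '):
    """Return all items on one line, separated by `sep`."""
    return sep.join(items)


movie = ['something', 'something', 'otherthing']  # sample data
line = side_by_side(movie)
print(line)
```

Building the line with `join` also avoids the trailing separator that printing item by item tends to leave behind.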

Community
notorious.no