Beginner not sure how to join lists while scraping

Question

Hello, I am trying to scrape www.allocine.fr for the latest movies.

I made the following script:

# -*- coding: utf-8 -*-
import urllib
import re

page = ["?page=1", "?page=2", "?page=3"]

i=0
while i<len(page):
    url = "http://www.allocine.fr/film/aucinema/" +page[i]
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()

    regex = '<a class="no_underline" href="/film/fichefilm_gen_cfilm=[^.]*.html">\n(.+?)\n</a>'

    pattern = re.compile(regex)

    movie = re.findall(pattern,htmltext)
    i+=1
    movielist = '\n '.join(movie)

    print movielist

The problem is that the first item of each list doesn't have a space in front of it. In other words, in the output, the last item of the 1st list and the first item of the 2nd list are not delimited by a space.

It looks like this:

Something in 1st list
 something2 in 1st list
 something3 in 1st list
Otherthing in 2nd list
 otherthing2 in 2nd list
 otherthing3 in 2nd list

I want it to be like: something something something otherthing otherthing

Also, for web scraping, you might want to take a look at [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) – El Bert, Sep 02 '14 at 14:56

score 1 · Accepted Answer · answered Sep 02 '14 at 15:03

You could:

Prepend a space to the joined string:

movielist = ' ' + '\n '.join(movie)

Or prepend a space to each item:

movielist = '\n'.join([' ' +i for i in movie])

Example:

>>> print '\n '.join(movie)
something
 something
 something
 otherthing
 otherthing
>>> print ' '+'\n '.join(movie)
 something
 something
 something
 otherthing
 otherthing
>>> print '\n'.join([' ' +i for i in movie])
 something
 something
 something
 otherthing
 otherthing
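
The same idea can also be applied once over all pages instead of once per page: gather every title into a single list and join at the end, so the page boundary no longer affects the separator. The following is only a minimal sketch in Python 3 syntax (the question uses Python 2). The names `join_titles` and `pages` are hypothetical, and the sample data stands in for the per-page `re.findall` results.

```python
def join_titles(pages, sep=" "):
    """Flatten per-page title lists and join them with one separator."""
    all_titles = []
    for titles in pages:
        all_titles.extend(titles)  # accumulate instead of joining per page
    return sep.join(all_titles)


# Placeholder data standing in for the findall() result of each page.
pages = [
    ["something", "something2", "something3"],
    ["otherthing", "otherthing2", "otherthing3"],
]

print(join_titles(pages))             # all titles on one line
print(join_titles(pages, sep="\n "))  # one title per line
```

Because the join happens a single time, the separator appears between every pair of titles, including the last title of one page and the first title of the next.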

valjeanval42

Awesome, that's exactly what I needed! Thank you so much, it didn't cross my mind to do it like that :D – Alex TheWebGroup Sep 02 '14 at 15:19

score 0 · Answer 2 · edited May 23 '17 at 12:29

If you just want the items listed side by side, change your print statement to something like print "%s" % bar, (the trailing comma makes Python 2 end the output with a space instead of a newline).

Reference: python print end=' '
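
For readers on Python 3, where print is a function, the trailing comma no longer works and the `end` or `sep` keyword arguments are used instead. A small sketch, assuming `movie` holds the scraped titles (the values below are placeholders):

```python
movie = ["something", "something2", "otherthing"]  # placeholder titles

# end=" " replaces the newline that print normally appends.
for title in movie:
    print(title, end=" ")
print()  # finish the line

# Alternative: unpack the list and let sep supply the spacing.
print(*movie, sep=" ")
```

The loop leaves a trailing space after the last title, while the `sep` form does not, which may matter if the output is compared or parsed later.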

Community

answered Sep 02 '14 at 15:02

notorious.no
