Python findall conversion to array

Question

I need to use findall to grab all the matching webpage links and put them into an array, keeping just the links without the quotes. If an array is not possible, a variable that I can loop over would do, so that I can use the links one by one or all at once. This is what I have so far:

#!/usr/bin/env python
import re,urllib,urllib2

Url = "http://www.ihiphopmusic.com/music"
print Url
print 'test .............'
req = urllib2.Request(Url)
print "1"
response = urllib2.urlopen(req)
print "2"
#reads the webpage
the_webpage = response.read()
#grabs the title
the_list = re.findall(r'number-link" href="(.*?)#comments">0</a>',the_webpage)
print "3"
the_list = the_list.split(',')
arrlist = array('c',the_list)
print arrlist

Results

http://www.ihiphopmusic.com/music
test .............
1
2
3
Traceback (most recent call last):
  File "grub.py", line 17, in <module>
    the_list = the_list.split(',')
AttributeError: 'list' object has no attribute 'split'

You'll wake up Zalgo like this... http://stackoverflow.com/a/1732454/53936 — JosefAssad, Aug 15 '12 at 16:30
Don't parse html with regex. Use the lxml or BeautifulSoup libraries, which can accomplish what you want extremely easily. — Lanaru, Aug 15 '12 at 16:31

score 0 · Answer 1 · answered Aug 15 '12 at 16:26

re.findall returns a list of non-overlapping matches. You're trying to split that list, which is why you're getting an AttributeError (list objects have no split method). I'm not exactly sure what you're trying to accomplish by that. Do you want to split the individual matches and store the pieces in an iterable? If so, you could do something like:

import itertools
results = itertools.chain(*[x.split(',') for x in the_list])
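For illustration, here is a minimal sketch of that flattening on made-up data. The strings in `matches` are hypothetical stand-ins for whatever findall returns, and `chain.from_iterable` is swapped in for the starred call above since it should give the same items without unpacking a list. print is called with parentheses so the snippet should run under Python 3 as well as Python 2.

```python
import itertools

# Hypothetical matches; each string may hold several comma-separated values.
matches = ['a.html,b.html', 'c.html']

# Split every match on commas and flatten the pieces into one stream.
flattened = itertools.chain.from_iterable(m.split(',') for m in matches)

# chain gives back an iterator; wrap it in list() if you need indexing or reuse.
results = list(flattened)
print(results)
```

Note that the result of chain is an iterator, so it can only be looped over once unless it is converted to a list first.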

score 0 · Answer 2 · answered Aug 15 '12 at 16:32

From what I can gather (correct me if I'm wrong), you're already there :) As @mgilson points out, it is already a list:

#grabs the title
the_list = re.findall(r'number-link" href="(.*?)#comments">0</a>',the_webpage)
print "3"
print type(the_list)
print the_list

So you can just iterate through that to do what you want:

for item in the_list:
    print item
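As a rough, self-contained check that does not need the network, the same pattern can be run against a small hand-written string. The `sample_page` markup and the example.com URLs below are assumptions made up for this sketch, not the real page.

```python
import re

# Hypothetical snippet standing in for the downloaded page.
sample_page = (
    '<a class="number-link" href="http://example.com/song-one#comments">0</a>\n'
    '<a class="number-link" href="http://example.com/song-two#comments">0</a>\n'
)

# Same pattern as in the question; the group captures only the URL, no quotes.
the_list = re.findall(r'number-link" href="(.*?)#comments">0</a>', sample_page)

# findall already returns a list, so no split or array conversion is needed.
print(type(the_list))

# Use the links one by one ...
for link in the_list:
    print(link)

# ... or pick one out by index.
first_link = the_list[0]
```

Because the pattern has a single capture group, each element of the list is just the captured URL string.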

score 0 · Answer 3 · answered Aug 15 '12 at 16:37

'split' is a method of string objects, not list objects. The AttributeError arises from calling split on a list. If you print the_list, you will see that it is already a list. If you want to display each URL on a separate line, you can use print '\n'.join(the_list).
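A small sketch of both points, using a made-up `urls` list in place of the_list (the example.com addresses are assumptions); it is written to run under Python 2 or 3.

```python
# Hypothetical list standing in for the findall result.
urls = ['http://example.com/song-one', 'http://example.com/song-two']

# Lists have no split method, which is what triggers the AttributeError.
list_has_split = hasattr(urls, 'split')

# The individual strings inside the list do have one.
str_has_split = hasattr(urls[0], 'split')

# join goes the other way: it glues the strings together, one URL per line.
one_per_line = '\n'.join(urls)
print(one_per_line)
```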
