I tried following this post, but it doesn't seem to be working for me.

I tried this code:

for bresult in response.css(LIST_SELECTOR):
    NAME_SELECTOR = 'h2 a ::attr(href)'
    yield {
        'name': bresult.css(NAME_SELECTOR).extract_first(),
    }
    b_result_list.append(bresult.css(NAME_SELECTOR).extract_first())

    #set b_result_list to SET to remove dups, then change back to LIST
    set(b_result_list)
    list(set(b_result_list))
for brl in b_result_list:
    print("brl: {}".format(brl))

This prints out:

brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users
brl: https://facebook.site.com/users/login

When I just need:

brl: https://facebook.site.com/users/login
brl: https://facebook.site.com/users

What am I doing wrong here?

Thank you!

Jshee

2 Answers

You are discarding the result when you need to save it. Both set(b_result_list) and list(set(b_result_list)) return a new object and leave b_result_list untouched, so the list never actually changes and you end up iterating over the original. Instead, save the result of the set operation:

b_result_list = list(set(b_result_list))

(Note that sets do not preserve order.)
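
To make the difference concrete, here is a minimal sketch using the three URLs from the question's output. The list literal is only a stand-in for what the spider would collect, and the two length variables are names made up for illustration.

```python
# Stand-in for the list the spider builds (values taken from the question's output).
b_result_list = [
    "https://facebook.site.com/users/login",
    "https://facebook.site.com/users",
    "https://facebook.site.com/users/login",
]

# set() returns a new object; without an assignment the list stays as it was.
set(b_result_list)
unchanged_length = len(b_result_list)  # still 3

# Rebinding the name keeps the de-duplicated result (order is not guaranteed).
b_result_list = list(set(b_result_list))
deduped_length = len(b_result_list)  # now 2

# sorted() is used here only to get a stable print order.
for brl in sorted(b_result_list):
    print("brl: {}".format(brl))
```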

Joran Beasley

If you want to remove duplicates while maintaining order, you can do:

>>> li
['1', '1', '2', '2', '3', '3', '3', '3', '1', '1', '4', '5', '4', '6', '6']
>>> seen=set()
>>> [e for e in li if not (e in seen or seen.add(e))]
['1', '2', '3', '4', '5', '6']
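
The comprehension above relies on set.add returning None: `seen.add(e)` only runs when `e` is not yet in `seen`, and its falsy return value lets the element through. A sketch of the same logic as an explicit loop, which may be easier to read (the function name dedupe_keep_order is made up for illustration):

```python
def dedupe_keep_order(items):
    """Return items with duplicates removed, keeping first occurrences in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)       # remember the element
            result.append(item)  # keep only its first occurrence
    return result


li = ['1', '1', '2', '2', '3', '3', '3', '3', '1', '1', '4', '5', '4', '6', '6']
print(dedupe_keep_order(li))  # ['1', '2', '3', '4', '5', '6']
```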

Or, you can use the keys of an OrderedDict:

>>> from collections import OrderedDict
>>> list(OrderedDict([(k, None) for k in li]).keys())
['1', '2', '3', '4', '5', '6']
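
On Python 3.7 and later, plain dicts also keep insertion order, so a similar sketch works without the import. This assumes a Python 3.7+ interpreter, which the answer itself does not state; the variable name unique_in_order is made up for illustration.

```python
li = ['1', '1', '2', '2', '3', '3', '3', '3', '1', '1', '4', '5', '4', '6', '6']

# dict.fromkeys builds a dict whose keys are the list elements (values are None).
# Duplicate keys collapse, and insertion order is kept on Python 3.7+.
unique_in_order = list(dict.fromkeys(li))
print(unique_in_order)  # ['1', '2', '3', '4', '5', '6']
```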

But a set alone may substantially change the order of the original list:

>>> list(set(li))
['1', '3', '2', '5', '4', '6']
dawg