I am using this module (https://github.com/thibauts/duckduckgo) to scrape DuckDuckGo search results:

>>> import duckduckgo
>>> for links in duckduckgo.search('Yellow Chris Martin',max_results=20):
...     print links

The output contains search results, but each link appears to be
repeated four times.

Output:

http://www.youtube.com/watch?v=ZTEKsbLl64w
http://www.youtube.com/watch?v=ZTEKsbLl64w
http://www.youtube.com/watch?v=ZTEKsbLl64w
http://www.youtube.com/watch?v=ZTEKsbLl64w
https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)
https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)
https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)
https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)
http://www.youtube.com/watch?v=1MwjX4dG72s
http://www.youtube.com/watch?v=1MwjX4dG72s
http://www.youtube.com/watch?v=1MwjX4dG72s
http://www.youtube.com/watch?v=1MwjX4dG72s

How can I fix this and get the same results as when using the search engine directly?

cgkentrus
  • I know this question is a bit stale so, for the sake of posterity, please don’t scrape our results. We don’t have the rights to syndicate the links, and so consider this type of programmatic access as abuse. We have an open API available (documented here: https://duckduckgo.com/api) that you’re free to use. Thanks for understanding! – Jaryd Malbin Jun 19 '18 at 12:52
  • (A note about the previous comment for future readers: Jaryd Malbin posted it originally as an [answer](https://stackoverflow.com/a/50928965), which was turned into a comment by a moderator. Jaryd appeared to be affiliated with DuckDuckGo, although it was not mentioned in his answer or SO profile at the time. To summarize: what the question is asking goes against the requirements of DuckDuckGo, and people are therefore encouraged to use the API that was linked to in the comment.) – Dev-iL Jun 19 '18 at 14:30

1 Answer

You could pass the results returned by duckduckgo.search() to set() to drop the duplicates, raising max_results until enough unique links come back:

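import duckduckgo

wanted = 10  # number of unique links you want (some_val in the original)
count = wanted
# Raise max_results until enough unique links come back; each pass re-runs the query.
while len(set(duckduckgo.search('Yellow Chris Martin', max_results=count))) < wanted:
    count = count + 1

# Note: a set does not keep the original result order.
for link in set(duckduckgo.search('Yellow Chris Martin', max_results=count)):
    print(link)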
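
Because set() discards the ranking order of the results, an order-preserving variant may be closer to what the search engine shows. The following is only a minimal sketch: `unique_links` is a hypothetical helper name, not part of the duckduckgo module, and it assumes the search call yields plain URL strings as in the question's output.

```python
from itertools import islice


def unique_links(results, limit=None):
    """Return an iterator over links in first-seen order, without repeats.

    `results` is any iterable of URL strings; `limit` (optional) caps the
    number of unique links produced.
    """
    seen = set()

    def _dedupe():
        for link in results:
            if link not in seen:
                seen.add(link)
                yield link

    return islice(_dedupe(), limit)


# Stand-in data shaped like the duplicated output in the question.
sample = [
    'http://www.youtube.com/watch?v=ZTEKsbLl64w',
    'http://www.youtube.com/watch?v=ZTEKsbLl64w',
    'https://en.wikipedia.org/wiki/Yellow_(Coldplay_song)',
    'http://www.youtube.com/watch?v=ZTEKsbLl64w',
    'http://www.youtube.com/watch?v=1MwjX4dG72s',
]
for link in unique_links(sample, limit=20):
    print(link)
```

With the module from the question, the call would presumably look like `unique_links(duckduckgo.search('Yellow Chris Martin', max_results=80), limit=20)`; the factor of four in `max_results` is a guess based on the repetition seen in the question's output, not documented behaviour. This needs only one query instead of one per loop iteration.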
kaiser