
I have a function in a .py file that takes a list of links (URL strings) and a list of private paths, removes every link that contains one of the private paths, and returns a new list.

e.g. remove any items from the list that contain the string '/files'.

These are the lists:

private_paths = ['/sites/', '/files']
url_strings = ['http://example.com/files/image1.jpg', 'http://example.com/index.html', 'http://example.com/about.html', 'http://example.com/sites/js/example.js']

etc.. etc..

The function is below:

def rmvPrivate(privatepaths, links):

    copy = list(links)

    for link in copy:
        for path in privatepaths:
            if path in link:
                # printed link and path here
                copy.remove(link)
    return copy

Called with:

rmvPrivate(private_paths, url_strings)

The function is finding and matching links that are in the url_strings list that contain a private path from the private_paths list, but they are not being removed?

Thanks in advance for any advice you can give me!

Context: I'm trying to write a script that goes to the home page of a website, gets all the links, and adds them to an array; this array will then be used in Python/Selenium tests.

Thanks again!

acb1906

1 Answer


You made a copy of the list. If you remove from the copy then the original is never changed.
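
To illustrate that point, here is a minimal sketch with made-up values (the names `original` and `copy` are only for illustration): removing an item from a copy made with list() leaves the original list untouched.

```python
original = ['a', 'b', 'c']
copy = list(original)  # a new list object holding the same items

copy.remove('b')       # only the copy is modified

print(original)  # ['a', 'b', 'c']
print(copy)      # ['a', 'c']
```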

Do this

private_paths = ['/sites/', '/files/']
url_strings = ['http://example.com/files/image1.jpg', 'http://example.com/index.html', 'http://example.com/about.html', 'http://example.com/sites/js/example.js']

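def rmvPrivate(privatepaths, links):
    # Loop over a copy so that removing from links does not skip items
    for link in links[:]:
        for path in privatepaths:
            if path in link:
                # printed link and path here
                links.remove(link)
                break  # link is already removed, skip the remaining paths

rmvPrivate(private_paths, url_strings)

print(url_strings)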

Note the return value (which you never capture) is redundant if you change the list in place.

Alternatively, with your original code, you could just capture the return value of the function.

public_url_strings = rmvPrivate(private_paths, url_strings)
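
If a new list is what you want, one possible variant builds it directly instead of removing items inside a loop. This is only a sketch: the name `filter_private` is hypothetical and not from the question, and the data is the example data from the question.

```python
def filter_private(private_paths, links):
    """Return a new list of the links that contain none of the private paths."""
    return [link for link in links
            if not any(path in link for path in private_paths)]


private_paths = ['/sites/', '/files']
url_strings = ['http://example.com/files/image1.jpg',
               'http://example.com/index.html',
               'http://example.com/about.html',
               'http://example.com/sites/js/example.js']

public_url_strings = filter_private(private_paths, url_strings)
print(public_url_strings)
# ['http://example.com/index.html', 'http://example.com/about.html']
```

Here url_strings itself is left unchanged, so the result has to be captured as shown.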

As a one-liner, based on Alex Martelli's answer in the linked duplicate question:

def rmvPrivate(privatepaths, links):
    links[:] = [link for link in links if all(pp not in link for pp in privatepaths)]
Paul Rooney