Removing entire string duplicates from a list

Question

I am running into an issue when trying to removing duplicates from a list.

def my_list_bucket(self, bucketName,  limit=sys.maxsize): #delimiter='/'):
    a_bucket = self.storage_client.lookup_bucket(bucketName)
    bucket_iterator = a_bucket.list_blobs()
        for resource in bucket_iterator:
            path_parts = resource.name.split('/')
            date_folder = path_parts[0]
            publisher_folder = path_parts[1]
            desired_path = date_folder + '/' + publisher_folder + '/'
            new_list = []
            for path in desired_path:
                if desired_path not in new_list:
                    new_list.append(desired_path)
            print(new_list)
            limit = limit - 1
            if limit <= `0:
                break

This is the results I get: 20230130/adelphic/
20230130/adelphic/
20230130/adelphic/
20230130/adelphic/
20230130/instacart/
20230130/instacart/
20230130/instacart/
20230130/instacart/

Its not removing the duplicates from the list as the duplicates are still there.

The results I want is:
20230130/adelphic/
20230130/instacart/

I have tried new_list = list(set(publisher_folder)) and it returns:
'i', 'p', 'a', 'c', 'd', 'h', 'e', 'l'
'i', 'p', 'a', 'c', 'd', 'h', 'e', 'l'
'i', 'p', 'a', 'c', 'd', 'h', 'e', 'l'

Does this answer your question? [Removing duplicates in lists](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists) — JonSG, Mar 07 '23 at 20:24
I have tried new_list = list(set(publisher_folder)) and it returns: 'i', 'p', 'a', 'c', 'd', 'h', 'e', 'l' 'i', 'p', 'a', 'c', 'd', 'h', 'e', 'l' 'i', 'p', 'a', 'c', 'd', 'h', 'e', 'l' — DarrenC, Mar 08 '23 at 14:52

score 0 · Accepted Answer · answered Mar 08 '23 at 15:15

When you do:

for path in desired_path:`

it is essentially:

for character in desired_path:

at the moment since desired_path is a string that looks like "20230130/adelphic/".

At the moment your code breaks these strings into characters and reassemble them back into their original strings to print.

I assume what you seek is a list of distinct such strings and that might be done by:

import sys

def my_list_bucket(self, bucketName, limit=sys.maxsize): #delimiter='/'):
    a_bucket = self.storage_client.lookup_bucket(bucketName)
    new_list = set()
    for resource in a_bucket.list_blobs():
        new_list.add(f"{ '/'.join(resource.name.split('/')[:2]) }/")
        limit -= 1
        if not limit:
            break
    new_list = list(new_list)
    print(new_list)

or potentially:

def my_list_bucket(self, bucketName, limit=sys.maxsize): #delimiter='/'):
    a_bucket = self.storage_client.lookup_bucket(bucketName)
    new_list = list(set(
        f"{ '/'.join(resource.name.split('/')[:2]) }/"
        for resource in a_bucket.list_blobs()[:limit]
    ))
    print(new_list)

That works, tysm! I should've been clearer on wanting a distinct list of strings. Thanks for the help! — DarrenC, Mar 08 '23 at 15:37

Removing entire string duplicates from a list

1 Answers1