How to find repeating strings one after the other?

Question

I recently started studying Python on my own, and I have not been able to answer this question:

"Write a function that receives a list of names (as strings). The function returns a list of all the names that appear twice in a row in the list, without duplicates. For example, for the list: ["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"] The function will return a list of Avi and Shlomo. The function must work in O (n)."

This is what I have written so far, but I have not succeeded:

def double_names(lst):
    new_lst = []
    for i in lst:
        if lst[i] == lst[i+1]:
            new_lst.append(i)
    return new_lst

print(double_names(["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]))

Please don't post images of code, but paste the code directly inside the question (as a code block). — L3viathan, Sep 22 '20 at 12:03
Does this answer your question? [Removing duplicates in lists](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists) — Jakob Guldberg Aaes, Sep 22 '20 at 12:05

L3viathan · Answer 1 · 2020-09-22T12:48:06.583

2

Your attempt fails because:

There's not always a list element i+1 (you could use zip(lst, lst[1:]) instead).
You are not accounting for the fact that in your result no names should appear twice. You could use a set for this.
You're iterating over a list while expecting to get indices (you'll actually get list items). (Thanks, Paul)

Something like this should work:

def double_names(lst):
    new_set = set()
    for first, second in zip(lst, lst[1:]):
        if first == second:
            new_set.add(first)
    return list(new_set)

edited Sep 22 '20 at 12:48

answered Sep 22 '20 at 12:06

L3viathan

26,748
2
58
81

This might mess up the order, though. Better use both a set and list for duplicates and order. – tobias_k Sep 22 '20 at 12:08
@tobias_k I'm pretty sure sets nowadays preserve their insertion order (in CPython 3.6+), but you might be right in that it might not be guaranteed. – L3viathan Sep 22 '20 at 12:12
Helped me a lot, thank you very much! I liked the way you used the set to download duplicates:) – יוסי כהן Sep 22 '20 at 12:18
1

The OP's approach also won't work because they are iterating over the items of the list, but then trying to treat the current item as an index. – Paul M. Sep 22 '20 at 12:28
You did not have to write - "for first, second in zip(lst[0], lst[1:]):" ? – יוסי כהן Sep 22 '20 at 12:40

DDD1 · Answer 2 · 2020-09-22T12:19:50.303

1

I think the easiest way is by saving the first element of the list. Then you start iterating over the list from first position on and check if the current element is the same as the last element you checked. If it is, then you have a match, if not then you just update the last element and keep going.

def doublenames(l):
    last = l[0]
    new_list = []
    for element in l[1:]:
        if element == last:
            if element not in new_list:
                new_list.append(element)
        else:
            last = element
    return new_list

Hope it helped :)

(You can also do it in one line:

[x[0] for x in set(zip(l, l[1:])) if (x[0] == x[1])]

)

edited Sep 22 '20 at 12:19

answered Sep 22 '20 at 12:08

DDD1

361
1
11

This answer is also not in O(n). – L3viathan Sep 22 '20 at 12:19
I have a question, in the solution you did, I did not understand how you did that the software does not go through all the strings and looks for a match for the first string, but only checks one string after it? By the code (it's true), it looks like you're going through all the strings and looking for a match for the first string – יוסי כהן Sep 22 '20 at 12:36
If you look at the code, I save the first value in the variable "last" and the iterate starting at l[1:] (which is the second position of the list). This is because there's no real need to start at the beginning, as the first comparison will be position 0 and position 1 (first and second positions). The variable "last" gets updated every time a new word appears (in the else clause), so it keeps track of the last element you checked. – DDD1 Sep 22 '20 at 17:04

Omri · Answer 3 · 2020-09-22T12:12:20.913

1

keep a variable previous_name which will store the name from the previous iteration.

def double_names(names_list):
    double_names = []
    previous_name = None
    for n in names_list:
        if previous_name is not None:
            if n == previous_name:
                if n not in double_names:
                    double_names.append(n)
        previous_name = n
    return double_names

edited Sep 22 '20 at 12:12

answered Sep 22 '20 at 12:08

Omri

483
4
12

This does not run in O(n), because the search of `double_names` is also in O(n). Therefore, the entirety should be in O(n^2). – L3viathan Sep 22 '20 at 12:09
Well I guess you can use a set, which will be more efficient, but I'm not sure about complexity. – Omri Sep 22 '20 at 12:11

score 0 · Answer 4 · answered Sep 22 '20 at 12:18

Another solution:

lst = ["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]

def doublenames(L):
    out = {}
    for v in L:
        out.setdefault(v, []).append(v)

    return [v[0] for v in out.values() if len(v) > 1]

print(doublenames(lst))

Prints:

['avi', 'shlomo']

score 0 · Answer 5 · answered Sep 22 '20 at 12:33

If the order of the names doesn't matter, you can use a set to remove duplicates

def double_names(names):
    from itertools import groupby
    return list({key for key, group in groupby(names) if len(tuple(group)) >= 2})

print(double_names(["avi", "avi", "beni", "shlomo", "shlomo", "David", "haim", "moshe", "shlomo", "shlomo"]))

Output:

['shlomo', 'avi']
>>>

Just tried this with Python 3.8.2, and it seems the order is not preserved.

How to find repeating strings one after the other?

5 Answers5