How does one compare a string to the next string in a list?

Question

I'm writing a small NLP algorithm and I need to do the following:

For every string x in the list ["this", "this", "and", "that"], if the string x and the next string are identical, I want to print the string.

GWW · Answer 1 · 2011-07-14T17:07:16.773

6

s = ["this", "this", "and", "that"]
for i in xrange(1,len(s)):
    if s[i] == s[i-1]:
        print s[i]

EDIT:

Just as a side note: if you are using Python 3.x, use `range` instead of `xrange` (and call `print` as a function).
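
For Python 3 readers, the same loop might look like the sketch below. It assumes Python 3, where `print` is a function and `range` is already lazy. The helper name `adjacent_repeats` is made up for illustration and is not part of the original answer.

```python
def adjacent_repeats(items):
    """Return each item that equals the item immediately before it."""
    found = []
    # Start at index 1 so that items[i - 1] always exists.
    for i in range(1, len(items)):
        if items[i] == items[i - 1]:
            found.append(items[i])
    return found

s = ["this", "this", "and", "that"]
for word in adjacent_repeats(s):
    print(word)
```

Returning a list instead of printing inside the loop is just a convenience so the result can be reused or tested.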

edited Jul 14 '11 at 17:07

answered Jul 14 '11 at 17:01

GWW


score 5 · Answer 2 · answered Jul 14 '11 at 17:01

5

strings = ['this', 'this', 'and', 'that']
for a, b in zip(strings, strings[1:]):
    if a == b:
        print a

answered Jul 14 '11 at 17:01

FogleBird


This copies the list (well, except its first item) needlessly though. – Jul 14 '11 at 17:04
I'm not even sure if it's more readable/elegant than a simple loop through all elements ... – MartinStettner Jul 14 '11 at 17:07
@FogleBird: Completely agreed - *for microoptimizations*. For things like these, (read: non-constant overhead), I'm more willing to think about it up-front. If OP is doing this with thirty-item lists, it's irrelevant. But if this is done on very long lists, it may become significant enough to warrant using a nearly equally simple and readable approach that avoids that overhead. – Jul 14 '11 at 17:10

Should you need to iterate over a huge list (bigger than RAM), you can use `izip()` instead of `zip()` and `islice(strings, 1, None)` instead of `strings[1:]`, all from `itertools`. – 9000 Jul 14 '11 at 17:14
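
The lazy variant suggested in the comment above could be sketched as follows. This assumes Python 3, where the built-in `zip` is already lazy and takes the place of `izip`. The function name `lazy_adjacent_repeats` is hypothetical.

```python
from itertools import islice

def lazy_adjacent_repeats(strings):
    # zip() is lazy in Python 3; islice(strings, 1, None) skips the first
    # item without copying the list the way strings[1:] does.
    for a, b in zip(strings, islice(strings, 1, None)):
        if a == b:
            yield a

strings = ['this', 'this', 'and', 'that']
print(list(lazy_adjacent_repeats(strings)))
```

This works for sequences such as lists that can be iterated more than once. A one-shot iterator (a generator or an open file, say) would need `itertools.tee` instead, because both arguments to `zip` would otherwise consume the same underlying iterator.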

RichieHindle · Answer 3 · 2012-03-26T07:48:44.247

2

TEST = ["this", "this", "and", "that"]
for i, s in enumerate(TEST):
   if i > 0 and TEST[i-1] == s:
      print s

# Prints "this"

edited Mar 26 '12 at 07:48

answered Jul 14 '11 at 17:02

RichieHindle


Andrew Jaffe · Answer 4 · 2011-07-14T20:23:44.660

2

The most Pythonic approach is a list comprehension, which is built precisely for looping and filtering at the same time:

>>> strings = ['this', 'this', 'and', 'that']

>>> [a for (a,b) in zip(strings, strings[1:]) if a==b]

['this']

Or, to avoid temporary objects (h/t @9000):

>>> import itertools as it
>>> [a for (a,b) in it.izip(strings, it.islice(strings, 1, None)) if a==b]

['this']

edited Jul 14 '11 at 20:23

answered Jul 14 '11 at 17:05

Andrew Jaffe


score 2 · Answer 5 · answered Jul 14 '11 at 17:05

Sometimes, I like to stick with old-fashioned loops:

strings = ['this', 'this', 'and', 'that']
for i in range(0, len(strings)-1):
   if strings[i] == strings[i+1]:
      print strings[i]

Everyone knows what's going on without much thinking, and it's fairly efficient...

score 1 · Answer 6 · answered Jul 14 '11 at 20:35

1

Why not simply:

strings = ['this', 'this', 'and', 'that', 'or', 'or', 12,15,15,15, 'end']

a = strings[0]
for x in strings:
    if x==a:
        print x
    else:
        a = x

answered Jul 14 '11 at 20:35

eyquem


This will always print the first element in `strings`, since `a` and `x` both start as `strings[0]`. – istruble Jan 21 '12 at 22:49
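
One possible fix for the problem described in the comment above is to start from a sentinel that cannot equal any list item, and to update the remembered value on every iteration. This is only a sketch in Python 3; `_MISSING` and `consecutive_duplicates` are illustrative names, not from the original answer.

```python
_MISSING = object()  # hypothetical sentinel: never equal to a real item

def consecutive_duplicates(items):
    previous = _MISSING
    result = []
    for x in items:
        if x == previous:
            result.append(x)
        # Always remember the current item, whether or not it matched.
        previous = x
    return result

strings = ['this', 'this', 'and', 'that', 'or', 'or', 12, 15, 15, 15, 'end']
print(consecutive_duplicates(strings))
```

Because the first comparison is against the sentinel rather than `strings[0]`, the first element is no longer reported unless it really is followed by a copy of itself.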

score 0 · Answer 7 · answered Jul 14 '11 at 17:06

0

Is that homework?

l = ["this", "this", "and", "that", "foo", "bar", "bar", "baz"]

for i in xrange(len(l)-1):
   try:
      if l.index(l[i], i+1) == i+1:
         print l[i]
   except ValueError:
      pass

answered Jul 14 '11 at 17:06

BjoernD


I really don't see why you use the try/except statement ?? I will simply use a print str(l[i]) and it's gonna be ok :) – ykatchou Jul 14 '11 at 22:24
list.index() throws a ValueError exception if the item is not found. That's why. – BjoernD Jul 14 '11 at 23:10
the only way it could happen is if you delete an item between the range and the print ? :/ – ykatchou Jul 15 '11 at 07:56
As the documentation for list.index() says: "Return the index in the list of the first item whose value is x. It is an error if there is no such item." – BjoernD Jul 15 '11 at 20:19
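
To make the point of this exchange concrete: `list.index(x, start)` searches only from `start` onward, so it raises `ValueError` whenever the value does not occur again later in the list. A small Python 3 sketch (the helper name `next_index_or_none` is invented for illustration):

```python
def next_index_or_none(items, i):
    """Index of the next occurrence of items[i] after position i, or None."""
    try:
        return items.index(items[i], i + 1)
    except ValueError:
        # Raised when items[i] does not appear anywhere after position i.
        return None

l = ["this", "this", "and", "that", "foo", "bar", "bar", "baz"]
print(next_index_or_none(l, 0))  # "this" appears again right after index 0
print(next_index_or_none(l, 2))  # "and" never appears again
```

Note that each `index()` call may scan the rest of the list, so this approach does more work on long lists than a plain neighbor comparison.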

score 0 · Answer 8 · answered Jul 14 '11 at 17:27

Generally speaking, if you're iterating over the items in a list and need to look at the current item's neighbors, use enumerate, since it gives you both the current item and its position in the list.

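This list comprehension indexes into the list rather than zipping it with a shifted copy, although the slice test[:-1] still makes a shallow copy of all but the last item:

print [s for i, s in enumerate(test[:-1]) if s == test[i + 1]]

Note that test must support slicing and indexing, so it has to be a sequence such as a list. With fewer than two elements the result is simply an empty list. (The pairwise() approach based on itertools.tee will work on any iterable.)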

score 0 · Answer 9 · answered Jul 14 '11 at 17:42

Here's a slightly different approach that uses a small callable class to detect repeats in a sequence. You can then find the repeats with a simple list comprehension.

class repeat_detector(object):
    def __init__(self, initial=None):
        self.last = initial
    def __call__(self, current):
        if self.last == current:
            return True
        self.last = current
        return False

strings = ["this", "this", "and", "that"]

is_repeat = repeat_detector()

repeats = [item for item in strings if is_repeat(item)]
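
One caveat with the class above: with the default `initial=None`, a sequence whose first item is `None` would have that item reported as a repeat. A sketch of one way around this in Python 3, assuming a private sentinel is acceptable (the names `_UNSET` and `RepeatDetector` are hypothetical):

```python
_UNSET = object()  # hypothetical sentinel: compares unequal to any real item

class RepeatDetector:
    def __init__(self, initial=_UNSET):
        self.last = initial

    def __call__(self, current):
        # True when the current item equals the one seen just before it.
        is_repeat = (self.last == current)
        self.last = current
        return is_repeat

strings = ["this", "this", "and", "that"]
is_repeat = RepeatDetector()
repeats = [item for item in strings if is_repeat(item)]
print(repeats)
```

A detector instance carries state between calls, so a fresh one should be created for each sequence.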

score 0 · Answer 10 · answered Jul 14 '11 at 19:55

Use the recipe for pairwise() from the stdlib itertools documentation (I'll quote it here):

from itertools import tee, izip  # Python 2; on Python 3 use the built-in zip

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

And you can do:

for a, b in pairwise(L):
    if a == b:
        print a

Or with a generator expression thrown in:

for i in (a for a, b in pairwise(L) if a==b):
    print i
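
On Python 3.10 and later, `pairwise` ships in `itertools`, so the recipe no longer needs to be copied. The sketch below assumes Python 3 and falls back to the recipe on older versions; the sample list `L` is the one from the question.

```python
from itertools import tee

try:
    from itertools import pairwise  # available since Python 3.10
except ImportError:
    def pairwise(iterable):
        "s -> (s0,s1), (s1,s2), (s2, s3), ..."
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

L = ["this", "this", "and", "that"]
for a, b in pairwise(L):
    if a == b:
        print(a)
```

Since `pairwise` works on any iterable, this also handles generators and other one-shot iterators without building a copy of the data first.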
