How to find a continuous string using Python

Question

Given a string (e.g., jaghiuuabc), I want to find the substrings whose letters follow one another in the alphabet.

Here is my code:

import string
alpha = list(string.ascii_lowercase)

s = 'jaghiuuabc'

a = []
for i in range(len(alpha)-1):
    for j in range(len(s)-1):
        if s[j] in alpha[i]:
            a.append(s[j])

print(a)

Do you mean that you want to find a part of the string where the letters are all next to each other in the alphabet? — APerson, Nov 26 '17 at 02:54
I think what OP wants is just to remove non-alphabetic characters from a string — Matias Cicero, Nov 26 '17 at 02:59
Do you mean you want the output for that data to be `'ghiabc'`? Or maybe `['ghi', 'abc']`? — PM 2Ring, Nov 26 '17 at 03:07

PM 2Ring · Accepted Answer · 2017-11-26T18:01:22.580

There's a nice example in the Python 2.6 itertools docs that shows how to find consecutive sequences. To quote:

Find runs of consecutive numbers using groupby. The key to the solution is differencing with a range so that consecutive numbers all appear in same group.

For some strange reason, that example is not in later versions of the docs. That code works on sequences of numbers; the code below shows how to adapt it to work on letters.

from itertools import groupby

s = 'jaghiuuabc'

def keyfunc(t):
    ''' Subtract the character's index in the string 
        from its Unicode codepoint number. 
    ''' 
    i, c = t
    return ord(c) - i

a = []
for k, g in groupby(enumerate(s), key=keyfunc):
    # Extract the chars from the (index, char) tuples in the group
    seq = [t[1] for t in g]
    if len(seq) > 1:
        a.append(''.join(seq))

print(a)

output

['ghi', 'abc']

How it works

The heart of this code is

groupby(enumerate(s), key=keyfunc)

enumerate(s) generates tuples containing the index number and character for each character in s. For example:

s = 'ABCEF'
for t in enumerate(s):
    print(t)

output

(0, 'A')
(1, 'B')
(2, 'C')
(3, 'E')
(4, 'F')

groupby takes items from a sequence or iterator and gathers adjacent equal items together into groups. By default, it simply compares the values of the items to see if they're equal. But you can also give it a key function. When you do that, it passes each item to the key function and uses the result returned by that key function for its equality test.
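Before adding a key function, it may help to see the default behavior on its own. The following is a minimal sketch with made-up sample data; note that the second 'a' lands in its own group, because groupby only merges items that are adjacent.

```python
from itertools import groupby

# Sample string chosen purely for illustration.
data = "aabbbac"

# With no key function, groupby compares the items themselves.
runs = [(key, "".join(group)) for key, group in groupby(data)]
print(runs)
# [('a', 'aa'), ('b', 'bbb'), ('a', 'a'), ('c', 'c')]
```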

Here's a simple example. First, we define a function div_by_10 that divides a number by 10, using integer division. This basically gets rid of the last digit in the number.

def div_by_10(n):
    return n // 10

a = [2, 5, 10, 13, 17, 21, 22, 29, 33, 35]
b = [div_by_10(u) for u in a]
print(a)
print(b)

output

[2, 5, 10, 13, 17, 21, 22, 29, 33, 35]
[0, 0, 1, 1, 1, 2, 2, 2, 3, 3]

So if we pass div_by_10 to groupby as the key function, it ignores the last digit of each number and therefore groups adjacent numbers together when they differ only in the last digit.

from itertools import groupby

def div_by_10(n):
    return n // 10

a = [2, 5, 10, 13, 17, 21, 22, 29, 33, 35]
print(a)
for key, group in groupby(a, key=div_by_10):
    print(key, list(group))

output

[2, 5, 10, 13, 17, 21, 22, 29, 33, 35]
0 [2, 5]
1 [10, 13, 17]
2 [21, 22, 29]
3 [33, 35]

My keyfunc receives a (index_number, character) tuple and subtracts that index_number from the character's code number and returns the result. Let's see what that does with my earlier example of 'ABCEF':

def keyfunc(t):
    i, c = t
    return ord(c) - i

for t in enumerate('ABCEF'):
    print(t, keyfunc(t))

output

(0, 'A') 65
(1, 'B') 65
(2, 'C') 65
(3, 'E') 66
(4, 'F') 66

The code number for 'A' is 65, the code number for 'B' is 66, the code number for 'C' is 67, etc. So when we subtract the index from the code number for each of 'A', 'B', and 'C' we get 65. But we skipped over 'D' so when we do the subtractions for 'E' and 'F' we get 66. And that's how groupby can put 'A', 'B', & 'C' in one group and 'E' & 'F' in the next group.
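The same differencing trick applies directly to numbers, which is the case the quoted docs passage describes. This is a Python 3 sketch of that idea, not the docs' original code, and the sample list is an assumption made for illustration.

```python
from itertools import groupby

# Illustrative data; any list of integers works.
data = [1, 4, 5, 6, 10, 15, 16, 17, 18, 22]

runs = []
# value - index stays constant while the values rise by exactly 1 per step.
for _, group in groupby(enumerate(data), key=lambda t: t[1] - t[0]):
    runs.append([value for _, value in group])

print(runs)
# [[1], [4, 5, 6], [10], [15, 16, 17, 18], [22]]
```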

This can be tricky stuff. Don't expect to understand it all completely straight away. But if you do some experiments yourself I'm sure it will gradually sink in. ;)

Just for fun, here's the unreadable multiply-nested list comprehension version of that code. ;)

print([z for _, g in groupby(enumerate(s),lambda t:ord(t[1])-t[0])for z in[''.join([*zip(*g)][1])]if len(z)>1])
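The least obvious part of that one-liner is probably `[*zip(*g)][1]`, which transposes the (index, char) tuples of a group and keeps only the characters. A small sketch, using a hand-written list named `pairs` (a hypothetical stand-in for one group):

```python
# One group as groupby would yield it for 'ghi' in 'jaghiuuabc'.
pairs = [(2, 'g'), (3, 'h'), (4, 'i')]

# zip(*pairs) turns a list of pairs into a pair of tuples.
indices, chars = zip(*pairs)
print(indices)  # (2, 3, 4)
print(chars)    # ('g', 'h', 'i')

# The expression from the one-liner: index 1 selects the characters.
word = ''.join([*zip(*pairs)][1])
print(word)     # ghi
```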

Here's another version, inspired by Amit Tripathi's answer. It doesn't use any imports because it does the grouping manually. prev holds the codepoint number of the previous character. We initialize prev to -2 so that the first i != prev + 1 test is guaranteed to be true: the smallest possible value of ord(ch) is zero, so it can never equal -1. As a result, a new empty list is always added to groups before the first character is appended.

s = 'jaghiuuabcxyzq'

prev, groups = -2, []
for ch in s:
    i = ord(ch)
    if i != prev + 1:
        groups.append([])
    groups[-1].append(ch)
    prev = i

print(groups)
a = [''.join(u) for u in groups if len(u) > 1]
print(a)

output

[['j'], ['a'], ['g', 'h', 'i'], ['u'], ['u'], ['a', 'b', 'c'], ['x', 'y', 'z'], ['q']]
['ghi', 'abc', 'xyz']
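For reuse, the loop above could be wrapped in a function. This is only a sketch: the name `consecutive_runs` and the `min_length` parameter are assumptions introduced here, not part of the original answer. It also shows one caveat of comparing code points: characters other than letters count as consecutive too.

```python
def consecutive_runs(text, min_length=2):
    """Return runs of characters whose code points increase by 1."""
    groups = []
    prev = -2  # ord() is never -1, so the first character always starts a group
    for ch in text:
        code = ord(ch)
        if code != prev + 1:
            groups.append([])
        groups[-1].append(ch)
        prev = code
    return [''.join(g) for g in groups if len(g) >= min_length]

print(consecutive_runs('jaghiuuabcxyzq'))  # ['ghi', 'abc', 'xyz']
print(consecutive_runs(''))                # []
# 'Z' is 90 and '[' is 91, so this counts as a run as well.
print(consecutive_runs('Z['))              # ['Z[']
```

If only letters should count, the input could be filtered or split on non-letters first; that choice depends on what the asker actually needs.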

@JoydeepRoychowdhury I'll add some more explanation to my answer. I admit it might not be easy to understand my code if you aren't familiar with [`itertools.groupby`](https://docs.python.org/3/library/itertools.html#itertools.groupby); you also need to know what [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate) does. There are some good `groupby` examples [here](https://stackoverflow.com/questions/41411492/what-is-itertools-groupby-used-for). — PM 2Ring, Nov 26 '17 at 05:29
@JoydeepRoychowdhury Please see my updated answer. I hope it makes it a little easier to understand what's going on. — PM 2Ring, Nov 26 '17 at 06:20
now i know what enumerate and groupby does thanks once again — Joydeep Roychowdhury, Nov 26 '17 at 12:10
hi - @PM 2Ring, can you please suggest me similar type of question? — Joydeep Roychowdhury, Nov 26 '17 at 13:28
@PM2Ring thanks for the mention. I would have appreciated an edit to my answer instead of it though :) — Amit Tripathi, Nov 27 '17 at 17:58

Amit Tripathi · Answer 2 · 2017-11-27T18:02:32.227

This can be done easily with pure Python.

A Python 3 implementation (it should also work with Python 2). A simple 8-liner:

s = 'jaghiuuabc'

prev, counter, dct = None, 0, dict()
for i in s:
    if prev is not None:
        if not chr(ord(prev) + 1) == i:
            counter += 1
    prev = i
    dct.setdefault(counter, []).append(prev)

[''.join(dct[d]) for d in dct if len(dct[d]) > 1]

Out[51]: ['ghi', 'abc']

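ord converts a character to its Unicode code point, which equals its ASCII number for ASCII characters.

chr converts a code point number back to the corresponding character.

setdefault returns the value stored under a key, first inserting the given default (here an empty list) if the key doesn't exist.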
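A quick sketch of those three calls in isolation; the dictionary `d` and its contents are made up for illustration:

```python
print(ord('a'))           # 97
print(chr(ord('a') + 1))  # b
print(ord('é'))           # 233, outside the ASCII range

d = {}
# The first call for a key inserts the empty list; later calls reuse it.
d.setdefault(0, []).append('g')
d.setdefault(0, []).append('h')
d.setdefault(1, []).append('u')
print(d)                  # {0: ['g', 'h'], 1: ['u']}
```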

edited Nov 27 '17 at 18:02

answered Nov 26 '17 at 07:25

Not bad. :) I often use `dct.setdefault(key, []).append(value)` myself, but we don't really need a `dict` here because the keys are guaranteed to be in order, so we can just use a list. And by a slight change in the logic we can reduce the number of `if` tests to 1. Please see the end of my answer for a variation of your code. – PM 2Ring Nov 26 '17 at 18:05

Aaditya Ura · Answer 3 · 2017-11-27T17:54:02.327

What about some recursion, without any external module?

import string

a = 'jaghiuuabc'
alpha = list(string.ascii_lowercase)
def trech(string_1,chr_list,new_string):
    final_list=[]
    if not string_1:
        return 0
    else:

        for chunk in range(0,len(string_1),chr_list):
            for sub_chunk in range(2,len(string_1)+1):
                if string_1[chunk:chunk + sub_chunk] in ["".join(alpha[i:i + sub_chunk]) for i in range(0, len(alpha), 1)]:
                    final_list.append(string_1[chunk:chunk + sub_chunk])

    if final_list:
        print(final_list)

    return trech(string_1[1:],chr_list-1,new_string)

print(trech(a,len(a),alpha))

output:

['gh', 'ghi']
['hi']
['ab', 'abc']
['bc']
0
