Python: how to get the consecutive n-size pieces in a string?

Question

Given a string such as "8584320342564023450211233923239923239110001012346596", how can I get all consecutive 4-digit subsequences?

For example, the above string will produce: 8584, 5843, 8432, 4320, ...

score 6 · Answer 1 · answered Apr 19 '12 at 15:51

I think this does what you want:

es = '8584320342564023450211233923239923239110001012346596'
# range replaces Python 2's xrange; the last start index is len(es) - 4
strings = [es[x:x+4] for x in range(len(es) - 3)]

Output

>> strings
Out[42]: 
['8584',
 '5843',
 '8432',
 '4320',
 '3203',
 '2034',
 '0342',
 '3425',
 '4256',
 '2564',
 '5640',
 '6402',
 '4023',
 '0234',
 '2345',
 '3450',
 '4502',
 '5021',
 '0211',
 '2112',
 '1123',
 '1233',

  ...
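The same idea can be wrapped in a function so the piece size is not hard-coded. This is only a sketch: the name `windows` and the choice to raise on a non-positive `n` are my own assumptions, not part of the answer above.

```python
def windows(text, n):
    """Return every length-n substring of text, in order, overlapping."""
    if n <= 0:
        raise ValueError("n must be positive")
    # There are len(text) - n + 1 starting positions; range() is empty
    # when text is shorter than n, so the result is then an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(windows("858432", 4))  # ['8584', '5843', '8432']
print(windows("858", 4))     # []
```

For the 52-character string in the question this gives 52 - 4 + 1 = 49 pieces.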

score 2 · Answer 2 · laltin · 2012-04-19T15:57:28.193

You can get a substring of n characters starting at the i-th position with:

str[i:i+n]

Note that positions are counted from 0, not 1: str[0:0+n] gives the first n characters, whereas str[1:1+n] starts at the second character.

Here is the code:

s =  "8584320342564023450211233923239923239110001012346596"

for i in range(len(s) - 3):
    print(s[i:i+4])
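To see why the loop stops at len(s) - 3, it may help to look at what slicing does near the end of a string. A small sketch on a shortened string (the variable names `n` and `pieces` are just for illustration):

```python
s = "858432"
n = 4

print(s[0:0 + n])  # '8584', the first n characters
print(s[1:1 + n])  # '5843', starts at the second character

# A slice that runs past the end is shortened, not rejected.
print(s[4:4 + n])  # '32'

# Stopping at len(s) - n + 1 keeps every piece exactly n characters long.
pieces = [s[i:i + n] for i in range(len(s) - n + 1)]
print(pieces)      # ['8584', '5843', '8432']
```

So looping over range(len(s)) would not raise an error, but it would append a few short pieces at the end.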

edited Apr 19 '12 at 15:57

answered Apr 19 '12 at 15:50

laltin

score 2 · Answer 3 · answered Apr 19 '12 at 15:50

data = "8584320342564023450211233923239923239110001012346596"
span = 4
for i in range(len(data) - span + 1):
    print(data[i:i+span])

answered Apr 19 '12 at 15:50

RichieHindle

score 1 · Accepted Answer · edited May 23 '17 at 10:34

Here is a regex solution:

import re
re.findall("[0-9]{4}","8584320342564023450211233923239923239110001012346596")

EDIT: Thanks to the comment, I see that what you actually wanted was all overlapping matches. An existing Stack Overflow answer covers that: Python regex find all overlapping matches?

Taking that as a hint for the regular expression needed, in your case you can use:

>>> re.findall(r"(?=(\d{4}))","8584320342564023450211233923239923239110001012346596")
['8584', '5843', '8432', '4320', '3203', '2034', '0342', '3425', '4256', '2564', '5640', '6402', '4023', '0234', '2345', '3450', '4502', '5021', '0211', '2112', '1123', '1233', '2339', '3392', '3923', '9232', '2323', '3239', '2399', '3992', '9923', '9232', '2323', '3239', '2391', '3911', '9110', '1100', '1000', '0001', '0010', '0101', '1012', '0123', '1234', '2346', '3465', '4659', '6596']

Your regular expression groups every non-overlapping set of 4, but the OP wants a group of 4 starting at each character in the string. Your code would produce "8584", "3203", ..., whereas the question wanted "8584", "5843", ... (wkl, Apr 19 '12 at 16:03)
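The difference between the two patterns in this answer is easier to see on a shorter string. A minimal sketch (the variable name `digits` is mine):

```python
import re

digits = "85843203"

# A plain pattern consumes each match, so the next search starts after it.
print(re.findall(r"\d{4}", digits))
# ['8584', '3203']

# A lookahead matches at a position without consuming anything, so the
# search advances one character at a time; the group captures the window.
print(re.findall(r"(?=(\d{4}))", digits))
# ['8584', '5843', '8432', '4320', '3203']
```

Note that \d only matches digits, so a window containing any other character is skipped. For arbitrary text, using . in place of \d is one option, keeping in mind that . does not match newlines unless re.DOTALL is set.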

score 1 · Answer 5 · answered Apr 20 '12 at 04:47

>>> from itertools import islice
>>> line = "8584320342564023450211233923239923239110001012346596"
>>> list(map(''.join, zip(*(islice(line,i,None) for i in range(4)))))
['8584', '5843', '8432', '4320', '3203', '2034', '0342', '3425', '4256', '2564', '5640', '6402', '4023', '0234', '2345', '3450', '4502', '5021', '0211', '2112', '1123', '1233', '2339', '3392', '3923', '9232', '2323', '3239', '2399', '3992', '9923', '9232', '2323', '3239', '2391', '3911', '9110', '1100', '1000', '0001', '0010', '0101', '1012', '0123', '1234', '2346', '3465', '4659', '6596']
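The one-liner above can be unpacked into a function with a variable piece size. This is a sketch under my own naming (`zip_windows`, `shifted`), and it assumes the input is a string or another sequence that can be iterated more than once:

```python
from itertools import islice


def zip_windows(seq, n):
    # Build n views of seq, the i-th one starting i items in. zip stops at
    # the shortest view, which yields exactly len(seq) - n + 1 tuples.
    # Assumption: seq is re-iterable (e.g. a str); a one-shot iterator
    # would be shared between the views and give wrong results.
    shifted = (islice(seq, i, None) for i in range(n))
    return ["".join(chunk) for chunk in zip(*shifted)]


print(zip_windows("858432", 4))  # ['8584', '5843', '8432']
```

In Python 3, map returns a lazy iterator rather than a list, which is why the sketch builds the list with a comprehension.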

score 0 · Answer 6 · Silas Ray · 2012-07-17T18:16:40.660

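This will set output to a list with one entry for each 4-character sequence (if the input is itself a list, you get a list of lists):

# renamed from input, which would shadow the built-in function
text = "8584320342564023450211233923239923239110001012346596"
output = []
for i in range(len(text) - 3):
    output.append(text[i:i+4])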
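If the full list is not needed up front, the same loop can be written as a generator. This is just a sketch; `iter_windows` is a name I made up, and the default of 4 mirrors the question:

```python
from itertools import islice


def iter_windows(text, n=4):
    """Yield each length-n slice of text, one at a time."""
    for i in range(len(text) - n + 1):
        yield text[i:i + n]


digits = "8584320342564023450211233923239923239110001012346596"

# Take only the first three pieces; the rest are never computed.
print(list(islice(iter_windows(digits), 3)))   # ['8584', '5843', '8432']

# Slicing works on lists too, which is where a list of lists comes from.
print(list(iter_windows([1, 2, 3, 4, 5], 4)))  # [[1, 2, 3, 4], [2, 3, 4, 5]]
```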

edited Jul 17 '12 at 18:16

answered Apr 19 '12 at 15:50

Silas Ray
