
Given a string such as "8584320342564023450211233923239923239110001012346596", how can I get all consecutive 4-digit subsequences?

For example, the above string should produce: 8584, 5843, 8432, 4320, ...

jamylak
Sajee

6 Answers


I think this does what you want:

es = '8584320342564023450211233923239923239110001012346596'
strings = [es[x:x+4] for x in range(0, len(es)-3)]

Output

>>> strings
['8584',
 '5843',
 '8432',
 '4320',
 '3203',
 '2034',
 '0342',
 '3425',
 '4256',
 '2564',
 '5640',
 '6402',
 '4023',
 '0234',
 '2345',
 '3450',
 '4502',
 '5021',
 '0211',
 '2112',
 '1123',
 '1233',

  ...
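If you only need to loop over the subsequences once, the same slicing can be written as a generator expression so that the full list is never built. A minimal sketch in Python 3 (range replaces xrange); the names digits and windows are just illustrative:

```python
digits = "8584320342564023450211233923239923239110001012346596"

# Generator expression: each 4-character slice is built only when requested
windows = (digits[i:i + 4] for i in range(len(digits) - 3))

# Pull the first few lazily, without materializing all 49 slices
print(next(windows))  # 8584
print(next(windows))  # 5843
```

Wrapping the generator in list() gives back the same list as the comprehension above.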
wkl

You can get the n characters starting at the ith position with this slice:

str[i:i+n]

Be careful: positions start at 0, not 1. str[0:0+n] gives you the first n characters, whereas str[1:1+n] starts at the second character.

Here is the code:

s = "8584320342564023450211233923239923239110001012346596"

for i in range(len(s) - 3):
    print(s[i:i+4])
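Why len(s) - 3 rather than len(s)? Python slices clamp at the end of the string instead of raising an error, so looping over the full length would also produce shorter pieces from the tail. A small check of that behaviour, where n is an illustrative name for the window width:

```python
s = "8584320342564023450211233923239923239110001012346596"
n = 4

# Slicing past the end does not raise; it just returns fewer characters
tail = s[len(s) - 2:len(s) - 2 + n]
print(tail)  # 96

# The last full window starts at len(s) - n, so range() needs len(s) - n + 1
# values, which for n = 4 is exactly range(len(s) - 3)
last_start = len(s) - n
print(s[last_start:last_start + n])  # 6596
print(len(range(len(s) - n + 1)))  # 49
```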
laltin
data = "8584320342564023450211233923239923239110001012346596"
span = 4
for i in range(len(data) - span + 1):
    print(data[i:i+span])
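Wrapping this in a function raises the question of what should happen at the edges. With the len(data) - span + 1 bound, a span longer than the string simply yields nothing, while a span of 0 would yield len(data) + 1 empty strings. A sketch that rejects non-positive spans, assuming that is the behaviour you want (the function name substrings is hypothetical):

```python
def substrings(data, span):
    """Return every run of `span` consecutive characters in `data`, in order."""
    if span < 1:
        raise ValueError("span must be at least 1")
    # When span > len(data) the range is empty, so the result is []
    return [data[i:i + span] for i in range(len(data) - span + 1)]


print(substrings("85843", 4))  # ['8584', '5843']
print(substrings("858", 4))  # []
```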
RichieHindle

Here is a regex solution:

import re
re.findall("[0-9]{4}","8584320342564023450211233923239923239110001012346596")

EDIT: Thanks to the comment, I see that what you actually wanted was all overlapping matches. I found an existing Stack Overflow answer for that here: Python regex find all overlapping matches?

Using that as a hint for the regular expression, in your case you can use a lookahead:

>>> re.findall(r"(?=(\d{4}))","8584320342564023450211233923239923239110001012346596")
['8584', '5843', '8432', '4320', '3203', '2034', '0342', '3425', '4256', '2564', '5640', '6402', '4023', '0234', '2345', '3450', '4502', '5021', '0211', '2112', '1123', '1233', '2339', '3392', '3923', '9232', '2323', '3239', '2399', '3992', '9923', '9232', '2323', '3239', '2391', '3911', '9110', '1100', '1000', '0001', '0010', '0101', '1012', '0123', '1234', '2346', '3465', '4659', '6596']
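The lookahead (?=...) matches an empty string at each position, so the regex engine moves forward one character at a time while the capture group still records the four digits ahead; that is what lets the matches overlap. A sketch that builds the pattern for an arbitrary width and also reports where each window starts, using re.finditer (the name width is illustrative):

```python
import re

digits = "8584320342564023450211233923239923239110001012346596"
width = 4

# Zero-width lookahead: consumes nothing, so successive matches may overlap
pattern = re.compile(r"(?=(\d{%d}))" % width)

# m.start() is the index where each window begins; group(1) is the window itself
for m in pattern.finditer(digits):
    print(m.start(), m.group(1))
```

Note that \d also matches non-ASCII digits in Python 3 str patterns; use [0-9] if you need ASCII only.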
Benedict
  • Your regular expression matches each separate block of 4; the OP's question wants the group of 4 starting at every character in the string. Your code would give "8584", "3203", ..., whereas the question wanted "8584", "5843", ... – wkl Apr 19 '12 at 16:03
>>> from itertools import islice
>>> line = "8584320342564023450211233923239923239110001012346596"
>>> list(map(''.join, zip(*(islice(line, i, None) for i in range(4)))))
['8584', '5843', '8432', '4320', '3203', '2034', '0342', '3425', '4256', '2564', '5640', '6402', '4023', '0234', '2345', '3450', '4502', '5021', '0211', '2112', '1123', '1233', '2339', '3392', '3923', '9232', '2323', '3239', '2399', '3992', '9923', '9232', '2323', '3239', '2391', '3911', '9110', '1100', '1000', '0001', '0010', '0101', '1012', '0123', '1234', '2346', '3465', '4659', '6596']
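The islice version works because each islice(line, i, None) calls iter(line) afresh, giving four independent iterators over the string, each started i characters later; zip then stops as soon as the most-advanced one runs out. That trick breaks on a one-shot iterator such as a file or generator, because all four islice objects would share the same underlying iterator. A sketch of the same staggered-zip idea for arbitrary iterables, using itertools.tee to make independent copies (the function name windows is hypothetical):

```python
from itertools import islice, tee


def windows(iterable, n):
    """Return an iterator of overlapping n-tuples of consecutive items."""
    # tee gives n independent iterators over the same underlying data
    iters = tee(iterable, n)
    # Advance the i-th copy by i items, staggering the starting points
    for i, it in enumerate(iters):
        next(islice(it, i, i), None)
    # zip stops as soon as the most-advanced copy is exhausted
    return zip(*iters)


# Works on a one-shot generator, not just on a string
digit_stream = (c for c in "8584320342")
print(["".join(w) for w in windows(digit_stream, 4)])
# ['8584', '5843', '8432', '4320', '3203', '2034', '0342']
```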
jamylak

This will set output to a list of strings, one for each 4-character sequence:

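# Use a name other than input, which would shadow the built-in function
data = "8584320342564023450211233923239923239110001012346596"
output = []
for i in range(len(data) - 3):
    output.append(data[i:i+4])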
Silas Ray