Given a string such as "8584320342564023450211233923239923239110001012346596", how can I get all consecutive 4-digit subsequences?
For example, the above string will produce: 8584, 5843, 8432, 4320, 3203, 2034, 0342, ...
I think this does what you want (range works in Python 3; xrange only exists in Python 2):
es = '8584320342564023450211233923239923239110001012346596'
strings = [es[x:x+4] for x in range(len(es) - 3)]
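The slice-based comprehension generalizes to any window width. The sketch below wraps it in a function; the name windows and its n parameter are illustrative, not part of the original answer, and it assumes Python 3.

```python
def windows(s, n):
    """Return every consecutive substring of length n in s, in order."""
    # A string of length L has L - n + 1 windows of width n; when n > L
    # the range is empty, so the result is an empty list.
    return [s[i:i + n] for i in range(len(s) - n + 1)]


es = "8584320342564023450211233923239923239110001012346596"
result = windows(es, 4)
print(len(result))   # 49 (52 characters -> 49 windows)
print(result[:4])    # ['8584', '5843', '8432', '4320']
```

Returning an empty list for n > len(s) falls out of range() naturally, so no special-case check is needed.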
Output:
In [42]: strings
Out[42]:
['8584',
'5843',
'8432',
'4320',
'3203',
'2034',
'0342',
'3425',
'4256',
'2564',
'5640',
'6402',
'4023',
'0234',
'2345',
'3450',
'4502',
'5021',
'0211',
'2112',
'1123',
'1233',
...
You can get a subsequence of n characters starting from the i-th position of a string s using this code:
s[i:i+n]
Be careful: position 0 is the start of the string, not position 1. That is, s[0:0+n] gives you the first n characters, not s[1:1+n].
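A quick check of the zero-based indexing described above, using the question's string (the variable names s and n are just for illustration):

```python
s = "8584320342564023450211233923239923239110001012346596"
n = 4

# Index 0 is the first character, so s[0:0+n] is the first window.
print(s[0:0 + n])      # 8584
# s[1:1+n] starts at the second character, so it is the second window.
print(s[1:1 + n])      # 5843
# The last full window starts at index len(s) - n.
print(s[len(s) - n:])  # 6596
```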
Here is the code:
s = "8584320342564023450211233923239923239110001012346596"
for i in range(len(s) - 3):
    print(s[i:i+4])
data = "8584320342564023450211233923239923239110001012346596"
span = 4
# There are len(data) - span + 1 possible start positions.
for i in range(len(data) - span + 1):
    print(data[i:i+span])
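If the string is very long and the windows are consumed one at a time, a generator avoids building the whole list in memory. A minimal sketch, assuming Python 3; iter_windows is a hypothetical helper name, not from the original answers:

```python
from typing import Iterator


def iter_windows(data: str, span: int) -> Iterator[str]:
    """Yield each consecutive substring of length span, one at a time."""
    for i in range(len(data) - span + 1):
        yield data[i:i + span]


data = "8584320342564023450211233923239923239110001012346596"
for chunk in iter_windows(data, 4):
    print(chunk)
```

Wrapping the call in list(...) gives the same list as the comprehension-based answers when all windows are needed at once.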
Here is a regex solution:
import re
# Note: this returns non-overlapping chunks ('8584', '3203', '4256', ...), not every window.
re.findall(r"[0-9]{4}", "8584320342564023450211233923239923239110001012346596")
EDIT: Thanks to the comment, I see that what you actually wanted was all overlapping matches. I found an existing Stack Overflow answer to that here: Python regex find all overlapping matches?
Using that as a hint for the regular expression needed, in your case you can use:
>>> re.findall(r"(?=(\d{4}))", "8584320342564023450211233923239923239110001012346596")
['8584', '5843', '8432', '4320', '3203', '2034', '0342', '3425', '4256', '2564', '5640', '6402', '4023', '0234', '2345', '3450', '4502', '5021', '0211', '2112', '1123', '1233', '2339', '3392', '3923', '9232', '2323', '3239', '2399', '3992', '9923', '9232', '2323', '3239', '2391', '3911', '9110', '1100', '1000', '0001', '0010', '0101', '1012', '0123', '1234', '2346', '3465', '4659', '6596']
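The difference between the two patterns: [0-9]{4} consumes the four digits it matches, so the next search resumes after them, whereas the zero-width lookahead (?=(\d{4})) consumes nothing, so findall tries again at the very next position and returns the captured group each time. A small comparison, assuming Python 3:

```python
import re

s = "8584320342564023450211233923239923239110001012346596"

# Consuming pattern: matches cannot overlap, so 52 digits give 13 chunks.
chunks = re.findall(r"[0-9]{4}", s)
# Lookahead pattern: each (empty) match advances one position, and findall
# returns the captured group, so every 4-digit window is reported.
windows = re.findall(r"(?=(\d{4}))", s)

print(len(chunks), chunks[:3])    # 13 ['8584', '3203', '4256']
print(len(windows), windows[:3])  # 49 ['8584', '5843', '8432']
```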
>>> from itertools import islice
>>> line = "8584320342564023450211233923239923239110001012346596"
>>> list(map(''.join, zip(*(islice(line, i, None) for i in range(4)))))
['8584', '5843', '8432', '4320', '3203', '2034', '0342', '3425', '4256', '2564', '5640', '6402', '4023', '0234', '2345', '3450', '4502', '5021', '0211', '2112', '1123', '1233', '2339', '3392', '3923', '9232', '2323', '3239', '2399', '3992', '9923', '9232', '2323', '3239', '2391', '3911', '9110', '1100', '1000', '0001', '0010', '0101', '1012', '0123', '1234', '2346', '3465', '4659', '6596']
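The zip/islice trick builds four staggered iterators and zips them together (in Python 3, map is lazy, hence the list(...) call). The itertools documentation's recipes section describes a sliding_window recipe that does the same job with a bounded collections.deque. The version below is adapted from that recipe; note it is a recipe to copy into your code, not a function you can import from itertools.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield overlapping tuples of length n (adapted from the itertools recipes)."""
    iterator = iter(iterable)
    # Pre-fill with the first n - 1 items; maxlen=n drops the oldest item on append.
    window = deque(islice(iterator, n - 1), maxlen=n)
    for x in iterator:
        window.append(x)
        yield tuple(window)


line = "8584320342564023450211233923239923239110001012346596"
result = ["".join(w) for w in sliding_window(line, 4)]
print(result[:4])  # ['8584', '5843', '8432', '4320']
```

Because it only ever holds n items, this works on any iterable, including streams that cannot be sliced.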
This will set output to a list of strings, one for each 4-character sequence:
s = "8584320342564023450211233923239923239110001012346596"
output = []
for i in range(len(s) - 3):
    output.append(s[i:i+4])