How do I split a string every nth character?
'1234567890' → ['12', '34', '56', '78', '90']
For the same question with a list, see How do I split a list into equally-sized chunks?.
>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']
Just to be complete, you can do this with a regex:
>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']
For an odd number of characters, you can do this:
>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']
You can also do the following, to simplify the regex for longer chunks:
>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']
And if the string is long, you can use re.finditer to generate the chunks one at a time.
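A minimal sketch of the re.finditer idea, in case it helps. The helper name iter_chunks and the use of re.DOTALL (so that '.' also matches newlines) are my own assumptions, not part of the answer above:

```python
import re

def iter_chunks(text, n):
    """Lazily yield successive chunks of at most n characters (hypothetical helper)."""
    # re.DOTALL makes '.' match newlines too, so no character is skipped.
    pattern = re.compile(".{1,%d}" % n, re.DOTALL)
    for match in pattern.finditer(text):
        yield match.group(0)

chunks = iter_chunks("123456789", 2)
print(next(chunks))   # '12' is produced without scanning the rest of the string
print(list(chunks))   # the remaining chunks: ['34', '56', '78', '9']
```

Because this is a generator, only one chunk is held in memory at a time, which is the point of using finditer over findall.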
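Python's standard library already has a function for this, textwrap.wrap. It is meant for wrapping prose, so it breaks lines at whitespace; it matches plain slicing only when the string contains no whitespace.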
>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']
This is what the docstring for wrap says:
>>> help(wrap)
'''
Help on function wrap in module textwrap:
wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.
    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines. By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space. See TextWrapper class for available keyword args to customize
    wrapping behaviour.
'''
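As the docstring hints, wrap treats whitespace specially. A small sketch comparing it with plain slicing on a string that contains a space; slice_chunks is a hypothetical helper added only for the comparison:

```python
from textwrap import wrap

def slice_chunks(text, n):
    """Plain slicing: every character, including whitespace, is kept."""
    return [text[i:i + n] for i in range(0, len(text), n)]

s = "12 34"
print(wrap(s, 2))          # ['12', '34'] (the space is dropped at the line break)
print(slice_chunks(s, 2))  # ['12', ' 3', '4']
```

So wrap is a fine shortcut for strings such as '1234567890', but slicing is the safer choice when the input may contain spaces, tabs or newlines.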
Another common way of grouping elements into n-length groups:
>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))
['12', '34', '56', '78', '90']
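This method comes straight from the docs for zip(). Note that zip() stops at the shortest input, so a trailing chunk shorter than the group size is silently dropped.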
I think this is shorter and more readable than the itertools version:
def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]
print(list(split_by_n('1234567890', 2)))
Using more-itertools from PyPI:
>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']
I like this solution:
s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]
You could use the grouper() recipe from itertools. The first version below is for Python 2 (izip_longest), the second for Python 3:
from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
from itertools import zip_longest
def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')
These functions are memory-efficient and work with any iterable.
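The recipe yields tuples of characters rather than strings, so for the original question each group still has to be joined. A rough sketch of that last step; chunk_string is a hypothetical wrapper of mine, and padding with an empty string is an assumption that keeps a shorter final chunk:

```python
from itertools import zip_longest

def chunk_string(s, n):
    """Split s into n-character strings using the grouper idea (hypothetical wrapper)."""
    # n references to the same iterator, so each tuple takes n consecutive characters.
    args = [iter(s)] * n
    # fillvalue='' pads the last tuple, so joining it leaves just the leftover characters.
    return [''.join(group) for group in zip_longest(*args, fillvalue='')]

print(chunk_string('1234567890', 2))  # ['12', '34', '56', '78', '90']
print(chunk_string('123456789', 2))   # ['12', '34', '56', '78', '9']
```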
This can be achieved by a simple for loop.
a = '1234567890a'
result = []
for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)
The output looks like ['12', '34', '56', '78', '90', 'a']
I was stuck in the same scenario.
This worked for me:
x = "1234567890"
n = 2
my_list = []
for i in range(0, len(x), n):
    my_list.append(x[i:i+n])
print(my_list)
Output:
['12', '34', '56', '78', '90']
Try this:
s = '1234567890'
print([s[idx:idx+2] for idx in range(len(s)) if idx % 2 == 0])
Output:
['12', '34', '56', '78', '90']
Try the following code:
from itertools import islice
def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))
s = '1234567890'
# Each piece is a list of characters, so join it back into a string
print([''.join(piece) for piece in split_every(2, s)])
As always, for those who love one-liners:
n = 2
line = "this is a line split into n characters"
line = [line[i * n:i * n+n] for i, blah in enumerate(line[::n])]
>>> from functools import reduce
>>> from operator import add
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x)]
['12', '34', '56', '78', '90']
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x, x)]
['123', '456', '789']
more_itertools.sliced has been mentioned before. Here are four more options from the more_itertools library:
import more_itertools as mit
s = "1234567890"
["".join(c) for c in mit.grouper(s, 2)]
["".join(c) for c in mit.chunked(s, 2)]
["".join(c) for c in mit.windowed(s, 2, step=2)]
["".join(c) for c in mit.split_after(s, lambda x: int(x) % 2 == 0)]
Each of these options produces the following output:
['12', '34', '56', '78', '90']
Documentation for the discussed options: grouper, chunked, windowed, split_after
A simple recursive solution for short strings:
def split(s, n):
    if not s:
        return []
    else:
        # s[:n] is the whole remainder when fewer than n characters are left
        return [s[:n]] + split(s[n:], n)
print(split('1234567890', 2))
Or in such a form:
def split(s, n):
    if not s:
        return []
    elif len(s) <= n:
        return [s]
    else:
        return split(s[:n], n) + split(s[n:], n)
This form illustrates the typical divide-and-conquer pattern of a recursive approach more explicitly, though in practice it is not necessary to write it this way.
A solution with groupby:
from itertools import groupby, chain, repeat, cycle
text = "wwworldggggreattecchemggpwwwzaz"
n = 3
c = cycle(chain(repeat(0, n), repeat(1, n)))
res = ["".join(g) for _, g in groupby(text, lambda x: next(c))]
print(res)
Output:
['www', 'orl', 'dgg', 'ggr', 'eat', 'tec', 'che', 'mgg', 'pww', 'wza', 'z']
These answers all work, but the syntax is rather cryptic. Why not write a simple function?
def SplitEvery(string, length):
    if len(string) <= length: return [string]
    # Ceiling division, so a shorter final section is kept
    sections = -(-len(string) // length)
    lines = []
    start = 0
    for i in range(sections):
        line = string[start:start+length]
        lines.append(line)
        start += length
    return lines
And call it simply:
text = '1234567890'
lines = SplitEvery(text, 2)
print(lines)
# output: ['12', '34', '56', '78', '90']
Another solution using groupby and index//n as the key to group the letters:
from itertools import groupby
text = "abcdefghij"
n = 3
result = []
# enumerate() supplies each character's index; index // n is the chunk number
for idx, chunk in groupby(enumerate(text), key=lambda pair: pair[0] // n):
    result.append("".join(char for _, char in chunk))
# result = ['abc', 'def', 'ghi', 'j']