593

How do I split a string every nth character?

'1234567890'   →   ['12', '34', '56', '78', '90']

For the same question with a list, see How do I split a list into equally-sized chunks?.

Mateen Ulhaq
Brandon L Burnett

19 Answers

779
>>> line = '1234567890'
>>> n = 2
>>> [line[i:i+n] for i in range(0, len(line), n)]
['12', '34', '56', '78', '90']
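The same slice-by-stride idea can be wrapped in a small helper. This is only a sketch: the name `chunk_string` and the `ValueError` check are assumptions added for illustration and are not part of the answer above.

```python
def chunk_string(line, n):
    """Return consecutive n-character slices of line; the last one may be shorter."""
    if n <= 0:
        # A step of 0 makes range() raise, and a negative step would
        # silently produce no chunks, so reject both up front.
        raise ValueError("n must be a positive integer")
    return [line[i:i + n] for i in range(0, len(line), n)]


print(chunk_string('1234567890', 2))  # ['12', '34', '56', '78', '90']
print(chunk_string('123456789', 2))   # ['12', '34', '56', '78', '9']
```

Slicing past the end of a string is safe in Python, which is why the final, shorter chunk needs no special handling.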
satomacoto
  • 1
    @TrevorRudolph It only does exactly what you tell it. The above answer is really only just a for loop but expressed pythonically. Also, if you need to remember a "simplistic" answer, there are at least hundreds of thousands of ways to remember them: starring the page on stackoverflow; copying and then pasting into an email; keeping a "helpful" file with stuff you want to remember; simply using a modern search engine whenever you need something; using bookmarks in (probably) every web browser; etc. – dylnmc Nov 02 '14 at 04:03
  • It is easier to understand but it has the downside that you must reference 'line' twice. – Damien Jan 05 '16 at 14:33
  • 4
    Great for breaking up long lines for printing, e.g. ``for i in range(0, len(string), n): print(string[i:i+n])`` – PatrickT Aug 06 '21 at 06:12
  • 1
    for any noobs like me who don't get list comprehensions, the following may be easier to understand, in place of the last line: `substrings = []` `for i in range(0, len(line), n): substring = line[i:i+n] substrings.append(substring)` – ArduinoBen May 15 '23 at 04:11
339

Just to be complete, you can do this with a regex:

>>> import re
>>> re.findall('..','1234567890')
['12', '34', '56', '78', '90']

For an odd number of characters, make the second character optional:

>>> import re
>>> re.findall('..?', '123456789')
['12', '34', '56', '78', '9']

You can also do the following, to simplify the regex for longer chunks:

>>> import re
>>> re.findall('.{1,2}', '123456789')
['12', '34', '56', '78', '9']

And you can use re.finditer if the string is long to generate chunk by chunk.
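A minimal sketch of the `re.finditer` variant, assuming `n` is a positive integer. The helper name `iter_chunks` is hypothetical. The `re.DOTALL` flag is included so that `.` also matches newlines; without it, newline characters would be skipped.

```python
import re


def iter_chunks(text, n):
    """Lazily yield chunks of up to n characters using re.finditer."""
    # re.DOTALL makes '.' match newlines too, so no character is dropped.
    pattern = re.compile('.{1,%d}' % n, flags=re.DOTALL)
    for match in pattern.finditer(text):
        yield match.group(0)


print(list(iter_chunks('123456789', 2)))  # ['12', '34', '56', '78', '9']
print(list(iter_chunks('ab\ncd', 2)))     # ['ab', '\nc', 'd']
```

Because the function is a generator, chunks are produced one at a time instead of building the whole list in memory.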

Georgy
the wolf
  • 15
    This is by far the best answer here and deserves to be on top. One could even write `'.'*n` to make it more clear. No joining, no zipping, no loops, no list comprehension; just find the next two characters next to each other, which is exactly how a human brain thinks about it. If Monty Python were still alive, he'd love this method! – SO_fix_the_vote_sorting_bug Dec 12 '18 at 01:27
  • 2
    This is the fastest method for reasonably long strings too: https://gitlab.com/snippets/1908857 – Ralph Bolton Oct 30 '19 at 16:03
  • 10
    This won't work if the string contains newlines. This needs `flags=re.S`. – Aran-Fey Nov 14 '19 at 17:17
  • 1
    Yeah this is not a good answer. Regexes have so many gotchas (as Aran-Fey found!) that you should use them *very sparingly*. You definitely don't need them here. They're only faster because they're implemented in C and Python is crazy slow. – Timmmm Mar 22 '22 at 15:17
  • This is fast but more_itertools.sliced seems more efficient. – FifthAxiom Jun 01 '22 at 04:42
286

Python's standard library already has a function for this: textwrap.wrap.

>>> from textwrap import wrap
>>> s = '1234567890'
>>> wrap(s, 2)
['12', '34', '56', '78', '90']

This is what the docstring for wrap says:

>>> help(wrap)
'''
Help on function wrap in module textwrap:

wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
'''
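Because `wrap` is designed for laying out text, it treats `width` as a maximum and drops whitespace at line edges. The sketch below compares the default behaviour with the `drop_whitespace=False` and `break_on_hyphens=False` keyword arguments of `TextWrapper`, and with plain slicing. The sample string is only an illustration.

```python
from textwrap import wrap

text = '0 1 2 3 4 5'

# Default behaviour: whitespace at the edges of each line is dropped.
print(wrap(text, 2))  # ['0', '1', '2', '3', '4', '5']

# Keep whitespace and do not treat hyphens as break points.
kept = wrap(text, 2, drop_whitespace=False, break_on_hyphens=False)
print(''.join(kept) == text)  # True: no characters were lost

# Plain slicing for comparison: every chunk except possibly the last
# has exactly 2 characters.
sliced = [text[i:i + 2] for i in range(0, len(text), 2)]
print(sliced)  # ['0 ', '1 ', '2 ', '3 ', '4 ', '5']
```

Even with these options, `wrap` still expands tabs and converts other whitespace to spaces by default, so slicing may be the safer choice when the chunks must reproduce the input exactly.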
Eugene Yarmash
Diptangsu Goswami
  • 4
    print(wrap('12345678', 3)) splits the string into groups of 3 digits, but starts in front and not behind. Result: ['123', '456', '78'] – Atalanttore May 20 '19 at 19:20
  • 5
    It is interesting to learn about 'wrap' yet it is not doing exactly what was asked above. It is more oriented towards displaying text, rather than splitting a string to a fixed number of characters. – Oren Jun 05 '19 at 15:21
  • 14
    `wrap` may not return what is asked for if the string contains space. e.g. `wrap('0 1 2 3 4 5', 2)` returns `['0', '1', '2', '3', '4', '5']` (the elements are stripped) – satomacoto Jun 20 '19 at 09:22
  • 3
    This indeed answers the question, but what happens if there's spaces and you want them maintained in the split characters? wrap() removes spaces if they fall straight after a split group of characters – Iron Attorney Jul 05 '19 at 18:56
  • 2
    This works poorly if you want to split text with hyphens (the number you give as argument is actually the MAXIMUM number of characters, not exact one, and it breaks i.e. on hyphens and white spaces). – MrVocabulary Aug 06 '19 at 14:11
  • `wrap()` appears to be pretty slow (and much slower than say the regex solution): https://gitlab.com/snippets/1908857 – Ralph Bolton Oct 30 '19 at 16:01
  • 1
    you can use `drop_whitespace=False` and `break_on_hyphens=False` to prevent the issues stated by satomacoto and MrVocabulary. See the [full documentation](https://docs.python.org/3/library/textwrap.html#textwrap.TextWrapper) – bmurauer Mar 25 '21 at 08:40
  • 1
    @Atalanttore Just do the following: `".".join(wrap(str(12345678)[::-1], 3))[::-1]` and you end up with `12.345.678`. – Gilfoyle May 02 '22 at 07:43
  • This is so slow. more_itertools.sliced and re.findall are much faster. – FifthAxiom Jun 01 '22 at 04:38
100

Another common way of grouping elements into n-length groups:

>>> s = '1234567890'
>>> list(map(''.join, zip(*[iter(s)]*2)))
['12', '34', '56', '78', '90']

This method comes straight from the docs for zip().
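The idiom works because `[iter(s)] * 2` holds two references to one iterator, so `zip` consumes two consecutive characters for each tuple it builds. The unrolled sketch below illustrates this; the variable names are chosen only for this example.

```python
s = '12345'

it = iter(s)
same_iterator_twice = [it] * 2  # two references to ONE iterator object
print(same_iterator_twice[0] is same_iterator_twice[1])  # True

# zip pulls one item from each argument in turn; since both arguments
# are the same iterator, each tuple holds two consecutive characters.
pairs = list(zip(*same_iterator_twice))
print(pairs)  # [('1', '2'), ('3', '4')]

chunks = [''.join(pair) for pair in pairs]
print(chunks)  # ['12', '34']
```

Note that the trailing `'5'` is lost: `zip` stops as soon as it cannot fill a complete tuple, so this form suits inputs whose length is a multiple of the chunk size.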

Andrew Clark
  • 2
    In [19]: a = "hello world"; list( map( "".join, zip(*[iter(a)]*4) ) ) get the result ['hell', 'o wo']. – truease.com Apr 18 '13 at 15:54
  • 21
    If someone finds `zip(*[iter(s)]*2)` tricky to understand, read [How does `zip(*[iter(s)]*n)` work in Python?](http://stackoverflow.com/questions/2233204/how-does-zipitersn-work-in-python). – Grijesh Chauhan Jan 11 '14 at 14:49
  • 19
    This does not account for an odd number of chars, it'll simply drop those chars: `>>> map(''.join, zip(*[iter('01234567')]*5))` -> `['01234']` – Bjorn Sep 15 '14 at 19:39
  • 4
    To also handle odd number of chars just replace `zip()` with `itertools.zip_longest()`: `map(''.join, zip_longest(*[iter(s)]*2, fillvalue=''))` – Paulo Freitas Jun 08 '17 at 07:44
  • Also useful: docs for [`maps()`](https://docs.python.org/3/library/functions.html#map) – winklerrr Apr 23 '19 at 11:17
  • I hope I never find this in production. Incredibly difficult to read for something that should be rather simple – Neuron Dec 16 '22 at 12:23
77

I think this is shorter and more readable than the itertools version:

def split_by_n(seq, n):
    '''A generator to divide a sequence into chunks of n units.'''
    while seq:
        yield seq[:n]
        seq = seq[n:]

print(list(split_by_n('1234567890', 2)))
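Each `seq = seq[n:]` step copies the rest of the sequence, which adds up on long strings. A possible variant, sketched here under the assumption that `seq` supports `len()` and slicing, steps an index instead. The name `split_by_n_indexed` is hypothetical.

```python
def split_by_n_indexed(seq, n):
    """A generator that yields chunks of n units without re-copying the tail."""
    for start in range(0, len(seq), n):
        # Only the n-unit chunk is copied; the original seq is left untouched.
        yield seq[start:start + n]


print(list(split_by_n_indexed('1234567890', 2)))  # ['12', '34', '56', '78', '90']
print(list(split_by_n_indexed('1234567', 3)))     # ['123', '456', '7']
```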
Diptangsu Goswami
Russell Borogove
  • 8
    but not really efficient: when applied to strings: too many copies – Eric Aug 27 '15 at 21:17
  • 1
    It also doesn't work if seq is a generator, which is what the itertools version is _for_. Not that OP asked for that, but it's not fair to criticize itertool's version not being as simple. – mikenerone Jun 28 '17 at 20:47
41

Using more-itertools from PyPI:

>>> from more_itertools import sliced
>>> list(sliced('1234567890', 2))
['12', '34', '56', '78', '90']
Tim Diels
36

I like this solution:

s = '1234567890'
o = []
while s:
    o.append(s[:2])
    s = s[2:]
vlk
19

You could use the grouper() recipe from itertools:

Python 2.x:

from itertools import izip_longest    

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

Python 3.x:

from itertools import zip_longest

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')

These functions are memory-efficient and work with any iterables.
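`grouper()` yields tuples of characters, not strings, so for this question the groups still need to be joined. A short usage sketch with a simplified copy of the recipe: passing `fillvalue=''` is a choice made here so that the final short group joins cleanly.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# An empty-string fill value pads the last tuple without adding characters.
chunks = [''.join(group) for group in grouper('1234567890a', 2, fillvalue='')]
print(chunks)  # ['12', '34', '56', '78', '90', 'a']
```

With the default `fillvalue=None`, `''.join` would raise a `TypeError` on the padded last group.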

Eugene Yarmash
16

This can be achieved by a simple for loop.

a = '1234567890a'
result = []

for i in range(0, len(a), 2):
    result.append(a[i : i + 2])
print(result)

The output looks like ['12', '34', '56', '78', '90', 'a']

Sunil Purushothaman
Kasem777
  • 4
    While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – β.εηοιτ.βε May 22 '20 at 18:41
  • 4
    This is the same solution as here: https://stackoverflow.com/a/59091507/7851470 – Georgy May 22 '20 at 20:23
  • 1
    This is the same solution as the top voted answer - except for the fact that the top answer is using list comprehension. – Leonardus Chen Dec 07 '20 at 04:54
13

I was stuck in the same scenario.

This worked for me:

x = "1234567890"
n = 2
my_list = []
for i in range(0, len(x), n):
    my_list.append(x[i:i+n])
print(my_list)

Output:

['12', '34', '56', '78', '90']
Strick
9

Try this:

s = '1234567890'
print([s[idx:idx+2] for idx in range(len(s)) if idx % 2 == 0])

Output:

['12', '34', '56', '78', '90']
U13-Forward
  • why enumerate(s) if you're going to ignore the val? just do `for i in range(len(s))`; why iterate over every value only to throw away half of them? just skip the values you don't need: `for i in range(0, len(s), 2)` (and skip the `if` part) – Arthur Tacca Mar 28 '23 at 15:54
8

Try the following code:

from itertools import islice

def split_every(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield piece
        piece = list(islice(i, n))

s = '1234567890'
print(list(split_every(2, list(s))))
enderskill
  • Your answer doesn't meet OP's requirement, you have to use `yield ''.join(piece)` to make it work as expected: https://eval.in/813878 – Paulo Freitas Jun 08 '17 at 08:15
6

As always, for those who love one liners:

n = 2  
line = "this is a line split into n characters"  
line = [line[i * n:i * n+n] for i, blah in enumerate(line[::n])]
Eugene Yarmash
Sqripter
  • When I run this in Python Fiddle with a `print(line)` I get `this is a line split into n characters` as the output. Might you be better putting: `line = [line[i * n:i * n+n] for i,blah in enumerate(line[::n])]`? Fix this and it's a good answer :). – Peter David Carter May 20 '16 at 20:24
  • Can you explain the `,blah` and why it's necessary? I notice I can replace `blah` with any alpha character/s, but not numbers, and can't remove the `blah` or/and the comma. My editor suggests adding whitespace after `,` :s – toonarmycaptain Jul 17 '17 at 20:11
  • `enumerate` returns two iterables, so you need two places to put them. But you don't actually need the second iterable for anything in this case. – Daniel F Jul 27 '17 at 09:18
  • 1
    Rather than `blah` I prefer to use an underscore or double underscore, see: https://stackoverflow.com/questions/5893163/what-is-the-purpose-of-the-single-underscore-variable-in-python – Andy Royal Aug 15 '17 at 10:39
6
>>> from functools import reduce
>>> from operator import add
>>> # Python 3: the built-in zip is lazy, so itertools.izip is not needed
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x)]
['12', '34', '56', '78', '90']
>>> x = iter('1234567890')
>>> [reduce(add, tup) for tup in zip(x, x, x)]
['123', '456', '789']
ben w
3

more_itertools.sliced has been mentioned before. Here are four more options from the more_itertools library:

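import more_itertools as mit

s = "1234567890"

# Current more_itertools releases take the iterable first: grouper(iterable, n)
["".join(c) for c in mit.grouper(s, 2)]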

["".join(c) for c in mit.chunked(s, 2)]

["".join(c) for c in mit.windowed(s, 2, step=2)]

["".join(c) for c in  mit.split_after(s, lambda x: int(x) % 2 == 0)]

Each of these options produces the following output:

['12', '34', '56', '78', '90']

Documentation for discussed options: grouper, chunked, windowed, split_after

pylang
3

A simple recursive solution for short strings:

def split(s, n):
    if len(s) < n:
        return []
    else:
        return [s[:n]] + split(s[n:], n)

print(split('1234567890', 2))

Or in such a form:

def split(s, n):
    if len(s) < n:
        return []
    elif len(s) == n:
        return [s]
    else:
        return split(s[:n], n) + split(s[n:], n)

This form illustrates the typical divide-and-conquer pattern of a recursive approach more explicitly, though in practice it is not necessary to do it this way.
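Both recursive versions return `[]` once fewer than `n` characters remain, so a trailing partial chunk is dropped. If the remainder should be kept, one possible adjustment is to stop only on the empty string. The name `split_keep_tail` is hypothetical.

```python
def split_keep_tail(s, n):
    """Recursively split s into n-character chunks, keeping a short last chunk."""
    if not s:
        # Base case: nothing left to split.
        return []
    return [s[:n]] + split_keep_tail(s[n:], n)


print(split_keep_tail('1234567890', 2))  # ['12', '34', '56', '78', '90']
print(split_keep_tail('123456789', 2))   # ['12', '34', '56', '78', '9']
```

As with the original, each call adds a stack frame, so this is only suitable for short strings: the default recursion limit of about 1000 frames caps the number of chunks.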

englealuze
2

A solution with groupby:

from itertools import groupby, chain, repeat, cycle

text = "wwworldggggreattecchemggpwwwzaz"
n = 3
c = cycle(chain(repeat(0, n), repeat(1, n)))
res = ["".join(g) for _, g in groupby(text, lambda x: next(c))]
print(res)

Output:

['www', 'orl', 'dgg', 'ggr', 'eat', 'tec', 'che', 'mgg', 'pww', 'wza', 'z']
TigerTV.ru
0

These answers are all nice and working and all, but the syntax is so cryptic... Why not write a simple function?

def SplitEvery(string, length):
    if len(string) <= length: return [string]
    # Ceiling division: range() needs an int, and rounding up keeps a
    # trailing chunk that is shorter than length.
    sections = -(-len(string) // length)
    lines = []
    start = 0
    for i in range(sections):
        line = string[start:start+length]
        lines.append(line)
        start += length
    return lines

And call it simply:

text = '1234567890'
lines = SplitEvery(text, 2)
print(lines)

# output: ['12', '34', '56', '78', '90']
Yosef Bernal
  • 1
    You cannot pass a float to the range function, so the function you display wouldn't work. (Try running it if you don't believe me) – cd-CreepArghhh Oct 03 '22 at 10:15
0

Another solution using groupby and index//n as the key to group the letters:

from itertools import groupby

text = "abcdefghij"
n = 3

result = []
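# enumerate() supplies each character's position; position // n is the group key
for _, chunk in groupby(enumerate(text), key=lambda pair: pair[0] // n):
    result.append("".join(ch for _, ch in chunk))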

# result = ['abc', 'def', 'ghi', 'j']
c_georges