11

I have a string, which I need to split into 2-letter pieces. For example, 'ABCDXY' should become ['AB', 'CD', 'XY']. The behavior in the case of an odd number of characters may be entirely arbitrary (I'll check the length in advance).

Is there any way to do this without an ugly loop?

max
  • 1
    Hey mate... what do you mean by "ugly loop"? XD – Littm Sep 19 '12 at 09:08
  • 1
    I didn't think about it. But now that I did... An ugly loop is a loop that is uglier than necessary, or that is present when no loop is really required :) – max Sep 19 '12 at 09:10
  • related: [How do you split a list into evenly sized chunks in Python?](http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python) – jfs Sep 19 '12 at 09:13

6 Answers

26
>>> [s[i:i + 2] for i in range(0, len(s), 2)]
['AB', 'CD', 'XY']
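
The same slicing idea can be wrapped in a small function. This is only a sketch: the name `split_chunks` and the `size` parameter are illustrative and not part of the answer above. It also shows what happens with an odd-length input, where the last piece is simply shorter.

```python
def split_chunks(s, size=2):
    """Split s into consecutive pieces of `size` characters.

    The last piece is shorter when len(s) is not a multiple of size.
    """
    # range() yields the start index of each piece: 0, size, 2*size, ...
    return [s[i:i + size] for i in range(0, len(s), size)]


pairs = split_chunks('ABCDXY')   # ['AB', 'CD', 'XY']
odd = split_chunks('ABCDXYv')    # ['AB', 'CD', 'XY', 'v']
```

Slicing past the end of a string does not raise an error in Python, which is why no special case is needed for the final piece.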
Emmanuel
16

Using regular expressions!

>>> import re
>>> s = "ABCDXYv"
>>> re.findall(r'.{1,2}',s,re.DOTALL)
['AB', 'CD', 'XY', 'v']

I know it has been a while, but I came back to this and was curious about which method was better; mine: r'.{1,2}' or Jon's r'..?'. On the surface, Jon's looks much nicer, and I thought it would be much faster than mine, but I was surprised to find otherwise, so I thought I would share:

>>> import timeit
>>> timeit.Timer("re.findall(r'.{1,2}', 'ABCDXYv')", setup='import re').repeat()
[1.9064299485802252, 1.8369554649334674, 1.8548105833383772]
>>> timeit.Timer("re.findall(r'..?', 'ABCDXYv')", setup='import re').repeat()
[1.9142223469651611, 1.8670038395145383, 1.85781945659771]

Which shows that r'.{1,2}' is marginally faster here, but only by under 2% in each run, so the difference may well be measurement noise.
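
One detail worth checking is the re.DOTALL flag, which the first example passes but the timing statements omit. The sketch below assumes an input that contains a newline (the variable names are illustrative) and shows how the result changes with and without the flag.

```python
import re

text = 'AB\nCDv'

# Without re.DOTALL, '.' does not match a newline, so the '\n' is skipped
without_flag = re.findall(r'.{1,2}', text)
# ['AB', 'CD', 'v']

# With re.DOTALL, '.' matches any character, including the newline
with_flag = re.findall(r'.{1,2}', text, re.DOTALL)
# ['AB', '\nC', 'Dv']

# r'..?' is another way to spell "one or two characters"
alt = re.findall(r'..?', text, re.DOTALL)
```

If the input can never contain newlines, the flag makes no difference to the result.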

Inbar Rose
2

You could try:

s = 'ABCDEFG'
# xrange is Python 2 only; use range on Python 3
r = [s[i:i+2] for i in xrange(0, len(s), 2)]

# r is ['AB', 'CD', 'EF', 'G']

UPDATE 2

If you don't care about odd chars, you could use a regex (avoiding the loop):

import re

s = 'ABCDEFG'
r = re.compile('(..)').findall(s)
# r is ['AB', 'CD', 'EF']
pr0gg3d
1

There's nothing ugly about the perfectly Pythonic:

string = 'ABCDXY'
[string[i:i+2] for i in xrange(0, len(string), 2)]  # use range on Python 3

You could also use the following (from the itertools recipes at http://docs.python.org/library/itertools.html):

from itertools import izip_longest  # named zip_longest on Python 3

def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

(Which, depending on how you look at it, may or may not be using 'loops' ;))

or something like:

re.findall('..?', string)
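
The grouper recipe above yields tuples of characters rather than strings, so one more step is needed to get 2-letter pieces. The following is a sketch for Python 3, where izip_longest is named zip_longest. The `pairs` helper is a hypothetical name added for illustration.

```python
from itertools import zip_longest  # named izip_longest on Python 2


def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # n references to the SAME iterator, so each tuple consumes n items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def pairs(s):
    # Hypothetical helper: join each tuple back into a string.
    # fillvalue='' pads an odd-length input without adding characters.
    return [''.join(chunk) for chunk in grouper(2, s, fillvalue='')]


result = pairs('ABCDXYv')  # ['AB', 'CD', 'XY', 'v']
```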
Jon Clements
  • +1 for the zip solution, which none of these other answers mentioned. It actually returns `[(str, str), (str, str), ...]` though, not `[str, str, ...]`, so it would take more code to make it useful. – Mu Mind Sep 19 '12 at 09:26
  • Also, handiest bastardized use of regexes I've ever seen! I like yours the best because it handles odd lengths. – Mu Mind Sep 19 '12 at 09:36
0

Yet another solution, this one built on zip and a slice stride:

import itertools
map(''.join, itertools.izip_longest(mystr[::2], mystr[1::2], fillvalue=''))

It does handle odd-length inputs.

Mu Mind
0

Here's yet another solution without explicit loops (though @Emmanuel's answer is the most appropriate for your question):

s = 'abcdef'
L = zip(s[::2], s[1::2])
# -> [('a', 'b'), ('c', 'd'), ('e', 'f')]

To get strings:

print map(''.join, L)
# ['ab', 'cd', 'ef']

On Python 3, use print() as a function and wrap zip() and map() in list() where necessary.
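
As a sketch of what that looks like on Python 3, the snippet below uses an odd-length input (an assumption chosen to show the difference) and compares plain zip, which drops the unpaired last character, with itertools.zip_longest, which keeps it.

```python
from itertools import zip_longest

s = 'abcdefg'

# zip stops at the shorter slice, so the trailing 'g' is dropped
truncated = list(map(''.join, zip(s[::2], s[1::2])))
# ['ab', 'cd', 'ef']

# zip_longest pads the shorter slice with '', so the trailing 'g' is kept
padded = list(map(''.join, zip_longest(s[::2], s[1::2], fillvalue='')))
# ['ab', 'cd', 'ef', 'g']
```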

Community
jfs