Splitting a string into 2-letter segments

Question

I have a string that I need to split into 2-letter pieces. For example, 'ABCDXY' should become ['AB', 'CD', 'XY']. The behavior for an odd number of characters can be entirely arbitrary (I'll check the length in advance).

Is there any way to do this without an ugly loop?

I didn't think about it. But now that I did... An ugly loop is a loop that is uglier than necessary, or that is present when no loop is really required :) — max, Sep 19 '12 at 09:10
related: [How do you split a list into evenly sized chunks in Python?](http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python) — jfs, Sep 19 '12 at 09:13

score 26 · Accepted Answer · answered Sep 19 '12 at 09:06

>>> s = 'ABCDXY'
>>> [s[i:i + 2] for i in range(0, len(s), 2)]
['AB', 'CD', 'XY']
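
The same slicing idea generalizes to any chunk size. A minimal sketch, assuming a hypothetical helper name `split_chunks` (it is not part of the original answer):

```python
def split_chunks(s, n=2):
    """Split s into consecutive pieces of length n (the last may be shorter)."""
    # Slicing past the end of a string is safe, so no length check is needed.
    return [s[i:i + n] for i in range(0, len(s), n)]


print(split_chunks('ABCDXY'))   # ['AB', 'CD', 'XY']
print(split_chunks('ABCDXYv'))  # ['AB', 'CD', 'XY', 'v']
```

With an odd-length input the last piece is simply a single character, which fits the "arbitrary behavior" the question allows.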

Emmanuel

That's a loop. OP said "no loops" – Inbar Rose Sep 19 '12 at 09:06
He did say "without an ugly loop"... and this certainly avoids that. :-) – John Szakmeister Sep 19 '12 at 09:08
Maybe that's the only way. I guess sometimes loops are required. – max Sep 19 '12 at 09:09
Well, I understand it in the way "not a standard for loop", I was supposing list comprehensions would be good. Let's see what the OP says... :-) – Emmanuel Sep 19 '12 at 09:10
List comprehensions are almost not a loop, so this would qualify :) – max Sep 19 '12 at 09:12
And this turned out to be faster than either of the regex approaches. – max Aug 05 '16 at 05:22

Inbar Rose · Answer 2 · 2013-07-23T15:43:38.953

16

Using regular expressions!

>>> import re
>>> s = "ABCDXYv"
>>> re.findall(r'.{1,2}',s,re.DOTALL)
['AB', 'CD', 'XY', 'v']
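
The re.DOTALL flag only matters when the input can contain newlines, which is the case raised in the comments on this answer. A small sketch of the difference (the sample string is made up for illustration):

```python
import re

text = 'AB\nCD'  # made-up sample containing a newline

# Without DOTALL, '.' never matches '\n', so the newline is silently skipped.
without_flag = re.findall(r'.{1,2}', text)

# With DOTALL, '.' matches any character, so the newline becomes part of a pair.
with_flag = re.findall(r'.{1,2}', text, re.DOTALL)

print(without_flag)  # ['AB', 'CD']
print(with_flag)     # ['AB', '\nC', 'D']
```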

I know it has been a while, but I came back to this and was curious about which method was better; mine: r'.{1,2}' or Jon's r'..?'. On the surface, Jon's looks much nicer, and I thought it would be much faster than mine, but I was surprised to find otherwise, so I thought I would share:

>>> import timeit
>>> timeit.Timer("re.findall(r'.{1,2}', 'ABCDXYv')", setup='import re').repeat()
[1.9064299485802252, 1.8369554649334674, 1.8548105833383772]
>>> timeit.Timer("re.findall(r'..?', 'ABCDXYv')", setup='import re').repeat()
[1.9142223469651611, 1.8670038395145383, 1.85781945659771]

Which shows that r'.{1,2}' is marginally faster in these runs, but only slightly: every gap is under 2%, smaller than the spread between repeats of the same pattern.
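
To repeat the comparison on your own machine, something like the following could be used. It is only a sketch: the function names are hypothetical, the slicing variant is included for reference, and the numbers will vary with the Python version and hardware.

```python
import re
import timeit

SAMPLE = 'ABCDXYv'  # same sample string as in the timings above


def by_slicing(s):
    return [s[i:i + 2] for i in range(0, len(s), 2)]


def by_braces(s):
    return re.findall(r'.{1,2}', s, re.DOTALL)


def by_optional(s):
    return re.findall(r'..?', s, re.DOTALL)


candidates = (by_slicing, by_braces, by_optional)

# All three should agree before their speed is compared.
assert all(f(SAMPLE) == ['AB', 'CD', 'XY', 'v'] for f in candidates)

for func in candidates:
    # Taking min() over several repeats damps scheduling noise.
    best = min(timeit.repeat(lambda: func(SAMPLE), number=20000, repeat=3))
    print('%-12s %.4f s' % (func.__name__, best))
```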

edited Jul 23 '13 at 15:43

answered Sep 19 '12 at 09:11

Inbar Rose

Careful, this fails to work correctly if the string contains newlines. – Tim Pietzcker Sep 19 '12 at 09:17
@TimPietzcker you are right, i will add that with the flag to handle such a case. – Inbar Rose Sep 19 '12 at 09:18
`()` are unnecessary in the regex: `re.findall('(?s)..', s)` – jfs Sep 19 '12 at 09:40
`or take the last character in the string and add that to your list at the end.` -- `def grouper()` seems a bit convoluted - the regex can just be `.{1,2}` – Jon Clements Sep 19 '12 at 09:45
You could use `..?` as the regex. – Janne Karila Sep 19 '12 at 10:31
@JanneKarila They could - but that's my answer ;) – Jon Clements Sep 19 '12 at 10:38
@JonClements good idea with {1,2} i dunno why i didnt think of that. will edit to reflect. – Inbar Rose Sep 19 '12 at 10:50

pr0gg3d · Answer 3 · 2012-09-19T09:14:28.613

2

You could try:

s = 'ABCDEFG'
r = [s[i:i+2] for i in range(0, len(s), 2)]

# r is ['AB', 'CD', 'EF', 'G']

UPDATE 2

If you don't care about odd chars, you could use a regex (avoiding the loop):

import re
s = 'ABCDEFG'
r = re.compile('(..)').findall(s)
# r is ['AB', 'CD', 'EF']
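
For comparison, here is a short sketch of how the pattern choice decides what happens to the trailing character. It uses the `..?` pattern suggested elsewhere in this thread, and drops the capture group, which `findall` does not need here:

```python
import re

s = 'ABCDEFG'  # odd length, as in the example above

# '..' needs two characters per match, so the trailing 'G' is dropped.
strict = re.findall(r'..', s)

# '..?' makes the second character optional, so the trailing 'G' is kept.
lenient = re.findall(r'..?', s)

print(strict)   # ['AB', 'CD', 'EF']
print(lenient)  # ['AB', 'CD', 'EF', 'G']
```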

edited Sep 19 '12 at 09:14

answered Sep 19 '12 at 09:07

pr0gg3d

Jon Clements · Answer 4 · 2012-09-19T09:12:58.613

1

There's nothing ugly about the perfectly Pythonic:

string = 'ABCDXY'
[string[i:i+2] for i in range(0, len(string), 2)]

You could also use the following (from the itertools recipes, http://docs.python.org/library/itertools.html):

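try:
    from itertools import izip_longest  # Python 2
except ImportError:
    from itertools import zip_longest as izip_longest  # Python 3

def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n  # n references to the same iterator
    return izip_longest(fillvalue=fillvalue, *args)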

(Which, depending on how you look at it, may or may not be using 'loops' ;))
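
Note that `grouper` yields tuples of characters rather than strings, so one more step is needed to get the list the question asks for. A minimal Python 3 sketch, where `split_with_grouper` is a hypothetical wrapper name and an empty-string fill value is assumed so that an odd trailing character survives unpadded:

```python
from itertools import zip_longest


def grouper(n, iterable, fillvalue=None):
    """Collect data into fixed-length chunks (itertools recipe, Python 3 names)."""
    args = [iter(iterable)] * n  # n references to the same iterator
    return zip_longest(*args, fillvalue=fillvalue)


def split_with_grouper(s, n=2):
    # Joining each tuple turns ('A', 'B') into 'AB'; the '' fill leaves a
    # short final chunk as-is instead of padding it with None.
    return [''.join(group) for group in grouper(n, s, fillvalue='')]


print(split_with_grouper('ABCDXY'))   # ['AB', 'CD', 'XY']
print(split_with_grouper('ABCDXYv'))  # ['AB', 'CD', 'XY', 'v']
```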

or something like:

re.findall('..?', string)

edited Sep 19 '12 at 09:12

answered Sep 19 '12 at 09:07

Jon Clements

+1 for the zip solution, which none of these other answers mentioned. It actually returns `[(str, str), (str, str), ...]` though, not `[str, str, ...]`, so it would take more code to make it useful. – Mu Mind Sep 19 '12 at 09:26
Also, handiest bastardized use of regexes I've ever seen! I like yours the best because it handles odd lengths. – Mu Mind Sep 19 '12 at 09:36

score 0 · Answer 5 · answered Sep 19 '12 at 09:31

Yet another solution, this one built on zip and a slice stride:

map(''.join, itertools.izip_longest(mystr[::2], mystr[1::2], fillvalue=''))

It does handle odd-length inputs.
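
The one-liner above is written for Python 2. A rough Python 3 equivalent, assuming only that `izip_longest` is replaced by `zip_longest` and the lazy `map` result is wrapped in `list()`:

```python
from itertools import zip_longest

mystr = 'ABCDXYv'  # odd length, to show the fill value at work

# mystr[::2] holds the 1st, 3rd, 5th... characters; mystr[1::2] the 2nd, 4th...
# zip_longest pads the shorter slice with '' so the last character is kept.
pieces = list(map(''.join, zip_longest(mystr[::2], mystr[1::2], fillvalue='')))

print(pieces)  # ['AB', 'CD', 'XY', 'v']
```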

Mu Mind

score 0 · Answer 6 · edited May 23 '17 at 12:10

Here's yet another solution without explicit loops (though @Emmanuel's answer is the most appropriate for your question):

s = 'abcdef'
L = zip(s[::2], s[1::2])
# -> [('a', 'b'), ('c', 'd'), ('e', 'f')]

To get strings:

print map(''.join, L)
# ['ab', 'cd', 'ef']

On Python 3, wrap the results in list() where necessary, since zip and map return iterators there (and print is a function).
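
As a sketch of what that looks like on Python 3 (the sample string is chosen with an odd length to show one caveat): plain `zip` stops at the shorter slice, so an unpaired final character is dropped.

```python
s = 'abcdefg'  # odd length on purpose

# zip() stops when the shorter slice (s[1::2]) runs out.
pairs = [''.join(p) for p in zip(s[::2], s[1::2])]

print(pairs)  # ['ab', 'cd', 'ef']  (the unpaired 'g' is dropped)
```

That is acceptable here because the question says the length is checked in advance.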

answered Sep 19 '12 at 09:31

jfs

jinx! we were both thinking `zip` at the same time – Mu Mind Sep 19 '12 at 09:33
