29

For example, if I had the following string:

"this-is-a-string"

Could I split it by every 2nd "-" rather than every "-" so that it returns two values ("this-is" and "a-string") rather than returning four?

BenMorel
Gnuffo1

6 Answers

49

Here’s another solution:

span = 2
words = "this-is-a-string".split("-")
print(["-".join(words[i:i+span]) for i in range(0, len(words), span)])
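The same split-then-rejoin idea can be wrapped in a small reusable function. This is only a sketch for Python 3; the name `split_every` and its parameters are made up for illustration and do not come from the answer itself.

```python
def split_every(text, sep="-", span=2):
    """Split text on sep, then re-join the pieces in groups of span."""
    if span < 1:
        raise ValueError("span must be at least 1")
    words = text.split(sep)
    # Slicing past the end of a list is safe, so a short final group is kept.
    return [sep.join(words[i:i + span]) for i in range(0, len(words), span)]


print(split_every("this-is-a-string"))   # ['this-is', 'a-string']
print(split_every("a-b-c-d-e"))          # ['a-b', 'c-d', 'e']
```

Because the slice simply runs short at the end, an odd number of parts leaves the last piece on its own rather than dropping it.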
Nick Dandoulakis
Gumbo
17
>>> s="a-b-c-d-e-f-g-h-i-j-k-l"         # use zip(*[i]*n)
>>> i=iter(s.split('-'))                # for the nth case    
>>> map("-".join,zip(i,i))    
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l']

>>> i=iter(s.split('-'))
>>> map("-".join,zip(*[i]*3))
['a-b-c', 'd-e-f', 'g-h-i', 'j-k-l']
>>> i=iter(s.split('-'))
>>> map("-".join,zip(*[i]*4))
['a-b-c-d', 'e-f-g-h', 'i-j-k-l']

Sometimes itertools.izip is faster than the built-in zip, as the timings further down show:

>>> from itertools import izip
>>> s="a-b-c-d-e-f-g-h-i-j-k-l"
>>> i=iter(s.split("-"))
>>> ["-".join(x) for x in izip(i,i)]
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l']

Here is a version that also handles an odd number of parts, although whether its output suits you depends on what you want in that case: the last element ends with a trailing '-', which you can trim with .rstrip('-').

>>> from itertools import izip_longest
>>> s="a-b-c-d-e-f-g-h-i-j-k-l-m"
>>> i=iter(s.split('-'))
>>> map("-".join,izip_longest(i,i,fillvalue=""))
['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l', 'm-']
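For Python 3, where `izip_longest` is called `zip_longest`, one way to avoid the trailing '-' altogether is to pad with a sentinel and drop it before joining. This is a sketch under that assumption; the helper name `join_groups` is hypothetical.

```python
from itertools import zip_longest


def join_groups(text, sep="-", n=2):
    """Group the split pieces n at a time without padding the last group."""
    parts = iter(text.split(sep))
    missing = object()  # sentinel marking the padding added by zip_longest
    # The same iterator repeated n times yields consecutive groups of n.
    groups = zip_longest(*[parts] * n, fillvalue=missing)
    # Drop the padding before joining, so no trailing separator appears.
    return [sep.join(p for p in group if p is not missing) for group in groups]


print(join_groups("a-b-c-d-e-f-g-h-i-j-k-l-m"))
# ['a-b', 'c-d', 'e-f', 'g-h', 'i-j', 'k-l', 'm']
```

Using a fresh `object()` as the fill value means a genuinely empty field (as in "a--b") is not mistaken for padding.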

Here are some timings

$ python -m timeit -s 'import re;r=re.compile("[^-]+-[^-]+");s="a-b-c-d-e-f-g-h-i-j-k-l"' 'r.findall(s)'
100000 loops, best of 3: 4.31 usec per loop

$ python -m timeit -s 'from itertools import izip;s="a-b-c-d-e-f-g-h-i-j-k-l"' 'i=iter(s.split("-"));["-".join(x) for x in izip(i,i)]'
100000 loops, best of 3: 5.41 usec per loop

$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' 'i=iter(s.split("-"));["-".join(x) for x in zip(i,i)]'
100000 loops, best of 3: 7.3 usec per loop

$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' 't=s.split("-");["-".join(t[i:i+2]) for i in range(0, len(t), 2)]'
100000 loops, best of 3: 7.49 usec per loop

$ python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l"' '["-".join([x,y]) for x,y in zip(s.split("-")[::2], s.split("-")[1::2])]'
100000 loops, best of 3: 9.51 usec per loop
John La Rooy
  • You’re using the wrong code for my proposal. I’m operating on the words and not the string. `python -m timeit -s 's="a-b-c-d-e-f-g-h-i-j-k-l".split("-")' '["-".join(s[i:i+2]) for i in range(0, len(s), 2)]'` – Gumbo Oct 25 '09 at 21:01
  • Nicely done, but fails for an odd number of elements. It shouldn't be too hard to overcome though. – RedGlyph Oct 25 '09 at 21:10
  • @Gumbo, sorry, I fixed it to match your comment, I've just moved the `split()` out of the setup clause and used `t` as a temporary variable – John La Rooy Oct 25 '09 at 21:39
12

Regular expressions handle this easily:

import re
s = "aaaa-aa-bbbb-bb-c-ccccc-d-ddddd"
print(re.findall("[^-]+-[^-]+", s))

Output:

['aaaa-aa', 'bbbb-bb', 'c-ccccc', 'd-ddddd']

Update for Nick D:

n = 3
print(re.findall("-".join(["[^-]+"] * n), s))

Output:

['aaaa-aa-bbbb', 'bb-c-ccccc']
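Note that with n = 3 the leftover 'd-ddddd' is dropped, because the pattern only matches complete groups. If the remainder should be kept, one possible variation is to make the repeated part optional. This is a sketch for Python 3, assuming a single-character separator; the function name `regex_groups` is made up for illustration.

```python
import re


def regex_groups(text, n=2, sep="-"):
    """Return runs of up to n sep-separated fields, keeping a short last run."""
    esc = re.escape(sep)
    field = "[^{0}]+".format(esc)  # assumes sep is a single character
    # One field, followed by at most n-1 further "sep + field" repetitions.
    # The group is non-capturing so findall returns the whole match.
    pattern = "{f}(?:{s}{f}){{0,{k}}}".format(f=field, s=esc, k=n - 1)
    return re.findall(pattern, text)


s = "aaaa-aa-bbbb-bb-c-ccccc-d-ddddd"
print(regex_groups(s, 3))   # ['aaaa-aa-bbbb', 'bb-c-ccccc', 'd-ddddd']
```

Unlike the split-based answers, a regex built on `[^-]+` skips empty fields, so a string such as "a--b" will not group the same way.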
recursive
1

EDIT: The original code I posted didn't work. This version does:

I don't think you can split on every other one, but you could split on every - and join every pair.

chunks = []
content = "this-is-a-string"
split_string = content.split('-')

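# Step through the pieces two at a time. The range has to cover the last
# index, otherwise a leftover piece is dropped when the count is odd.
for i in range(0, len(split_string), 2):
    if i < len(split_string) - 1:
        chunks.append("-".join([split_string[i], split_string[i+1]]))
    else:
        chunks.append(split_string[i])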
EmFi
  • This does not work. The output consists of a list of 1 character strings containing a hyphen. – recursive Oct 25 '09 at 20:20
  • @Jed His idea is good, you could write the implementation on your own. –  Oct 25 '09 at 20:22
  • Yeah. Splice didn't work the way I thought it did, I've fixed the implementation. – EmFi Oct 25 '09 at 20:26
  • Downvote removed. You might as well just do split_string[i:i+2] rather than creating a list literal, since you know the size already. – recursive Oct 25 '09 at 22:30
0

I think several of the already given solutions are good enough, but just for fun, I did this version:

def twosplit(s, sep):
    first = s.find(sep)
    if first >= 0:
        second = s.find(sep, first + 1)
        if second >= 0:
            # cut at the second separator and recurse on the remainder
            return [s[0:second]] + twosplit(s[second + len(sep):], sep)
        else:
            return [s]
    else:
        return [s]

print(twosplit("this-is-a-string", "-"))
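The recursion above makes one call per chunk, so a very long string could hit Python's recursion limit. An iterative variant of the same `find`-based idea, generalised to every n-th separator, might look like the following. It is only a sketch for Python 3, and the name `split_at_nth` is hypothetical.

```python
def split_at_nth(s, sep, n=2):
    """Cut s at every n-th occurrence of sep, scanning with str.find."""
    if not sep:
        raise ValueError("empty separator")
    chunks = []
    start = 0   # where the current chunk begins
    pos = 0     # where the next search begins
    seen = 0    # separators seen inside the current chunk
    while True:
        hit = s.find(sep, pos)
        if hit < 0:
            chunks.append(s[start:])  # whatever is left is the last chunk
            return chunks
        seen += 1
        pos = hit + len(sep)
        if seen == n:
            chunks.append(s[start:hit])
            start = pos
            seen = 0


print(split_at_nth("this-is-a-string", "-"))   # ['this-is', 'a-string']
```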
elzapp
-1
l = 'this-is-a-string'.split('-')
nl = []
ss = ""
c = 0
for s in l:
   if c%2 == 0:
       # even position: start a new pair
       ss = s
   else:
       # odd position: complete the pair and store it
       ss = "%s-%s"%(ss,s)
       nl.append(ss)
   c += 1

print(nl)
SpliFF
  • sorry, i misread your question first time and rewrote it, n was a leftover from previous. Now it gives a list of strings. – SpliFF Oct 25 '09 at 20:15
  • This is very complicated (long to read/decipher), compared to many of the other solutions proposed here… – Eric O. Lebigot Oct 25 '09 at 21:44
  • rubbish. it's actually much easier to decipher. length is largely irrelevant and it could be shortened by making it less readable. It should have good performance since the loop only has a simple test condition to deal with. Also it has the most flexibility for handling other processing inside the loop. Also the winning answer will crash on a string with an odd number of hyphens. Iter and list ops might be pythonic but that doesn't necessarily make them 'better'. – SpliFF Oct 26 '09 at 11:36
  • Your code has many errors: `# TypeError: insert() takes exactly 2 arguments (1 given)` can be fixed by using `append` instead -> returns `['-this-is-a-string']`. While it now runs, the result is false. Fixing the split character to `'-'`: returns `['-this', 'is-a']` which is better but still wrong. By moving the `c += 1` line after the `else` clause you can fix that as well. After those fixes the solution is fine – Patrick Artner Jan 26 '19 at 09:59