1

I have a string that I need to parse. It contains several words separated by hyphens (-), and it also ends with a hyphen.

For example: one-two-three-

To look at the words individually, I split the string into a list.

wordstring = "one-two-three-"
wordlist = wordstring.split('-')

for word in wordlist:
    print(word)

Output

one
two
three
#empty element

What I don't understand is why the final element of the resulting list is an empty string. How can I omit this empty element?

Should I simply truncate the list or is there a better way to split the string?

strpeter
SaAtomic
  • 1
    Possible duplicate of [python split function -avoids last empy space](http://stackoverflow.com/questions/10780423/python-split-function-avoids-last-empy-space) – Chris_Rands Feb 15 '17 at 14:17

9 Answers

4

You get an empty string because splitting on the last - character leaves nothing on its right-hand side, and split returns that nothing as an empty string. You can strip the leading and trailing '-' characters from the string before splitting:

wordlist = wordstring.strip('-').split('-')
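
A minimal sketch of the difference between the raw split and the stripped one. It also shows rstrip('-'), which trims only trailing hyphens; the input with a leading hyphen is made up for illustration and is not from the question.

```python
wordstring = "one-two-three-"

# split() returns one more field than there are separators, so the
# trailing hyphen yields a final empty string.
raw = wordstring.split('-')

# strip('-') removes hyphens from both ends before splitting.
stripped = wordstring.strip('-').split('-')

# rstrip('-') removes only trailing hyphens, so a leading empty field
# (hypothetical input with a leading hyphen) is preserved.
leading = "-one-two-".rstrip('-').split('-')

print(raw)       # ['one', 'two', 'three', '']
print(stripped)  # ['one', 'two', 'three']
print(leading)   # ['', 'one', 'two']
```

Which of the two to use depends on whether a leading empty field carries meaning in your data.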
Moses Koledoye
3

If the final character is always a -, you can drop it with the slice [:-1], which takes every character of the string except the last.

Then, proceed to split it as you did:

wordlist = wordstring[:-1].split('-')
print(wordlist)
['one', 'two', 'three']
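
The slice is only safe when the string really ends with a hyphen. As a sketch, assuming the input might sometimes lack the trailing hyphen, a small helper (split_words is a hypothetical name, not part of the original answer) can check first:

```python
def split_words(wordstring, sep='-'):
    """Split on sep, dropping one trailing separator if present.

    Hypothetical helper for illustration only.
    """
    if wordstring.endswith(sep):
        wordstring = wordstring[:-len(sep)]
    return wordstring.split(sep)

print(split_words("one-two-three-"))  # ['one', 'two', 'three']
print(split_words("one-two-three"))   # ['one', 'two', 'three']

# The unconditional slice cuts into the last word when no hyphen is there.
print("one-two-three"[:-1].split('-'))  # ['one', 'two', 'thre']
```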
Dimitris Fasarakis Hilliard
2

You can use a regex to do this:

import re
wordlist = re.findall("[a-zA-Z]+(?=-)", wordstring)

Output:

['one', 'two', 'three']
Jarvis
  • 1
    Nice idea, but I'd just use `"[^-]+"` instead. We don't know what are the "legal" characters, just that it's not a `-`. – tobias_k Feb 15 '17 at 13:45
  • This answer also handles the `wordstring = "one-two---three-"` case correctly (assuming that in this case there should be no empty strings either) – tobias_k Feb 15 '17 at 13:48
  • 1
    Sure, with "this" I meant _this answer_, not _my suggestion_. – tobias_k Feb 15 '17 at 13:50
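
To compare the pattern from the answer with the one suggested in the comments, here is a short sketch. The second input string (mixed) is an invented example that contains digits; it is there only to show where the two patterns diverge.

```python
import re

wordstring = "one-two---three-"

# Pattern from the answer: runs of ASCII letters followed by a hyphen.
letters_only = re.findall(r"[a-zA-Z]+(?=-)", wordstring)

# Pattern from the comment: any run of non-hyphen characters.
non_hyphen = re.findall(r"[^-]+", wordstring)

print(letters_only)  # ['one', 'two', 'three']
print(non_hyphen)    # ['one', 'two', 'three']

# Made-up input whose words are not purely alphabetic.
mixed = "item1-item2-"
letters_mixed = re.findall(r"[a-zA-Z]+(?=-)", mixed)
non_hyphen_mixed = re.findall(r"[^-]+", mixed)

print(letters_mixed)     # []
print(non_hyphen_mixed)  # ['item1', 'item2']
```

Both patterns skip the empty fields produced by repeated hyphens, but only `[^-]+` makes no assumption about which characters a word may contain.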
1

You should call the string's strip method before splitting your string. E.g.:

wordstring = "one-two-three-"
wordlist = wordstring.strip('-').split('-')
LucG
1

I believe .split() treats whatever follows the last - as another element, and since nothing follows it, that element is an empty string.

Are you open to removing the dash in wordstring before splitting it?

wordstring = "one-two-three-"
wordlist = wordstring[:-1].split('-')
print(wordlist)

OUT: ['one', 'two', 'three']
NickBraunagel
1

This is explained in the docs:

... If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). ...

If you know your strings will always end in '-', then just remove the last element of the resulting list by calling wordlist.pop().

If you need something more complicated you may want to learn about regular expressions.
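
As a sketch of the documented behaviour and of the pop() approach: the check on the last element is an addition made here for safety, not part of the original suggestion.

```python
# With an explicit separator, adjacent separators delimit empty strings.
docs_example = '1,,2'.split(',')
print(docs_example)  # ['1', '', '2']

wordstring = "one-two-three-"
wordlist = wordstring.split('-')

# pop() removes the last element in place. The guard (an assumption added
# for this sketch) avoids dropping a real word when the string does not
# end with a hyphen.
if wordlist and wordlist[-1] == '':
    wordlist.pop()

print(wordlist)  # ['one', 'two', 'three']
```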

daphtdazz
1

Just for the variety of options:

wordlist = [x for x in wordstring.split('-') if x]

Note that the above also handles cases such as: wordstring = "one-two--three-" (double hyphen)
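
A quick sketch of the double-hyphen case mentioned above. The filter(None, ...) line is an equivalent spelling added here for comparison, not something the answer proposed.

```python
wordstring = "one-two--three-"

# The plain split keeps an empty string for every extra hyphen.
raw = wordstring.split('-')
print(raw)  # ['one', 'two', '', 'three', '']

# The comprehension keeps only the non-empty (truthy) fields.
wordlist = [x for x in wordstring.split('-') if x]
print(wordlist)  # ['one', 'two', 'three']

# filter(None, ...) drops falsy items in the same way.
filtered = list(filter(None, wordstring.split('-')))
print(filtered)  # ['one', 'two', 'three']
```

Note that this discards every empty field, so it is not suitable if an empty field between two hyphens is meaningful in your data.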

Ma0
  • An inefficient option – Chris_Rands Feb 15 '17 at 13:47
  • @Chris_Rands That is why the disclaimer is there. – Ma0 Feb 15 '17 at 13:47
  • 1
    IMO not worth listing worse alternatives to existing solutions, but here's another then: `wordstring.replace('-','\n').splitlines()` – Chris_Rands Feb 15 '17 at 13:50
  • @Chris_Rands 1) you are free to dv if it messes with your aesthetics. 2) this comprehension might not be so interesting here because this is a very simple case, but the construct `[x for x in y if f(x)]` is valuable in many cases and often encountered. – Ma0 Feb 15 '17 at 14:02
  • I'm not gonna dv but 2) is irrelevant to *this* question – Chris_Rands Feb 15 '17 at 14:08
1

First strip() then split()

wordstring = "one-two-three-"
x = wordstring.strip('-')
y = x.split('-')

for word in y:
    print(word)
mtt2p
-1

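Strip/trim the string before splitting, passing the hyphen explicitly: wordstring.strip('-'). Calling strip() with no argument removes only surrounding whitespace such as a trailing "\n", so it would leave the trailing '-' in place.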

Chobeat