2

I need to split a string. I am using this:

import re

def ParseStringFile(string):
    p = re.compile(r'\W+')  # one or more non-word characters
    result = p.split(string)
    return result

But I have an error: my result has two empty strings (''), one before 'Лев'. How do I get rid of them?


Slater Victoroff
Denis
  • No. It works correctly. The empty string is due to the extra new line at the beginning of the string. – nhahtdh Feb 21 '14 at 21:52
  • @nhahtdh Do I need to delete the first and last empty ('') elements of the list before using split? – Denis Feb 21 '14 at 22:01

2 Answers

5

As nhahtdh pointed out, the empty string is expected since there's a \n at the start and end of the string, but if they bother you, you can filter them very quickly and efficiently.

>>> list(filter(None, ['', 'text', 'more text', '']))
['text', 'more text']

filter normally takes a callable as its first argument and keeps only the elements for which function(element) returns a true value. Here None is given, which triggers a special case: an element is removed if bool(element) is False. Since bool('') is False, the empty strings are removed. In Python 2 filter returns a list directly; in Python 3 it returns a lazy iterator, which is why the call is wrapped in list().
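To make the special case concrete, here is a minimal sketch that reproduces the empty strings and then filters them out. The sample text is invented for illustration (it is not the asker's actual input), and the names `text`, `pattern`, `parts` and `words` are hypothetical. It assumes Python 3, where str patterns are Unicode-aware by default.

```python
import re

# Hypothetical sample text with a newline at each end, as in the question.
text = "\nЛев Толстой\n"

pattern = re.compile(r"\W+")
parts = pattern.split(text)
print(parts)  # ['', 'Лев', 'Толстой', '']

# filter(None, ...) keeps only truthy items; list() materialises the
# lazy iterator that Python 3 returns.
words = list(filter(None, parts))
print(words)  # ['Лев', 'Толстой']
```

The leading and trailing newlines each match `\W+`, so `split` reports an empty string on the outer side of each match.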

Also see the manual.

Chris
Slater Victoroff
2

You could strip the leading and trailing newlines from the string before splitting it:

p.split(string.strip('\n'))

Alternatively, split the string and then remove the first and last element:

result = p.split(string)[1:-1]

The [1:-1] slice takes a copy of the result that includes all indexes starting at 1 (i.e. skipping the first element) and ending at -2 (i.e. the second-to-last element; the second index of a slice is exclusive).
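As a small illustration of the slice semantics just described, the sketch below uses a hard-coded list (a stand-in for the output of `p.split(string)`, not real data from the question):

```python
# Stand-in for the list returned by p.split(string).
result = ['', 'text', 'more text', '']

trimmed = result[1:-1]  # indexes 1 up to, but not including, the last one
print(trimmed)          # ['text', 'more text']

# Slicing copies; the original list is left untouched.
print(result)           # ['', 'text', 'more text', '']

# A negative end index counts from the end: -1 means len(result) - 1.
print(trimmed == result[1:len(result) - 1])  # True
```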

A longer and less elegant alternative would be to modify the list in-place:

result = p.split(string)
del result[-1]   # remove last element
del result[0]    # remove first element

Note that these two solutions assume the first and last elements are always empty strings. If the input sometimes lacks the empty string at the beginning or end, they will silently discard a real word instead. However, they are also the fastest solutions.
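The failure mode is easy to see with an input that has no separator at either end. This is a sketch with a made-up input string; `clean` and `pattern` are hypothetical names.

```python
import re

pattern = re.compile(r"\W+")

# Hypothetical input with no leading or trailing separator.
clean = "one two three"

print(pattern.split(clean))        # ['one', 'two', 'three']

# The slice still drops the first and last elements, which are real words here.
print(pattern.split(clean)[1:-1])  # ['two']
```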

If you want to remove all empty strings from the result, wherever they occur in the list, you can use a list comprehension:

[word for word in p.split(string) if word]
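Wrapped in a function, the list comprehension behaves the same whether or not the input has separators at its ends. The function name `split_words` and the constant `WORD_SEPARATOR` are hypothetical, chosen only for this sketch:

```python
import re

WORD_SEPARATOR = re.compile(r"\W+")  # hypothetical module-level name


def split_words(text):
    """Split text on runs of non-word characters, dropping empty strings."""
    return [word for word in WORD_SEPARATOR.split(text) if word]


print(split_words("\nfoo, bar\n"))  # ['foo', 'bar']
print(split_words("foo bar"))       # ['foo', 'bar']
print(split_words(""))              # []
```

Compiling the pattern once outside the function avoids re-creating it on every call, which may matter if many pages are processed.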
Bakuriu
  • One of the few instances where `filter` actually beats out list comps. http://stackoverflow.com/questions/3845423/remove-empty-strings-from-a-list-of-strings – Slater Victoroff Feb 21 '14 at 22:12
  • @SlaterTyranus I doubt speed matters in this case, but readability does and I prefer the list-comprehension. Also in python3 `filter` doesn't produce a list, which might or might not be what the OP wants. Also, if speed matters using the `[1:-1]` is much faster because it avoids all the truth tests altogether. – Bakuriu Feb 21 '14 at 22:14
  • Reasonable, just thought you might like to know. As someone who generally thinks things like `filter` and `map` should rarely be used, this is one case where I've really got to argue for the `filter` solution. 5x speed increase, and intuitively, you are filtering the list, but both are certainly accurate solutions. [1:-1] seems dangerously brittle to me. – Slater Victoroff Feb 21 '14 at 22:16
  • @SlaterTyranus That answer is from 2010. On my machine I get quite different results, although `filter` with `None` is (obviously) still the fastest (by about 2.8x on python2 and about 85% on python3). It seems a lot of good work went into optimizing the interpreter over the last 4 years. – Bakuriu Feb 21 '14 at 22:23
  • What about the speed of the filter solution? I don't want to use the [1:-1] solution because the text doesn't always have the empty '' elements; the sites all vary. – Denis Feb 21 '14 at 22:33
  • Ah, that's great news! I just ran it on my machine and got similar results. – Slater Victoroff Feb 21 '14 at 22:35
  • @Denis That is the speed in the filter solution, still about 3x faster in python 2, with a more modest speed increase in python 3 (not sure which you're using.) – Slater Victoroff Feb 21 '14 at 22:36
  • Optimization is very interesting to me, as long as it doesn't destroy readability. – Denis Feb 21 '14 at 22:36
  • I'm using Python 3.3. – Denis Feb 21 '14 at 22:37