
I want to use Python's re.split() to split a string into individual words on spaces, commas and periods. But I don't want "1,200" to be split into ["1", "200"] or "1.2" to be split into ["1", "2"].

Example

l = "one two 3.4 5,6 seven.eight nine,ten"

The result should be ["one", "two", "3.4", "5,6", "seven", "eight", "nine", "ten"]

rohanag

2 Answers


Use a negative lookahead and a negative lookbehind:

> import re
> s = "one two 3.4 5,6 seven.eight nine,ten"
> parts = re.split(r'\s|(?<!\d)[,.](?!\d)', s)
> print(parts)
['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten']

In other words, you always split on \s (whitespace), and you split on a comma or period only if it is neither preceded (?<!\d) nor followed (?!\d) by a digit.
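As a rough sketch of how this pattern might be wrapped for reuse: the helper name `split_words` is made up for illustration, and dropping empty strings is an assumption about what you want. It is there because adjacent separators such as ", " produce empty pieces with re.split().

```python
import re

# Whitespace always splits; a comma or period splits only when
# it has no digit immediately before it and no digit immediately after it.
SEPARATORS = re.compile(r'\s|(?<!\d)[,.](?!\d)')

def split_words(text):
    """Hypothetical helper: split text and drop the empty strings
    that appear when two separators sit next to each other."""
    return [piece for piece in SEPARATORS.split(text) if piece]

print(split_words("one two 3.4 5,6 seven.eight nine,ten"))
# ['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten']
print(split_words("Hello, world. Total: 1,200"))
# ['Hello', 'world', 'Total:', '1,200']
```

If you need to keep the empty strings (for example to detect doubled separators), leave out the filter.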


EDIT: As per @verdesmarald's comment, you may want to use the following instead:

> s = "one two 3.4 5,6 seven.eight nine,ten,1.2,a,5"
> print(re.split(r'\s|(?<!\d)[,.]|[,.](?!\d)', s))
['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten', '1.2', 'a', '5']

This will split "1.2,a,5" into ["1.2", "a", "5"].
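To make the difference concrete, here is a small side-by-side comparison of the two patterns on that input. The names `STRICT` and `RELAXED` are only labels chosen for this sketch.

```python
import re

# First version: punctuation splits only if there is no digit on either side.
STRICT = re.compile(r'\s|(?<!\d)[,.](?!\d)')
# Edited version: punctuation splits unless it sits between two digits.
RELAXED = re.compile(r'\s|(?<!\d)[,.]|[,.](?!\d)')

sample = "1.2,a,5"
print(STRICT.split(sample))   # ['1.2,a,5']
print(RELAXED.split(sample))  # ['1.2', 'a', '5']
```

With the first pattern, each comma touches a digit on one side, so nothing is split. With the second, only the period in "1.2" is protected, since it is the only separator with digits on both sides.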


João Silva
  • I think the OP actually wants not followed *and* preceded, rather than or, so it should be `(?<!\d)[.,]|[.,](?!\d)` not `(?<!\d)[,.](?!\d)`. E.g. I suspect `"1.2,a"` should become `["1.2", "a"]`. – verdesmarald Oct 02 '12 at 01:13
  • @verdesmarald: You may be right indeed, I've edited my answer to reflect that, thanks. – João Silva Oct 02 '12 at 01:23
  • https://stackoverflow.com/questions/50463669/special-regex-for-string-in-python#50463935 – Jayesh Dhandha May 22 '18 at 09:29

So you want to split on spaces, and on commas and periods that are neither preceded nor followed by a digit. This should work:

r" |(?<![0-9])[.,](?![0-9])"
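A minimal usage sketch for this pattern with re.split(); the name `PATTERN` is just for illustration. Note that, as written, it splits on a literal space only, so tabs and newlines are not treated as separators.

```python
import re

# A literal space always splits; '.' or ',' splits only when
# neither neighbouring character is an ASCII digit.
PATTERN = re.compile(r" |(?<![0-9])[.,](?![0-9])")

print(PATTERN.split("one two 3.4 5,6 seven.eight nine,ten"))
# ['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten']
print(PATTERN.split("one\ttwo three"))
# ['one\ttwo', 'three']
```

If other whitespace should split as well, replacing the leading space with \s is probably what you want.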
Niet the Dark Absol