6

I need a way to remove all whitespace from a string, except when that whitespace is between quotes.

result = re.sub('".*?"', "", content)

This matches anything between quotes, but I need it to skip those matches and match the whitespace outside them instead.

ThinkingStiff
Oli

5 Answers

7

I don't think you're going to be able to do that with a single regex. One way to do it is to split the string on quotes, apply the whitespace-stripping regex to every other item of the resulting list, and then re-join the list.

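import re

def stripwhite(text):
    # Splitting on quotes leaves unquoted text at even indexes, quoted text at odd ones.
    lst = text.split('"')
    for i, item in enumerate(lst):
        if not i % 2:
            # Strip whitespace from the unquoted pieces only.
            lst[i] = re.sub(r"\s+", "", item)
    return '"'.join(lst)

print(stripwhite('This is a string with some "text in quotes."'))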
kindall
  • Someone will be along shortly to replace it with a one-line list comprehension, I am sure. :-) – kindall Aug 31 '10 at 14:45
  • hahaha - i actually missed the remark on the one-liner till after posting mine. I did build on your idea though. ++ – Nas Banov Aug 31 '10 at 23:59
6

Here is a one-liner version, based on @kindall's idea, that does not use regex at all! First split on ", then split() every other item and re-join the pieces, which takes care of the whitespace:

stripWS = lambda txt:'"'.join( it if i%2 else ''.join(it.split())
    for i,it in enumerate(txt.split('"'))  )

Usage example:

>>> stripWS('This is a string with some "text in quotes."')
'Thisisastringwithsome"text in quotes."'
Nas Banov
  • I regret that I have but one upvote to give for your solution. – kindall Aug 31 '10 at 23:52
  • I had to do the opposite, remove white space in quoted strings: ``'"'.join([''.join(it.split(' ')) if i%2 else it for i,it in enumerate(m.split('"'))])`` – adrianlzt Mar 27 '18 at 13:54
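The inverse case raised in the last comment (stripping whitespace inside the quotes and leaving the rest alone) can be sketched with the same split-on-quote idea. The name `strip_inside_quotes` is hypothetical, and the sketch assumes the quotes are balanced and never escaped.

```python
def strip_inside_quotes(text):
    """Remove whitespace only from the quoted parts of text.

    Assumes balanced, unescaped double quotes (an assumption of this
    sketch, not something it checks).
    """
    parts = text.split('"')
    # After splitting on quotes, odd-indexed parts sit between a pair of quotes.
    return '"'.join(
        ''.join(part.split()) if i % 2 else part
        for i, part in enumerate(parts)
    )

print(strip_inside_quotes('This is a string with some "text in quotes."'))
```

Using `split()` with no argument drops tabs and newlines as well as spaces, which differs slightly from the `split(' ')` in the comment.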
5

You can use shlex.split for a quotation-aware split, and join the result using " ".join. E.g.

import shlex
print(" ".join(shlex.split('Hello "world     this    is" a    test')))
Ivo van der Wijk
  • Your example gave me 'Hello world this is a test' instead of 'Hello"world this is"atest' – Oli Aug 31 '10 at 14:07
  • @Oli: you could use `map(pipes.quote, shlex.split(..))` to add quotes where necessary. – jfs Jan 27 '13 at 16:18
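Following the suggestion in the comment above, here is a minimal sketch that re-quotes the tokens after splitting. It assumes Python 3.3 or later, where `shlex.quote` plays the role of `pipes.quote`; the function name `requote` is made up for illustration. Note that it quotes with single quotes and still puts one space between tokens, so it does not reproduce the exact output Oli asked for.

```python
import shlex

def requote(text):
    # shlex.split honours the quotes, so the quoted run stays one token.
    tokens = shlex.split(text)
    # shlex.quote wraps a token in single quotes only when it needs quoting.
    return " ".join(shlex.quote(tok) for tok in tokens)

print(requote('Hello "world     this    is" a    test'))
```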
3

Oli, resurrecting this question because it had a simple regex solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

Here's the small regex:

"[^"]*"|(\s+)

The left side of the alternation matches complete "quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expression on the left.

Here is working code (and an online demo):

import re
subject = 'Remove Spaces Here "But Not Here" Thank You'
regex = re.compile(r'"[^"]*"|(\s+)')
def myreplacement(m):
    if m.group(1):
        return ""
    else:
        return m.group(0)
replaced = regex.sub(myreplacement, subject)
print(replaced)
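The same pattern can be wrapped in a reusable function, sketched below under the assumption that quotes are never escaped. The names `_QUOTED_OR_SPACE` and `strip_unquoted_whitespace` are hypothetical. The last call shows what happens with an unpaired quote: the left side of the alternation cannot match, so the whitespace after the stray quote is removed too.

```python
import re

# Left branch: a complete quoted string. Right branch: whitespace, captured in group 1.
_QUOTED_OR_SPACE = re.compile(r'"[^"]*"|(\s+)')

def strip_unquoted_whitespace(text):
    # Group 1 is set only when the whitespace branch matched; drop those
    # matches and put quoted strings back unchanged.
    return _QUOTED_OR_SPACE.sub(lambda m: "" if m.group(1) else m.group(0), text)

print(strip_unquoted_whitespace('Remove Spaces Here "But Not Here" Thank You'))
print(strip_unquoted_whitespace('a "b c'))  # unpaired quote
```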

Reference

  1. How to match pattern except in situations s1, s2, s3
  2. How to match a pattern unless...
zx81
0

Here is a slightly longer version that checks for a quote without a pair. It only deals with one style of start and end string (adaptable, for example, with start,end='()').

start, end = '"', '"'

for test in ('Hello "world this is" atest',
             'This is a string with some " text inside in quotes."',
             'This is without quote.',
             'This is sentence with bad "quote'):
    result = ''

    while start in test:
        # Text before the opening quote: remove its spaces.
        clean, _, test = test.partition(start)
        clean = clean.replace(' ', '') + start
        inside, tag, test = test.partition(end)
        if not tag:
            raise SyntaxError('Missing end quote %s' % end)
        else:
            clean += inside + tag  # whitespace inside the quotes is kept
        result += clean
    result += test.replace(' ', '')
    print(result)
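To make the start/end adaptation concrete, the loop can be folded into a function that takes the delimiters as parameters. This is only a sketch: the name `strip_outside` is hypothetical, it raises `ValueError` rather than `SyntaxError`, it removes only space characters (like the loop above), and it does not handle nested delimiters.

```python
def strip_outside(text, start='"', end='"'):
    """Remove spaces outside start/end delimited spans (no nesting)."""
    pieces = []
    rest = text
    while start in rest:
        before, _, rest = rest.partition(start)
        inside, tag, rest = rest.partition(end)
        if not tag:
            raise ValueError('Missing end delimiter %s' % end)
        # Spaces are removed before the span; the span itself is kept verbatim.
        pieces.append(before.replace(' ', '') + start + inside + tag)
    pieces.append(rest.replace(' ', ''))
    return ''.join(pieces)

print(strip_outside('Hello "world this is" a test'))
print(strip_outside('keep (this bit) but not that', start='(', end=')'))
```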
Tony Veijalainen