
I have strings that need to be placed into lists; for instance I require that

C C .0033 .0016 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4' C

becomes

['C', 'C', '.0033', '.0016', 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4', 'C']

So everything in quotes becomes a single list element; otherwise, everything separated by whitespace becomes a single list element.

My first idea was a simple split: copy the items that aren't part of a quoted section into a new list, and join the pieces of each quoted section back together:

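>>> parts = s.split()
>>> parts
['C', 'C', '.0033', '.0016', "'International", 'Tables', 'Vol', 'C', 'Tables', '4.2.6.8', 'and', "6.1.1.4'", 'C']
>>> arr = []
>>> i = 0
>>> while i < len(parts):
...     if parts[i].startswith("'"):
...         v = ''
...         # Rejoin the pieces of a quoted section with single spaces
...         while not parts[i].endswith("'"):
...             v += parts[i] + " "
...             i += 1
...         v += parts[i]
...         arr.append(v.strip("'"))  # drop the surrounding quotes
...     else:
...         arr.append(parts[i])
...     i += 1  # move on to the next piece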

But this strategy is pretty ugly, and in addition I have to assume that the words inside the quotes were separated by single spaces.

s.partition("'") seemed very promising:

>>> s.partition("'")
('C C .0033 .0016 ', "'", "International Tables Vol C Tables 4.2.6.8 and 6.1.1.4' C")

but it's awkward because I have to partition again as I iterate through the result, and I have to keep track of which pieces were inside quotes.

Is there a simple Python 3 way to split this string as described above?

zondo
user14717

1 Answer


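You can use shlex.split from the standard library's shlex module, which splits on whitespace but keeps each quoted section together as a single item. Example: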

import shlex

print(shlex.split("C C .0033 .0016 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4' C"))
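For reference, here is a slightly longer sketch of the same call applied to the string from the question. The name `split_fields` is a hypothetical helper, not part of shlex. It is only there because `shlex.split` raises `ValueError` on an unbalanced quote (a stray apostrophe, for example), which may or may not matter for your data.

```python
import shlex

s = "C C .0033 .0016 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4' C"

# Quoted sections stay together; the quotes themselves are stripped.
tokens = shlex.split(s)
print(tokens)
# ['C', 'C', '.0033', '.0016', 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4', 'C']


def split_fields(line):
    """Hypothetical helper: return the fields, or None if a quote is unbalanced."""
    try:
        return shlex.split(line)
    except ValueError:
        # shlex raises ValueError when a quotation is never closed
        return None
```

Runs of spaces or tabs between fields are handled as well, so there is no need to assume single-space separation.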
OneCricketeer
Zroq