3

I want to split the following string:

Quantity [*,'EXTRA 05',*]

With the desired results being:

["Quantity", "[*,'EXTRA 05',*]"]

The closest I have found is shlex.split; however, this removes the internal quotes, giving the following result:

['Quantity', '[*,EXTRA 05,*]']
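
For reference, a minimal reproduction of that result, assuming the default (POSIX) mode of shlex.split; the variable names are only illustrative:

```python
import shlex

s = "Quantity [*,'EXTRA 05',*]"

# In its default mode, shlex.split treats the single quotes as shell-style
# quoting: the space inside them is protected, but the quote characters
# themselves are dropped from the resulting token.
tokens = shlex.split(s)
print(tokens)  # ['Quantity', '[*,EXTRA 05,*]']
```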

Any suggestions would be greatly appreciated.

EDIT:

I will also need multiple splits, such as:

"Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

To:

["Quantity", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]

user3043991
  • Perhaps you could look into using regular expressions to capture the parts you want to separate, one by one, via a generator function or a loop. Sorry if that makes no sense, I don't have more time. But if you know what I'm talking about, it could work. – Totem Nov 28 '13 at 07:21
  • Hello. Is it always the word _Quantity_ that appears first? Or at least a single word before the representation of a list? Are there always representations of lists at the end of the string? Is there any chance that sequences such as ``[*,'EXTRA[bonus] 05',*]`` or ``[*,'EXTRA;bonus] 05',*]`` or ``[*,'EXTRA[bonus[ 05',*]`` appear in the string, that is to say, nested brackets in a representation of a list? – eyquem Nov 29 '13 at 11:23
  • Are you looking for a general solution to the question posed in your title or a specific solution for strings that look very much like the strings you've given as examples? – Jack Aidley Nov 29 '13 at 13:52

4 Answers

4

The basic tool for processing strings is regular expressions (the re module).

Given the information you provide (which may be insufficient), the following code does the job:

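import re

# Two alternatives: a run of non-bracket text that stops before a bracket,
# or a bracketed group. Raw strings keep the backslashes intact.
r = re.compile(r'(?! )[^[]+?(?= *\[)'
               r'|'
               r'\[.+?\]')


s1 = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(r.findall(s1))
print('---------------')

s2 = ("'zug hug'Quantity boondoggle 'fish face monkey "
      "dung' [*,'EXTRA 05',*] [*,'EXTRA 09',*]")
print(r.findall(s2))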

result

['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]  
---------------
["'zug hug'Quantity boondoggle 'fish face monkey dung'", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]

The regular expression pattern must be understood as follows:

'|' means OR

So the regex pattern consists of two partial REs:
(?! )[^[]+?(?= *\[)
and
\[.+?\]

The first partial RE :

The core is [^[]+
Brackets define a set of characters. Because the symbol ^ comes right after the opening bracket [, the set is defined as all characters other than those that follow the ^.
Here, [^[] means any character that isn't an opening bracket [ and, since a + follows the set, [^[]+ means a sequence of one or more characters, none of which is an opening bracket.

Now, there is a question mark after [^[]+ : it makes the quantifier non-greedy, so the matched sequence is as short as possible and stops as soon as what follows the question mark can match.
Here, what follows the ? is (?= *\[), a positive lookahead assertion: (?=...) marks the lookahead, and its content, a blank followed by *\[, is the sequence in front of which the matched sequence must stop. It means: zero, one or more blanks, then an opening bracket (the backslash \ is needed to remove the special meaning of [ as the opening of a set of characters).

There's also (?! ) in front of the core. It is a negative lookahead assertion: it is needed so that this partial RE only matches sequences that do not begin with a blank, which avoids matching runs of blanks. Remove this (?! ) and you'll see the effect.
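
To make that effect concrete, here is a small sketch comparing the pattern with and without the (?! ) guard. The names with_guard and without_guard are only illustrative, and the pattern is written on one line as a raw string:

```python
import re

s = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

# Pattern from the answer, with the leading (?! ) guard
with_guard = re.compile(r"(?! )[^[]+?(?= *\[)|\[.+?\]")
# Same pattern with the guard removed (for comparison only)
without_guard = re.compile(r"[^[]+?(?= *\[)|\[.+?\]")

kept = with_guard.findall(s)
leaky = without_guard.findall(s)
print(kept)   # ['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
print(leaky)  # ['Quantity', ' ', "[*,'EXTRA 05',*]", ' ', "[*,'EXTRA 09',*]"]
```

Without the guard, each blank that sits just before an opening bracket is returned as a match of its own.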

The second partial RE :

\[.+?\] means: the opening bracket character [, then a sequence of characters matched by .+? (the dot matches any character except \n); this sequence must stop in front of the closing bracket character ], which is the last character matched.

.

EDIT

import re

string = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
# Split on each space that is immediately followed by an opening bracket
print(re.split(r' (?=\[)', string))

result

['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]

!!

Community
eyquem
  • If you are satisfied, could you upvote / accept my answer ? In fact, I believe you haven't enough points to upvote but you can accept by clicking on the symbol under the ``upvote button-number of points-downvote button`` at the left of my answer. – eyquem Dec 02 '13 at 23:02
1

A warning for picky readers: this algorithm WON'T correctly split every string you pass through it, only strings like:

"Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

"Quantity [*,'EXTRA 05',*]"

"Quantity [*,'EXTRA 05',*] [*,'EXTRA 10',*] [*,'EXTRA 07',*] [*,'EXTRA 09',*]"

string = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
splitted_string = []

# Split once on single spaces and reuse the result
parts = string.split(" ")

# This adds "Quantity" at position 0 of splitted_string
splitted_string.append(parts[0])

# The loop goes from 1 to the length of parts, increasing x by 2:
# first iteration x is 1 and x+1 is 2, second x=3 and x+1=4, etc.
# The first iteration joins "[*,'EXTRA" and "05',*]" into one string
# The second iteration joins "[*,'EXTRA" and "09',*]" into one string
# Longer strings of the same form work the same way
for x in range(1, len(parts), 2):
    splitted_string.append("%s %s" % (parts[x], parts[x + 1]))

When I execute the code, splitted_string contains the following at the end:

['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
splitted_string[0] = 'Quantity'
splitted_string[1] = "[*,'EXTRA 05',*]"
splitted_string[2] = "[*,'EXTRA 09',*]"

I think that is exactly what you're looking for. If I'm wrong, or if you need an explanation of the code, let me know. I hope it helps.

Jack Aidley
AlvaroAV
  • Try this with `string = "'zug hug'Quantity boondoggle 'fish face monkey dung' [*,'EXTRA 05',*] [*,'EXTRA 09',*]"` and you'll see it fails with multiple spaces in the same quote, and when quoted sections abut unquoted sections. – Jack Aidley Nov 29 '13 at 11:40
  • lol ??? I supposed that the string would always be like "Quantity [*,'EXTRA 05',*]" or "Quantity [*,'EXTRA 05',*][*,'EXTRA 07',*]". – AlvaroAV Nov 29 '13 at 11:45
  • what an angry kid... If I'm wrong I'll accept it, there is no problem for me, but I read the problem again, and it seems he'll always work with strings like the ones I mentioned before. Of course my code won't work if you pass any other kind of string... I supposed I didn't have to explain that; I thought it was implicit... – AlvaroAV Nov 29 '13 at 11:46
  • Yeah, if you assume the strings are all of exactly the same form then your solution will work. Although I think if the strings all have exactly the same form you can probably do it with a regex more elegantly. – Jack Aidley Nov 29 '13 at 11:50
0

Assuming you want a general solution for splitting at spaces but not at spaces inside quotations: I don't know of any Python library that does this, but that doesn't mean there isn't one.

In the absence of a known pre-rolled solution I would simply roll my own. It's relatively easy to scan a string looking for spaces and then use the Python slice functionality to divide up the string into the parts you want. To ignore spaces in quotes you can simply include a flag that switches on encountering a quote symbol to switch the space sensing on and off.

This is some code I knocked up to do this; it is not extensively tested:

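def spaceSplit(string):
    last = 0          # index where the current piece starts
    splits = []
    inQuote = None    # the quote character we are inside, if any
    for i, letter in enumerate(string):
        if inQuote:
            # Leave quote mode only on the matching quote character
            if letter == inQuote:
                inQuote = None
        elif letter == '"' or letter == "'":
            inQuote = letter

        # Split on spaces that are outside quotes
        if not inQuote and letter == ' ':
            splits.append(string[last:i])
            last = i + 1

    # Add whatever follows the last split point
    if last < len(string):
        splits.append(string[last:])

    return splits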
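
For comparison, the same idea (split on whitespace, but never inside quotes) can be sketched with a single regular expression. This is a different technique from the scanning loop above, not a rewrite of it; the names TOKEN and regex_space_split are hypothetical, and the sketch assumes every quote in the input is closed:

```python
import re

# One token = a run of pieces, each piece being either a single character
# that is neither whitespace nor a quote, or a complete quoted section.
TOKEN = re.compile(r"""(?:[^\s'"]|'[^']*'|"[^"]*")+""")

def regex_space_split(text):
    """Return the whitespace-separated tokens of text, keeping quotes intact."""
    return TOKEN.findall(text)

result = regex_space_split("Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]")
print(result)  # ['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
```

Unlike the loop, this version treats any whitespace (not only the space character) as a separator and never produces empty strings for consecutive spaces, so the two are not drop-in replacements for each other.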
Jack Aidley
  • I think the split function is easier than this and does not modify the quotes or double quotes – AlvaroAV Nov 29 '13 at 11:34
  • This code doesn't modify the quote or double quotes? It also _works_ unlike the code in your answer. – Jack Aidley Nov 29 '13 at 11:39
  • hahaha just copy and paste the code into any Python console and it will work perfectly, and it is easy to understand. I didn't mean to insult you or your code; I don't know why you answered so angrily... Try my code and tell me where it fails so I can fix it, because I copy and paste it again and again and it works perfectly. And again, sorry if I offended you, I didn't want to. – AlvaroAV Nov 29 '13 at 11:43
  • I'm neither offended nor angry? – Jack Aidley Nov 29 '13 at 11:47
0

Try this

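def parseString(inputString):
    # Split on whitespace, then glue back together the words that belong
    # to the same double-quoted section.
    res = []
    temp = []  # words of the quoted section currently being collected
    for word in inputString.split():
        if temp:
            # Inside a quoted section: collect until the closing quote
            temp.append(word)
            if word.endswith('"'):
                res.append(' '.join(temp))
                temp = []
        elif word.startswith('"') and not (len(word) > 1 and word.endswith('"')):
            # Opening quote with no closing quote on the same word
            temp.append(word)
        else:
            # Plain word, or a single word quoted on both sides
            res.append(word)

    # Keep the words of an unterminated quote instead of dropping them
    res.extend(temp)

    print(res)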

Input:

parseString('This is "a test" to your split "string with quotes"')

Output: ['This', 'is', '"a test"', 'to', 'your', 'split', '"string with quotes"']

gprx100