2

I have a string like this:

"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"

And yes, the double quotes are part of the string.

Now I want to split this string into several parts with mystring.split(","). What I got is this:

"BLAX"

"BLAY"

"BLAZ

BLUBB"

"BLAP"

But what I want is this:

"BLAX"

"BLAY"

"BLAZ, BLUBB"

"BLAP"

How can I achieve this while keeping the double quotes? I need this because I work with TOML files.

Solution: Thanks @Giacomo Alzetta

I used the split command with the regular expression. Thanks also for explaining this!

Elec
  • split at `", ` (yes, with one double quote there, or even with both: `", "`), so that `something, something` doesn't get split, and then add the missing double quotes in a list comprehension – h4z3 Jul 16 '19 at 08:29
  • `re.split(r'(?<="),', '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"')` returns `['"BLAX"', ' "BLAY"', ' "BLAZ, BLUBB"', ' "BLAP"']` – Giacomo Alzetta Jul 16 '19 at 08:30

6 Answers

2

You can use ast.literal_eval and then add '"' manually:

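from ast import literal_eval

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

# Wrap the string in parentheses so it parses as a single tuple literal
data = literal_eval('(' + s + ')')

for d in data:
    # literal_eval drops the quotes, so add them back
    print('"{}"'.format(d))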

Prints:

"BLAX"
"BLAY"
"BLAZ, BLUBB"
"BLAP"
Andrej Kesely
  • `ast.literal_eval` is not safe. You should generally avoid it. – sophros Jul 16 '19 at 08:32
  • Withdrawn. You are right - there is even a [question about safety of `ast.literal_eval`](https://stackoverflow.com/questions/4710247/python-3-are-there-any-known-security-holes-in-ast-literal-evalnode-or-string). – sophros Jul 16 '19 at 08:35
  • Your answer would fail if a `"` is missing somewhere or if the input contains something like `"hello" 5`. Maybe all OP inputs are valid python syntax but they did not state that, they only said they don't want to split on a comma that is not after a `"`. – Giacomo Alzetta Jul 16 '19 at 08:38
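
To make the failure mode raised in the last comment concrete, here is a minimal sketch that wraps the `literal_eval` call and reports input that is not valid Python syntax. The helper name `split_quoted` is hypothetical, and the sketch assumes every item is a plain Python string literal.

```python
from ast import literal_eval


def split_quoted(s):
    """Return the quoted items of s, or None if s is not valid Python syntax."""
    try:
        # The trailing comma makes even a single item parse as a tuple
        data = literal_eval('(' + s + ',)')
    except (SyntaxError, ValueError):
        return None
    # literal_eval drops the quotes, so add them back
    return ['"{}"'.format(d) for d in data]


print(split_quoted('"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'))
print(split_quoted('"hello" 5'))  # not valid syntax, so None
```

Returning None is only one possible choice; re-raising with a clearer message would work just as well.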
2

You can also use the csv module.

Ex:

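import csv

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'
# csv.reader expects an iterable of lines, so wrap the string in a list;
# skipinitialspace=True ignores the space after each comma
r = csv.reader([s], delimiter=',', quotechar='"', skipinitialspace=True)
res = next(r)
print(res)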

Output:

['BLAX', 'BLAY', 'BLAZ, BLUBB', 'BLAP']
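
Since the question asks to keep the double quotes, the parsed fields can be re-wrapped afterwards. A minimal sketch, assuming one record per string and no embedded double quotes in the fields; the variable names are illustrative:

```python
import csv

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

# csv.reader takes an iterable of lines, hence the one-element list;
# skipinitialspace=True skips the space that follows each comma
fields = next(csv.reader([s], skipinitialspace=True))

# The csv module strips the quotes, so put them back
quoted = ['"{}"'.format(f) for f in fields]
print(quoted)
```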
Rakesh
1

You can use a regular expression and the re.split function:

>>> import re
>>> re.split(r'(?<="),', '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"')
['"BLAX"', ' "BLAY"', ' "BLAZ, BLUBB"', ' "BLAP"']

(?<=") is a lookbehind: the comma must be preceded by ", but the " is not part of the match, so only the , is consumed by the split.

You could also split on `",`, but then you'd have to restore the " that is now missing from most of the parts:

>>> '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'.split('",')
['"BLAX', ' "BLAY', ' "BLAZ, BLUBB', ' "BLAP"']
>>> [el + ('' if el.endswith('"') else '"') for el in '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'.split('",')]
['"BLAX"', ' "BLAY"', ' "BLAZ, BLUBB"', ' "BLAP"']
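
Both variants above keep the space that follows each comma. As a sketch of one way to avoid that, the lookbehind pattern can also consume the whitespace after the comma; alternatively, `re.findall` can match the quoted items directly. This assumes the items contain no escaped double quotes.

```python
import re

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

# Split on a comma preceded by a double quote, also consuming any whitespace after it
parts = re.split(r'(?<="),\s*', s)
print(parts)

# Alternative: match each quoted item directly instead of splitting
items = re.findall(r'"[^"]*"', s)
print(items)
```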
Giacomo Alzetta
1

You can split by ", then drop the unwanted leftovers and rewrap everything in quotes with a simple list comprehension.

string = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

parts = ['"{}"'.format(s) for s in string.split('"') if s not in ('', ', ')]

for p in parts:
    print(p)

Output:

"BLAX"
"BLAY"
"BLAZ, BLUBB"
"BLAP"
Adam.Er8
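
The filter in the answer above depends on the separator being exactly a comma followed by one space. A possible variation, assuming the quotes are balanced and never escaped: after splitting on the double quote, the quoted contents always sit at the odd indices, so a slice picks them out regardless of the spacing between items.

```python
string = '"BLAX",  "BLAY","BLAZ, BLUBB", "BLAP"'  # note the irregular spacing

# Pieces alternate between text outside quotes (even indices)
# and quoted content (odd indices)
parts = ['"{}"'.format(s) for s in string.split('"')[1::2]]

for p in parts:
    print(p)
```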
1

As I said in the comments, you can split on a separator that is longer than a single character. A plain comma matches both the comma inside the quotes and the ones between items, but we can split at `", ` instead (with the trailing space included so that we don't have to strip it ;) )

Then we add the missing quotations:

original = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'
[s if s.endswith('"') else s+'"' for s in original.split('", ')]

Output: ['"BLAX"', '"BLAY"', '"BLAZ, BLUBB"', '"BLAP"']

This approach doesn't use regexes, so it should be faster. You also don't need to work out which regex is correct for your case (I generally like regexes, but I like smart splitting and operations more).
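
Whether the plain split is actually faster depends on the input and the Python version, so it is worth measuring. A rough sketch using `timeit`; the function names are illustrative and no particular timings are implied:

```python
import re
import timeit

original = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'
pattern = re.compile(r'(?<="),\s*')  # precompiled lookbehind split


def split_plain(text):
    # Split on quote-comma-space, then restore the closing quote where it was cut off
    return [s if s.endswith('"') else s + '"' for s in text.split('", ')]


def split_regex(text):
    return pattern.split(text)


# Both approaches should agree before their timings are compared
assert split_plain(original) == split_regex(original)

print('plain:', timeit.timeit(lambda: split_plain(original), number=100_000))
print('regex:', timeit.timeit(lambda: split_regex(original), number=100_000))
```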

h4z3
0

You may replace and split (this assumes | never occurs in the data):

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'
s.replace('", ', '"|').split('|')

Output: ['"BLAX"', '"BLAY"', '"BLAZ, BLUBB"', '"BLAP"']
Andy L.
  • Change the first argument to `'", '` (add a space) so that the output doesn't have unnecessary spaces. :) – h4z3 Jul 16 '19 at 08:39
  • @h4z3; Thanks. I just did it right off the bat and didn't notice the added space. +1 – Andy L. Jul 16 '19 at 08:41