How to find the indexes of certain character not in quotes in Python?

Question

I ultimately want to split a string by a certain character. I tried Regex, but it started escaping \, so I want to avoid that with another approach (all the attempts at unescaping the string failed). So, I want to get all positions of a character char in a string that is not within quotes, so I can split them up accordingly.

For example, given the phrase hello-world:la\test, I want to get back 11 if char is :, as that is the only : in the string, and it is at index 11. However, re does split it, but the result I get is ['hello-world,lat\\test'].

EDIT: @BoarGules made me realize that re didn't actually change anything, but it's just how Python displays slashes.
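For readers who want the positions themselves rather than the split pieces, here is a minimal sketch of a quote-aware scan. The function name `find_unquoted_indexes` is made up for illustration, and it assumes that both `"` and `'` open a quoted run that is closed only by the same character, with no escape handling.

```python
def find_unquoted_indexes(text, char=':'):
    """Return every index of `char` in `text` that lies outside quotes."""
    indexes = []
    open_quote = None  # the quote character we are currently inside, if any
    for i, c in enumerate(text):
        if open_quote is not None:
            # Inside a quoted run: only the matching quote closes it.
            if c == open_quote:
                open_quote = None
        elif c in ('"', "'"):
            open_quote = c
        elif c == char:
            indexes.append(i)
    return indexes


print(find_unquoted_indexes(r'hello-world:la\test'))    # [11]
print(find_unquoted_indexes(r'hello-world":"la\test'))  # []
print(find_unquoted_indexes('a:"b:c":d'))               # [1, 7]
```

The returned indexes can then be used to slice the string, which avoids regular expressions entirely.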

Please post a [MCVE] of your problem. We can likely help with the regex, but it's a lot easier to fix a problem with a [MCVE] than solve your problem from scratch with a fairly vague problem description. — ShadowRanger, Apr 07 '22 at 13:43
https://stackoverflow.com/questions/3475251/split-a-string-by-a-delimiter-in-python or https://stackoverflow.com/questions/37484624/split-string-at-delimiter-in-python or https://stackoverflow.com/questions/67032664/python-split-string-without-losing-split-character probably answers your question. — Marijn, Apr 07 '22 at 13:43
@DrownedSuccess: You added an example input and output, but not the code you tried. Please provide that non-working code, as text, in the body of the question, and we can try to help you with it. — ShadowRanger, Apr 07 '22 at 14:06
Also, side-note: Are you by any chance trying to parse lines from a pseudo-CSV format (using `:` as the field delimiter instead of `,`)? If so, don't reinvent the wheel, just use the `csv` module (it can customize the delimiter or the whole dialect as needed for just about any text format with arbitrary delimiters and quoting rules). — ShadowRanger, Apr 07 '22 at 14:42
You are mistaken if you believe that `['hello-world,lat\\test']` is not correct, it is because you think that the \\ that you see is in the data you get back. It isn't. That is simply the visual representation of the single backslash that is really there. — BoarGules, Apr 07 '22 at 14:52
@BoarGules This. This was actually my main problem, and my original solution worked perfectly. — DrownedSuccess, Apr 07 '22 at 16:13
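To illustrate BoarGules's point, here is a small sketch that assumes the input was written as a raw string literal (the variable names are illustrative). Printing a list shows the `repr` of each item, which doubles every backslash, while the string itself still holds a single one.

```python
import re

text = r'hello-world:la\test'  # raw literal: a backslash followed by the letter t
parts = re.split(':', text)

print(parts)          # list display uses repr: ['hello-world', 'la\\test']
print(parts[1])       # la\test
print(len(parts[1]))  # 7, so there is only one backslash character
```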

score 0 · Accepted Answer · answered Apr 07 '22 at 13:53


Here's a function that works:

import re

def split_by_char(string, char=':'):
    # A field is a run of non-delimiter, non-quote characters and/or whole quoted strings.
    pattern = rf'''(?:[^{re.escape(char)}"']|"[^"]*"|'[^']*')+'''
    return re.findall(pattern, string)

answered Apr 07 '22 at 13:53

DrownedSuccess


Two notes: 1) There's no real benefit to precompiling if you have to do it every time (you could just invoke `re.finditer(stringpat, string)`). 2) That listcomp is a really elaborate (read: verbose and inefficient) way to get the exact same result as just `return PATTERN.findall(string)`, or, if you really want `finditer`, `return [m[0] for m in PATTERN.finditer(string)]`. – ShadowRanger Apr 07 '22 at 14:11
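A quick sketch checking the second note, using the accepted answer's pattern with `:` hard-coded as the delimiter (the sample input is an assumption). Because the pattern has a single capturing group that wraps the whole match, `findall` returns the same strings as the span-slicing listcomp.

```python
import re

pattern = re.compile(r'''((?:[^:"']|"[^"]*"|'[^']*')+)''')
text = 'a:"b:c":d'

# The original, verbose form: slice the input by each match's span.
via_spans = [text[m.span()[0]:m.span()[1]] for m in pattern.finditer(text)]
# The two shorter forms suggested in the comment.
via_findall = pattern.findall(text)
via_group = [m[0] for m in pattern.finditer(text)]

print(via_findall)  # ['a', '"b:c"', 'd']
print(via_spans == via_findall == via_group)  # True
```

Note that the quoted field keeps its surrounding quotes in all three forms.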

score 0 · Answer 2 · answered Apr 07 '22 at 13:55


string = 'hello-world:la\test'
char = ':'
print(string.find(char))

Prints

11

char_index = string.find(char)

string[:char_index]

Returns

'hello-world'

string[char_index+1:]

Returns

'la\test'

answered Apr 07 '22 at 13:55

gremur


While the example they gave is poor, from the description, I think the OP needed it to *not* find the character in question if it was found inside internal quotes, thus the need for a regex. So `'hello-world:la\test'` should split, but `'hello-world":"la\test'` should not. – ShadowRanger Apr 07 '22 at 14:12
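The difference ShadowRanger describes can be sketched as follows. The helper name `first_unquoted_index` is hypothetical, and the sketch assumes quotes are balanced and never escaped. The regex tries the quoted alternatives first, so a delimiter inside quotes is consumed as part of the quoted run and never reaches the capturing group.

```python
import re

def first_unquoted_index(text, char=':'):
    # Quoted runs are matched first and skipped; only a bare delimiter fills group 1.
    pattern = rf'''"[^"]*"|'[^']*'|({re.escape(char)})'''
    for m in re.finditer(pattern, text):
        if m.group(1) is not None:
            return m.start(1)
    return -1  # same "not found" convention as str.find


plain = r'hello-world:la\test'
quoted = r'hello-world":"la\test'

print(plain.find(':'), first_unquoted_index(plain))    # 11 11
print(quoted.find(':'), first_unquoted_index(quoted))  # 12 -1
```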

score 0 · Answer 3 · answered Apr 07 '22 at 14:52

Solution for the case you're likely encountering (a pseudo-CSV format you're hand-rolling a parser for; if you're not in that situation, it's still a likely situation for people finding this question later):

Just use the `csv` module.

import csv
import io

test_strings = ['field1:field2:field3', 'field1:"field2:with:embedded:colons":field3']

for s in test_strings:
    for row in csv.reader(io.StringIO(s), delimiter=':'):
        print(row)


which outputs:

['field1', 'field2', 'field3']
['field1', 'field2:with:embedded:colons', 'field3']

correctly ignoring the colons within the quoted field, requiring no kludgy, hard-to-verify hand-written regexes.
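For a single line held in a string, a variation on the same idea is sketched below. The wrapper name `split_line` is hypothetical. It relies on `csv.reader` accepting any iterable of strings, so a one-item list can stand in for `io.StringIO`, and it uses the `quotechar` parameter to handle single-quoted fields.

```python
import csv

def split_line(line, delimiter=':', quotechar='"'):
    # csv.reader accepts any iterable of strings, so a one-item list works.
    return next(csv.reader([line], delimiter=delimiter, quotechar=quotechar))


print(split_line('field1:"field2:with:embedded:colons":field3'))
# ['field1', 'field2:with:embedded:colons', 'field3']
print(split_line("a:'b:c':d", quotechar="'"))
# ['a', 'b:c', 'd']
print(split_line(r'hello-world:la\test'))
# ['hello-world', 'la\\test']
```

Unlike the regex approaches, `csv` strips the quotes from quoted fields and recognizes only one quote character at a time.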
