
I want to write a script that reads from a CSV file and splits each line on commas, except for commas that fall between two specific characters.

In the code snippet below, I would like to split line on commas, except for the commas between two $ characters.

line = "$abc,def$,$ghi$,$jkl,mno$"

output = line.split(',')

for o in output:
    print(o)

How do I write output = line.split(',') so that I get the following terminal output?

~$ python script.py
$abc,def$
$ghi$
$jkl,mno$
martineau
  • Take a look at [the regular expression syntax](https://docs.python.org/3/library/re.html#regular-expression-syntax). Specifically, it seems you are looking for the negative look-ahead/look-behind operators – mousetail Jul 22 '22 at 09:36
  • Check out: [Splitting on spaces, except between certain characters](https://stackoverflow.com/questions/9644784/splitting-on-spaces-except-between-certain-characters) and customize solutions for comma and $. – DarrylG Jul 22 '22 at 09:41
  • Your requirements and expected output describe 2 different things. – matszwecja Jul 22 '22 at 09:41

4 Answers


One solution (maybe not the most elegant, but it works as long as no field itself contains ,,) is to replace each $,$ with $,,$ and then split on ,,. Something like this:

output = line.replace('$,$','$,,$').split(',,')

Using a regex, as mousetail suggested, is the more elegant and robust solution, but it requires knowing regex (not that anyone KNOWS regex).

Ftagliacarne

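You can do this with a regular expression.

In re, (?<=\$) is a lookbehind: it matches at a position that is immediately preceded by a $, without consuming the $.

Similarly, (?=\$) is a lookahead: it matches at a position that is immediately followed by a $.

Placing one on each side of the comma matches only commas that have a $ on both sides, which are the separators between fields. (The negative forms, (?<!\$) and (?!\$), do the opposite here: they match the commas inside the $ pairs.)

expression = r"(?<=\$),(?=\$)"

Full program:

import re

# Split only on commas that sit directly between two $ characters.
expression = r"(?<=\$),(?=\$)"
print(re.split(expression, "$abc,def$,$ghi$,$jkl,mno$"))
# ['$abc,def$', '$ghi$', '$jkl,mno$']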
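
Since the question is about reading a file, the split can be wrapped in a small helper and applied line by line. The following is only a sketch: the name split_dollar_fields is hypothetical, io.StringIO stands in for a real file, and the pattern splits only on commas that have a $ directly on both sides.

```python
import io
import re

# Split only on commas with a '$' immediately before and after them.
SEPARATOR = re.compile(r"(?<=\$),(?=\$)")


def split_dollar_fields(line):
    """Return the $-delimited fields of one line (hypothetical helper)."""
    return SEPARATOR.split(line.rstrip("\n"))


# io.StringIO stands in for an open file here; with a real file you would
# iterate over open("data.csv") instead (the file name is an assumption).
fake_file = io.StringIO("$abc,def$,$ghi$,$jkl,mno$\n$p$,$q,r$\n")
rows = [split_dollar_fields(line) for line in fake_file]

for row in rows:
    print(row)
```

Compiling the pattern once avoids rebuilding it for every line, which may matter for large files.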
mousetail

Try regular expressions:

import re

line = "$abc,def$,$ghi$,$jkl,mno$"

output = re.findall(r"\$(.*?)\$", line)

for o in output:
    print('$'+o+'$')

Output:

$abc,def$
$ghi$
$jkl,mno$
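
Because the $ characters behave like quote characters here, Python's standard csv module may also fit, assuming it is acceptable to treat $ as the quote character. This is a sketch rather than part of the answer above: csv.reader strips the quote characters, so the example adds them back when printing, and io.StringIO stands in for a real file.

```python
import csv
import io

# io.StringIO stands in for an open file; with a real file, pass
# open("data.csv", newline="") instead (the file name is an assumption).
source = io.StringIO("$abc,def$,$ghi$,$jkl,mno$\n")

# Treat '$' as the quote character so commas inside $...$ are kept.
rows = list(csv.reader(source, delimiter=",", quotechar="$"))

for row in rows:
    for field in row:
        # csv.reader removes the surrounding '$', so add them back.
        print(f"${field}$")
```

If the $ wrappers are not needed downstream, the fields in rows can be used directly.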
Tinu

First, you can pick a character that is guaranteed not to appear in that line, for example the one just past its highest code point:

c = chr(max(map(ord, line)) + 1)

Then, you can proceed as follows:

line.replace('$,$', f'${c}$').split(c)

Here is your example:

>>> line = '$abc,def$,$ghi$,$jkl,mno$'
>>> c = chr(max(map(ord, line)) + 1)
>>> result = line.replace('$,$', f'${c}$').split(c)
>>> print(*result, sep='\n')
$abc,def$
$ghi$
$jkl,mno$
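
The one-liner above raises ValueError on an empty line, because max() has nothing to work with. A possible way to package the idea with that case handled is sketched below; the function name split_on_dollar_commas and the choice to return an empty list for an empty line are assumptions, not part of the original approach.

```python
def split_on_dollar_commas(line):
    """Split on the '$,$' boundaries using a character absent from line."""
    if not line:
        # max() raises ValueError on an empty string, so handle it here.
        return []
    # Assumes the highest code point in line is below U+10FFFF,
    # otherwise chr() would raise ValueError.
    sentinel = chr(max(map(ord, line)) + 1)
    return line.replace("$,$", f"${sentinel}$").split(sentinel)


result = split_on_dollar_commas("$abc,def$,$ghi$,$jkl,mno$")
print(*result, sep="\n")
```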
Riccardo Bucco