I have a regexp that splits a string on commas while ignoring commas inside single or double quotes. It is used in the following Ruby code:

def separate(params)
  params.split(/(?!\B('|")[^\"']*),(?![^\"']*('|")\B)/)
end

It works as intended, except for strings that contain special characters such as @ or #.

Example with expected behavior:

https://regex101.com/r/xB7rQ7/156

"\"search\", placeholder: \"Busca rápida: 1.4 8V, Flex, automático...\", id: \"search_terms\" "

Example with unexpected behavior:

https://regex101.com/r/xB7rQ7/157

"\"search\", placeholder: \"Busca rápida: 1.4 8V, Flex, automático...\", id: \"#search_terms\" "

Note that the only difference is the # symbol before search_terms, yet the regexp separates placeholder from id only in the first case.

Can anyone shed some light on my regexp so that it works as expected in both cases? Please note that this is a specific case of string splitting that is not covered by other questions.

ErvalhouS

1 Answer

Try this Regex:

,(?=(?:(?:[^"']*["']){2})*[^"']*$)

Explanation:

  • , - matches a ,
  • (?=(?:(?:[^"']*["']){2})*[^"']*$) - positive lookahead that requires the matched , to be followed by an even number of " or ' characters before the end of the line
    • (?:(?:[^"']*["']){2})* - the inner group [^"']*["'] matches zero or more characters that are neither " nor ', followed by a single " or '. The quantifier {2} repeats that group twice, so each pass consumes exactly two quotes. The outer * repeats the pair zero or more times, so the total number of quotes consumed is always even (0, 2, 4, 6, ...). Because " and ' are treated as interchangeable, the count can be thrown off when both kinds of quote appear in the same string, for example an apostrophe inside a double-quoted value
    • [^"']*$ - after that even number of quotes, no further " or ' may appear before the end of the line
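As a rough sketch, this is how the pattern could be dropped into the separate method from the question. The constant name QUOTE_AWARE_COMMA and the third sample string (mixed) are made up for illustration; the first two strings are the examples from the question, written with single-quoted Ruby literals so the inner double quotes need no escaping.

```ruby
# Hypothetical constant name; the pattern itself is the one given above.
QUOTE_AWARE_COMMA = /,(?=(?:(?:[^"']*["']){2})*[^"']*$)/

def separate(params)
  # Split only on commas that are followed by an even number of quotes.
  params.split(QUOTE_AWARE_COMMA)
end

# The two examples from the question.
plain  = '"search", placeholder: "Busca rápida: 1.4 8V, Flex, automático...", id: "search_terms" '
hashed = '"search", placeholder: "Busca rápida: 1.4 8V, Flex, automático...", id: "#search_terms" '

# Illustrative string (an assumption, not from the question) that mixes
# both quote types: the apostrophe makes the quote count after the comma odd.
mixed = %q(id: "x", name: "Pat O'Brien")

p separate(plain)   # 3 parts; the commas inside the placeholder text are kept
p separate(hashed)  # 3 parts as well; the # no longer changes the result
p separate(mixed)   # 1 part; the comma is not split because of the apostrophe
```

The last call shows the shortcoming mentioned above: when " and ' both appear, the comma between id and name is left unsplit. If apostrophes can occur inside quoted values, a small scanner or a CSV-style parser may be a safer choice than a single lookahead.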
Gurmanjot Singh