I have a regular expression that matches quoted substrings: "/\"(?:[^\"\\]|\\.)*\"/" (originally /"(?:[^"\\]|\\.)*"/, see Here). It works when tested on regex101.

With TDFA, it fails with a syntax error:

*** Exception: Explict error in module Text.Regex.TDFA.String : Text.Regex.TDFA.String died:
parseRegex for Text.Regex.TDFA.String failed:"/"(?:[^"\]|\.)*"/" (line 1, column 4):
unexpected "?"
expecting empty () or anchor ^ or $ or an atom

Is there a way to correct it?

Test string: Is big "problem", no?

Expected result: "problem"

UPD:

Here is the full context:

removeQuotedSubstrings :: String -> [String]
removeQuotedSubstrings str =
  let quoteds = concat (str =~ ("/\"(?:[^\"\\]|\\.)*\"/" :: String) :: [[String]])
  in  quoteds
Alexey Orlov
  • I think you are trying to use a wrong regex flavour. AFAIR Posix EREs don't support `?:` Also please show your actual Haskell syntax. `\\.` is a Haskell backslash-period which is a RE literal-period. And what's with the `/.../` delimiters? They are not a part of any regex syntax. – n. m. could be an AI Dec 26 '17 at 09:19
  • I tried to remove slashes, no difference. Is there a substitute for `?:`? `TDFA` is used elsewhere in my program. See UPD. – Alexey Orlov Dec 26 '17 at 09:45
  • I don't think trial and error is a particularly useful method of getting a regex to work. I recommend knowing exactly what every single character is doing. Removing backslashes? Why not adding some instead? In this application, a plain old capturing group should do the job, there's no need to use a non-capturing group. – n. m. could be an AI Dec 26 '17 at 10:03
  • I played with it on `regex101`; non-capturing group matters :) . Of course some studying will do a lot of good to me. Conducted properly, it'll take a week or so... – Alexey Orlov Dec 26 '17 at 10:16
  • Try [`"\"(\\.|[^\"\\])*\""`](https://stackoverflow.com/a/7797678/3832970) – Wiktor Stribiżew Dec 26 '17 at 10:20
  • A term that matches a backslash character in regex syntax is spelled "\\". A literal backslash character in a string in Haskell source is also spelled "\\". Therefore, to make a regex that matches a backslash character, you need **four** backslashes in a row in Haskell source. – n. m. could be an AI Dec 26 '17 at 10:21
  • Where exactly does it matter? There is no such feature in TDFA anyway, TDFA implements POSIX ERE. – n. m. could be an AI Dec 26 '17 at 10:26
  • @Wiktor Stribiżew: Much better, indeed. Input: `removeQuotedSubstrings "alf\"foo\" dp \"bar\" kip"`; output: `["\"foo\"","o","\"bar\"","r"]` – Alexey Orlov Dec 26 '17 at 10:34
  • So, you will have to somehow omit each even element from this list. POSIX ERE does not support non-capturing groups. – Wiktor Stribiżew Dec 26 '17 at 10:37
  • Will do. Thanks! – Alexey Orlov Dec 26 '17 at 10:45
  • @WiktorStribiżew can you explain what exactly `\\.` is doing in your regex? – n. m. could be an AI Dec 26 '17 at 10:58
  • It is either impossible or too unwieldy to support quoted strings if arbitrary backslash-escaped characters are allowed using POSIX ERE. If backslash-escaped characters are not needed, then a simple RE like `"\"([^\"])*\""` would suffice. – n. m. could be an AI Dec 26 '17 at 11:26
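
The escaping point made in the comments can be checked without any regex library: each Haskell string literal loses one level of backslashes before the regex engine sees it. The sketch below is only an illustration; the names `patternFromQuestion` and `patternFourBackslashes` are made up for it, and the second pattern is one possible reading of the "four backslashes" advice, not something tested in this thread.

```haskell
-- The pattern suggested in the comments, as written in Haskell source.
-- The regex engine receives: "(\.|[^"\])*"
-- Here \. is an escaped (literal) period, not "backslash followed by any character".
patternFromQuestion :: String
patternFromQuestion = "\"(\\.|[^\"\\])*\""

-- Hypothetical variant with four backslashes in the Haskell source.
-- The regex engine receives: "(\\.|[^"\\])*"
-- Here \\ is a literal backslash and the following . is "any character".
patternFourBackslashes :: String
patternFourBackslashes = "\"(\\\\.|[^\"\\\\])*\""
```

Running `mapM_ putStrLn [patternFromQuestion, patternFourBackslashes]` in GHCi prints both patterns exactly as the engine would receive them, which makes it easier to see what each backslash is doing.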

1 Answer


No improvement, just an acceptable solution, albeit lacking in elegance:

import qualified Data.Text as T
import Text.Regex.TDFA

-- | Removes all double quoted substrings, if any, from a string.
--
-- Examples:
--
-- >>> removeQuotedSubstrings "alfa"
-- "alfa"
-- >>> removeQuotedSubstrings "ngoro\"dup\"lai \"ming\""
-- "ngoro lai  "
removeQuotedSubstrings :: String -> String
removeQuotedSubstrings str =
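  -- Keep only whole matches (they start with a quote); `take 1`, unlike `head`,
  -- is safe when a captured group is empty, as it is for the input "\"\"".
  let quoteds  = filter ((== "\"") . take 1)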
               $ concat (str =~ ("\"(\\.|[^\"\\])*\"" :: String) :: [[String]])
  in  T.unpack $ foldr (\quoted acc -> T.replace (T.pack quoted) " " acc)
                       (T.pack str) quoteds

Yes, the final purpose has always been to remove the quoted substrings.
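
A possibly tidier way to get rid of the stray capture-group entries is to ask for whole matches only. This is a minimal sketch, assuming `getAllTextMatches` as re-exported by `Text.Regex.TDFA`; the names `quotedPattern` and `quotedSubstrings` are invented for the example, and the pattern is the same one used above.

```haskell
import Text.Regex.TDFA

-- Same pattern as in the answer; the engine receives: "(\.|[^"\])*"
quotedPattern :: String
quotedPattern = "\"(\\.|[^\"\\])*\""

-- | Returns every quoted substring as a whole match.
-- Capture-group contents are never part of the result, so no filtering is needed.
quotedSubstrings :: String -> [String]
quotedSubstrings str = getAllTextMatches (str =~ quotedPattern)
```

With the input from the comments, `quotedSubstrings "alf\"foo\" dp \"bar\" kip"` should give `["\"foo\"","\"bar\""]`, and the resulting list can be fed to the same `T.replace` fold.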

Alexey Orlov