
I am trying to find all occurrences of a particular pattern in a string and have written a regex for it, but the output differs depending on which Haskell method I use. Here is a sample for reference:

Regex

<nARD>([a-zA-Z0-9?=-]*)</nARD>

Input String

<a> <b> <nARD>abcd</nARD> </c> <nARD>pqrs</nARD>

Expected Output

["abcd","pqrs"]

Current output when using the =~ operator, (getAllTextMatches (fileData =~ "<nARD>([a-zA-Z0-9?=-]*)</nARD>") :: [String]):

["<nARD>abcd</nARD>","<nARD>pqrs</nARD>"]
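
The comments suggest asking `=~` for `[[String]]` instead of going through `getAllTextMatches`. A minimal sketch, assuming the `regex-tdfa` package: each inner list holds the whole match followed by its capture groups, so the second element is the text between the tags. The names `matches` and `groups` are illustrative only; `regex-pcre` should accept the same result type because it uses the same `regex-base` interface.

```haskell
import Text.Regex.TDFA ((=~))

-- Sample input from the question.
input :: String
input = "<a> <b> <nARD>abcd</nARD> </c> <nARD>pqrs</nARD>"

-- One inner list per match: [wholeMatch, group1].
matches :: [[String]]
matches = input =~ "<nARD>([a-zA-Z0-9?=-]*)</nARD>"

-- Keep only the first capture group of each match.
groups :: [String]
groups = [g | (_ : g : _) <- matches]

main :: IO ()
main = print groups
```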

Current output when using the matchRegex function:

matchRegex (mkRegex "<nARD>([a-zA-Z0-9?=-]*)</nARD>") "<a> <b> <nARD>abcd</nARD> </c> <nARD>pqrs</nARD>"

Output

Just ["abcd"]

but the expected output is Just ["abcd","pqrs"]
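
`matchRegex` only reports the capture groups of the first match. A minimal sketch of one workaround, assuming `mkRegex` and `matchRegex` come from `Text.Regex` in the `regex-compat` package: `matchRegexAll` also returns the text after the match, so the search can be repeated on that remainder. The helper `allGroups` is hypothetical. Re-matching on the remainder can change the meaning of anchors such as `^`, which does not matter for this regex.

```haskell
import Text.Regex (Regex, matchRegexAll, mkRegex)

-- Hypothetical helper: the capture groups of every non-overlapping
-- match, found by re-running the regex on the text after each match.
allGroups :: Regex -> String -> [[String]]
allGroups re s = case matchRegexAll re s of
  Nothing -> []
  Just (_before, matched, after, subs)
    -- Stop after an empty match so the recursion cannot loop forever.
    | null matched -> [subs]
    | otherwise -> subs : allGroups re after

main :: IO ()
main = print (concat (allGroups nard input))
  where
    nard = mkRegex "<nARD>([a-zA-Z0-9?=-]*)</nARD>"
    input = "<a> <b> <nARD>abcd</nARD> </c> <nARD>pqrs</nARD>"
```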

Sarthak
  • Apparently all regex-like things in Haskell follow a very polymorphic API (not intuitive in my opinion). Instead of writing `string =~ regex`, try `string =~ regex :: [[String]]`. – lsmor Mar 09 '23 at 11:25
  • Also, take into account that the matches of your regex are the whole `<nARD>...</nARD>` strings, whereas what you want is `["abcd","pqrs"]`: you are not interested in the matches but in the _matching groups_, which is different. – lsmor Mar 09 '23 at 11:27
  • @lsmor I think the return type is supposed to be [String], i.e. a list of strings. Can you help me with getting the matching groups, or point me to some documentation that covers this? – Sarthak Mar 09 '23 at 12:01
  • Which regex library are you using? I am assuming `regex-tdfa`. – lsmor Mar 09 '23 at 13:42
  • @lsmor Yes, I was using regex-tdfa. I have now moved to `regex-pcre` and your suggestion worked, thanks. Can you also help me figure out how to include `*` as one of the characters to be matched between `<nARD>` and `</nARD>`? Basically, I want the regex to work on this as well: `1234****` should also match. – Sarthak Mar 09 '23 at 17:04
  • [Zal*go* **com̡e̶** s.](https://stackoverflow.com/q/1732348/791604) – Daniel Wagner Mar 09 '23 at 18:50
  • @lsmor regex-applicative is my favorite regex library, and it doesn't have the hugely polymorphic problem. It does have the "problem" that you write a regex in parsec style rather than by pasting a blob of PCRE text, but this is a lot more readable and composable, so I recommend it unless your main use case is porting already-written PCRE regexes. – amalloy Mar 10 '23 at 04:36
  • I am not a regex expert, so I can't help you further than this. Following @amalloy's suggestion, if you are not porting software already written with regexes, I'd strongly recommend using `attoparsec` or any other applicative parser library. It is a much easier and more readable approach. – lsmor Mar 10 '23 at 07:19
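
A minimal sketch of the parsec-style regex amalloy describes, assuming the `regex-applicative` package. It also accepts `*` between the tags, as asked in the comments; in a textual regex the same change is just `[a-zA-Z0-9?=*-]`, since `*` is literal inside a bracket expression. The names `nardChar`, `nard` and `allMatches`, and the third tag in the sample input, are hypothetical.

```haskell
import Control.Applicative (many)
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)
import Text.Regex.Applicative (RE, findFirstInfix, psym, string)

-- Characters allowed between the tags: the question's class plus '*'.
nardChar :: Char -> Bool
nardChar c = isAsciiLower c || isAsciiUpper c || isDigit c || c `elem` "?=-*"

-- "<nARD>", then the allowed characters (the result), then "</nARD>".
nard :: RE Char String
nard = string "<nARD>" *> many (psym nardChar) <* string "</nARD>"

-- Take the leftmost match, then keep searching in the rest of the input.
-- Assumes the RE never matches the empty string (true for nard).
allMatches :: RE Char a -> String -> [a]
allMatches re s = case findFirstInfix re s of
  Nothing -> []
  Just (_, x, rest) -> x : allMatches re rest

main :: IO ()
main = print (allMatches nard "<a> <b> <nARD>abcd</nARD> </c> <nARD>pqrs</nARD> <nARD>1234****</nARD>")
```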

1 Answer


Find all occurrences of the pattern with replace-megaparsec (https://hackage.haskell.org/package/replace-megaparsec):

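import Data.Either (rights)
import Data.Void (Void)
import Replace.Megaparsec (anyTill, splitCap)
import Text.Megaparsec (Parsec)
import Text.Megaparsec.Char (string)

input :: String
input = "<a> <b> <nARD>abcd</nARD> </c> <nARD>pqrs</nARD>"

-- Match "<nARD>", then consume everything up to the next "</nARD>"
-- and return only the text in between.
pattern :: Parsec Void String String
pattern = do
    _ <- string "<nARD>"
    fst <$> anyTill (string "</nARD>")

-- splitCap yields Left for unmatched text and Right for each match;
-- rights keeps only the matches.
main :: IO ()
main = print (rights (splitCap pattern input))
-- prints ["abcd","pqrs"]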
James Brock