
I need to compare a string to different regexps, which each give a different result.

In Ruby I would do something like

case str
when regexp1 then result1
when regexp2 then result2
when regexp3 then result3 
... 

My first attempt in Haskell is

if str =~ regexp1
   then result1
   else
      let (_,_,_,groups) = str =~ regexp2
      in if (length groups > 0)
         then result2 groups
         else ...

I'm sure there is a much nicer way to do so in Haskell using Alternative or Monad etc ...

Update

My second solution (using Alternative):

import Control.Applicative ((<|>))
import Text.Regex.Posix ((=~~))

fun str =
      regexp1 ?~ (\_ -> "result1")
  <|> regexp2 ?~ (\[capture] -> capture)
  <|> regexp3 ?~ (\[c1, c2] -> c1 ++ ":" ++ c2)
  where
    -- (=~~) yields Nothing when r does not match str
    r ?~ f = do
      (_, _, _, groups) <- str =~~ r :: Maybe (String, String, String, [String])
      Just (f groups)
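For reference, here is a self-contained sketch of the Alternative version above that compiles on its own. It assumes the regex-posix package; the three patterns are hypothetical examples chosen only so that each branch can be exercised.

```haskell
import Control.Applicative ((<|>))
import Text.Regex.Posix ((=~~))

-- Hypothetical example patterns (not from the question).
regexp1, regexp2, regexp3 :: String
regexp1 = "^hello$"
regexp2 = "^name: ([a-z]+)$"
regexp3 = "^([0-9]+)-([0-9]+)$"

-- Try each (regexp, handler) pair in order; (<|>) on Maybe keeps the
-- first Just, so later patterns are only tried if earlier ones fail.
fun :: String -> Maybe String
fun str =
      regexp1 ?~ (\_ -> "result1")
  <|> regexp2 ?~ (\[capture] -> capture)
  <|> regexp3 ?~ (\[c1, c2] -> c1 ++ ":" ++ c2)
  where
    -- (=~~) fails in Maybe when there is no match, so no guard is needed;
    -- on success the handler receives the list of capture groups.
    r ?~ f = do
      (_, _, _, groups) <- str =~~ r :: Maybe (String, String, String, [String])
      Just (f groups)
```

In GHCi, `fun "name: bob"` should give `Just "bob"`, `fun "12-34"` gives `Just "12:34"`, and an input matching none of the patterns gives `Nothing`. The order of the `<|>` branches encodes the priority, much like the order of the `when` clauses in the Ruby version.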
mb14
  • What is `=~`? I mean, I would google it if that were possible... – BitTickler Apr 21 '15 at 20:04
  • Does this answer the question, even if it's not specific to regexp? http://stackoverflow.com/a/28529498/3234959 – chi Apr 21 '15 at 20:11
  • @user2225104 It is the entry point for basically the entire regular expression API. It is generic on return type and is [somewhat complex](http://haddocks.fpcomplete.com/fp/7.8/20140916-162/regex-base/Text-Regex-Base-RegexLike.html#t:RegexContext). Usually the best way to find out is to use [hoogle](https://www.fpcomplete.com/hoogle?q=%3D~&env=ghc-7.8-stable-14.09), and then decide from the imports or package which one is relevant. – kazagistar Apr 21 '15 at 20:24
  • More concretely, you can write a list of individual matchers, returning `Just something` on success and `Nothing` on failure. Once you have a `[Maybe whatever]`, use `msum` to get the first `Just` (or `Nothing` if all failed). – chi Apr 21 '15 at 20:26
  • @chi (about the SO link) That indeed works for just checking whether a regexp matches, but the problem is doing pattern matching so that I can use the captures (groups) in my example. – mb14 Apr 21 '15 at 20:34
  • @chi `msum` solution: that's the way I'm going, but I would prefer something *nicer* visually. – mb14 Apr 21 '15 at 20:57
  • You can use groups and pattern matching, roughly as `msum [ do let (...,groups) = string =~ regexp ; guard (not . null $ groups) ; return (f groups) , otherbranch, ... ]`. I agree that this is not terribly nice visually. Visually, the Alternative in your post is nicer to me. – chi Apr 21 '15 at 22:16
  • @chi could you make an answer from your msum example? I'll accept it. – mb14 Apr 22 '15 at 08:00
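To make kazagistar's point concrete: `(=~)` is overloaded on its result type, so the same call answers different questions depending on the type it is used at. A small sketch, assuming regex-posix; the sample string, patterns and names are made up for illustration.

```haskell
import Text.Regex.Posix ((=~))

sample :: String
sample = "key=value"

-- At type Bool: did the pattern match at all?
matched :: Bool
matched = sample =~ "([a-z]+)=([a-z]+)"

-- At type String: the first matched substring ("" when nothing matches).
firstHit :: String
firstHit = sample =~ "[a-z]+"

-- At a 4-tuple: (text before, match, text after, capture groups).
parts :: (String, String, String, [String])
parts = sample =~ "([a-z]+)=([a-z]+)"
```

Here `matched` is `True`, `firstHit` is `"key"`, and `parts` is `("", "key=value", "", ["key", "value"])`. Without a signature or an inline annotation such as `:: Bool`, GHC cannot pick the instance, which is why type annotations keep coming up in this thread.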

2 Answers


Here's an attempt:

fromJust $ msum
     [ guard (str =~ regexp1) >>
         return result1
     , let (_,_,_,groups) = str =~ regexp2 in
       guard (not $ null groups) >>
         return (result2 groups)
     , ... ]

You might have to add a few type annotations here and there, to tell GHC how to disambiguate the heavily overloaded (=~).

The code above assumes there is a final "catch-all" case (the last else); otherwise fromJust will crash when nothing matches. As a safer alternative, take the catch-all out of the list and use fromMaybe catchAll instead of fromJust.

Further alternatives to consider: you can use do notation; you can use <|> instead of msum over a list; you can use =~~, which returns a monadic result, to avoid guard (as @mb14 points out below).
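Combining the last two suggestions (a catch-all via fromMaybe instead of fromJust, and =~~ inside do blocks instead of guard), a complete version might look like the following. It is a sketch assuming regex-posix; `classify`, `tryRegex`, the patterns and the result strings are all hypothetical.

```haskell
import Control.Monad (msum)
import Data.Maybe (fromMaybe)
import Text.Regex.Posix ((=~~))

-- Hypothetical classifier: each list element is a Maybe-producing matcher;
-- msum picks the first Just, and fromMaybe supplies the catch-all.
classify :: String -> String
classify str = fromMaybe "unknown" $ msum
  [ do
      _ <- tryRegex "^[0-9]+$"
      return "number"
  , do
      -- A refutable pattern: if the groups do not have this shape,
      -- the do block fails (MonadFail) and yields Nothing.
      (_, _, _, [k, v]) <- tryRegex "^([a-z]+)=([a-z]+)$"
      return (k ++ " -> " ++ v)
  ]
  where
    -- Nothing when the pattern does not match, so no guard is needed.
    tryRegex :: String -> Maybe (String, String, String, [String])
    tryRegex r = str =~~ r
```

With this, `classify "42"` gives `"number"`, `classify "key=value"` gives `"key -> value"`, and any other input falls through to `"unknown"` instead of crashing.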

chi
  • Note that you can use `=~~` instead of `=~`, which returns a monadic result and so removes the need for `guard`. I accept the answer anyway, as it's a good example of what `guard` is for ;-) – mb14 Apr 22 '15 at 11:59

The Ruby syntax you give may feel convenient to people used to Ruby, but it may be less clear to readers who have never seen Ruby.

So I would suggest writing yourself a dedicated function: it stays concise, and its name makes the intent readable. If you use this pattern in more than one place in your program, it will pay off.

You might want to tweak the code below so that it returns a type other than String that is more useful for your subsequent processing (capture groups, etc.). Having never used Haskell regular expressions before, I did not work out exactly what your use case needs.

import Text.Regex.Posix ((=~))

-- Try each (regex, action) pair in order and apply the action of the
-- first regex that matches to the matched substring.
firstMatch :: [(String, String -> String)] -> String -> Maybe String
firstMatch [] _ = Nothing
firstMatch ((ex, action) : rest) text =
    let r = text =~ ex :: String   -- "" when ex does not match (or matches only "")
    in if null r then firstMatch rest text else Just (action r)

regex1 = "Hello|hello"
regex2 = "world|World"
regex3 = "[[:alnum:]_]*[[:space:]]*[[:alnum:]_]*"

main :: IO ()
main = print (firstMatch [(regex1,hello),(regex2,world),(regex3,twoWords)] "Hello World")
    where 
        -- hello, world, twoWords are the regexN specific actions...
        hello ms = ms
        world ms = ms
        twoWords ms = ms

Result:

Just "Hello"

EDIT: Added continuation-style handling specific to whichever regex matched.
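Regarding returning something other than String (capture groups etc.), one possible variant hands each action the capture groups and lets the caller choose the result type. This is a sketch, not the code above: `firstMatchGroups` and `example` are hypothetical names, and regex-posix is assumed.

```haskell
import Text.Regex.Posix ((=~~))

-- Like firstMatch, but each action receives the capture groups of the
-- first matching regex, and the result type is chosen by the caller.
firstMatchGroups :: [(String, [String] -> a)] -> String -> Maybe a
firstMatchGroups [] _ = Nothing
firstMatchGroups ((ex, action) : rest) text =
  case (text =~~ ex :: Maybe (String, String, String, [String])) of
    Just (_, _, _, groups) -> Just (action groups)
    Nothing                -> firstMatchGroups rest text

-- Hypothetical table of patterns and actions.
example :: String -> Maybe String
example = firstMatchGroups
  [ ("^hello ([a-z]+)$",    \gs -> "greeting for " ++ concat gs)
  , ("^([0-9]+)-([0-9]+)$", \gs -> unwords gs)
  ]
```

With this table, `example "hello bob"` gives `Just "greeting for bob"` and `example "3-5"` gives `Just "3 5"`. Unlike the String version, a regex that matches only the empty string counts as a match here, because success is decided by `=~~` rather than by comparing the matched text with "".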

BitTickler
  • Note that my question is not just about checking whether any of the regexps matches (there are nicer ways to do that), but about giving a different response depending on which regexp succeeds (and also using the captures). – mb14 Apr 22 '15 at 08:03