4

Could anyone help me implement a function that splits a string into an array of strings, split by a period (.)?

So: "This is sentence one. This is sentence two." becomes: ["This is sentence one", "This is sentence two"]

However, if the period is inbetween two numbers eg: (2.5) then don't split?

TransmissionsDev
  • 388
  • 4
  • 16

2 Answers2

3

Lots of ways you can do this. Here's a few.

With Text.Regex:

import Text.Regex.TDFA -- I think this will work with the other regex backends too

getAllTextMatches ("One. Two. a 3.5 b. cde." =~ "([0-9]\\.[0-9]|[^.])+" :: AllTextMatches [] String)
-- ["One"," Two"," a 3.5 b"," cde"]

With Text.ParserCombinators.ReadP (in base, so no third-party libraries needed):

import Data.Char
import Text.ParserCombinators.ReadP

parseDigitsWithDecimalPoint = (\a b c -> [a,b,c]) <$> satisfy isDigit <*> char '.' <*> satisfy isDigit
parseNonDot = (:[]) <$> satisfy ('.' /=)
parseSentence = fmap concat . many $ parseDigitsWithDecimalPoint <++ parseNonDot
readP_to_S (sepBy parseSentence (char '.') <* eof) "One. Two. a 3.5 b. cde."
-- [(["One"," Two"," a 3.5 b"," cde",""],"")]

With Text.Regex.Applicative:

import Control.Applicative.Combinators
import Data.Char
import Text.Regex.Applicative

parseDigitsWithDecimalPoint = (\a b c -> [a,b,c]) <$> psym isDigit <*> sym '.' <*> psym isDigit
parseNonDot = (:[]) <$> psym ('.' /=)
parseSentence = fmap concat . many $ parseDigitsWithDecimalPoint <|> parseNonDot
match (sepBy parseSentence (sym '.')) "One. Two. a 3.5 b. cde."
-- Just ["One"," Two"," a 3.5 b"," cde",""]

With Text.Megaparsec:

{-# LANGUAGE FlexibleContexts, TypeFamilies #-}
import Data.Char
import Text.Megaparsec

parseDigitsWithDecimalPoint = (\a b c -> [a,b,c]) <$> satisfy isDigit <*> single '.' <*> satisfy isDigit
parseNonDot = (:[]) <$> anySingleBut '.'
parseSentence = fmap concat . many $ try parseDigitsWithDecimalPoint <|> parseNonDot
parseMaybe (sepBy parseSentence (single '.')) "One. Two. a 3.5 b. cde."
-- Just ["One"," Two"," a 3.5 b"," cde",""]

Completely by hand:

import Data.Char
import Data.List.NonEmpty

foo = go id where
  go f "" = f "" :| []
  go f ('.':xs) = f "" <| go id xs
  go f (x1:'.':x2:xs) | isDigit x1 && isDigit x2 = go (f . (x1:) . ('.':) . (x2:)) xs
  go f (x:xs) = go (f . (x:)) xs
foo "One. Two. a 3.5 b. cde."
-- "One" :| [" Two"," a 3.5 b"," cde",""]
-1

Use Text.Regex.Posix with this regular expression \.(?![0-9]).

Something like:

Prelude> :m +Text.Regex.Posix
Prelude Text.Regex.Posix> getAllTextMatches $ "This is sentence one. This is sentence two." =~ "\.(?![0-9])" :: [String]
["This is sentence one", "This is sentence two"]
Diogo Rocha
  • 9,759
  • 4
  • 48
  • 52
  • 1
    Is there any assumption for `:m + Text.Regex.Posix` to work? In my case I get an error, so I guess I should first install something. – Enlico Jan 19 '20 at 17:33
  • Getting this error: `lexical error in string/character literal at character '.'` – TransmissionsDev Jan 19 '20 at 17:40
  • That error is because the backslash needs to be doubled there. However, there's a bigger problem with this answer: `Text.Regex.Posix` doesn't support lookahead, so even once that's fixed, you'll just get `*** Exception: user error (Text.Regex.Posix.String died: (ReturnCode 13,"Invalid preceding regular expression"))`. – Joseph Sible-Reinstate Monica Jan 20 '20 at 01:29
  • This isn't "splitting" a string, it's "matching" and extracting matches. Similar but different. – Andrew Koster Jul 02 '20 at 16:46