How to split a string by '.' in Haskell but if the period is in-between two numbers eg: (2.5) then don't split?

Question

Could anyone help me implement a function that splits a string into an array of strings, split by a period (.)?

So: "This is sentence one. This is sentence two." becomes: ["This is sentence one", "This is sentence two"]

However, if the period is inbetween two numbers eg: (2.5) then don't split?

We have multiple excellent solutions for parsing. Pick one. Google will help. — Daniel Wagner, Jan 19 '20 at 17:02
Based on your description, I'd expect the second element of your example list to have a leading space, but it doesn't. — Joseph Sible-Reinstate Monica, Jan 19 '20 at 20:57
@Joseph Sible-Reinstate Monica, you're correct, bad example on my part. — TransmissionsDev, Jan 20 '20 at 01:20

Joseph Sible-Reinstate Monica · Accepted Answer · 2020-01-20T03:30:31.000

Lots of ways you can do this. Here's a few.

With Text.Regex:

import Text.Regex.TDFA -- I think this will work with the other regex backends too

getAllTextMatches ("One. Two. a 3.5 b. cde." =~ "([0-9]\\.[0-9]|[^.])+" :: AllTextMatches [] String)
-- ["One"," Two"," a 3.5 b"," cde"]

With Text.ParserCombinators.ReadP (in base, so no third-party libraries needed):

import Data.Char
import Text.ParserCombinators.ReadP

parseDigitsWithDecimalPoint = (\a b c -> [a,b,c]) <$> satisfy isDigit <*> char '.' <*> satisfy isDigit
parseNonDot = (:[]) <$> satisfy ('.' /=)
parseSentence = fmap concat . many $ parseDigitsWithDecimalPoint <++ parseNonDot
readP_to_S (sepBy parseSentence (char '.') <* eof) "One. Two. a 3.5 b. cde."
-- [(["One"," Two"," a 3.5 b"," cde",""],"")]

With Text.Regex.Applicative:

import Control.Applicative.Combinators
import Data.Char
import Text.Regex.Applicative

parseDigitsWithDecimalPoint = (\a b c -> [a,b,c]) <$> psym isDigit <*> sym '.' <*> psym isDigit
parseNonDot = (:[]) <$> psym ('.' /=)
parseSentence = fmap concat . many $ parseDigitsWithDecimalPoint <|> parseNonDot
match (sepBy parseSentence (sym '.')) "One. Two. a 3.5 b. cde."
-- Just ["One"," Two"," a 3.5 b"," cde",""]

With Text.Megaparsec:

{-# LANGUAGE FlexibleContexts, TypeFamilies #-}
import Data.Char
import Text.Megaparsec

parseDigitsWithDecimalPoint = (\a b c -> [a,b,c]) <$> satisfy isDigit <*> single '.' <*> satisfy isDigit
parseNonDot = (:[]) <$> anySingleBut '.'
parseSentence = fmap concat . many $ try parseDigitsWithDecimalPoint <|> parseNonDot
parseMaybe (sepBy parseSentence (single '.')) "One. Two. a 3.5 b. cde."
-- Just ["One"," Two"," a 3.5 b"," cde",""]

Completely by hand:

import Data.Char
import Data.List.NonEmpty

foo = go id where
  go f "" = f "" :| []
  go f ('.':xs) = f "" <| go id xs
  go f (x1:'.':x2:xs) | isDigit x1 && isDigit x2 = go (f . (x1:) . ('.':) . (x2:)) xs
  go f (x:xs) = go (f . (x:)) xs
foo "One. Two. a 3.5 b. cde."
-- "One" :| [" Two"," a 3.5 b"," cde",""]

There has to be some way of making, or getting, a function that just takes a string, takes a regex, and splits the string. Every language except C and Haskell have this. — Andrew Koster, Jul 02 '20 at 16:47

Diogo Rocha · Answer 2 · 2020-01-19T17:38:35.827

-1

Use Text.Regex.Posix with this regular expression \.(?![0-9]).

Something like:

Prelude> :m +Text.Regex.Posix
Prelude Text.Regex.Posix> getAllTextMatches $ "This is sentence one. This is sentence two." =~ "\.(?![0-9])" :: [String]
["This is sentence one", "This is sentence two"]

edited Jan 19 '20 at 17:38

answered Jan 19 '20 at 17:31

Diogo Rocha

9,759
4
48
52

1

Is there any assumption for `:m + Text.Regex.Posix` to work? In my case I get an error, so I guess I should first install something. – Enlico Jan 19 '20 at 17:33
Getting this error: `lexical error in string/character literal at character '.'` – TransmissionsDev Jan 19 '20 at 17:40
That error is because the backslash needs to be doubled there. However, there's a bigger problem with this answer: `Text.Regex.Posix` doesn't support lookahead, so even once that's fixed, you'll just get `*** Exception: user error (Text.Regex.Posix.String died: (ReturnCode 13,"Invalid preceding regular expression"))`. – Joseph Sible-Reinstate Monica Jan 20 '20 at 01:29
This isn't "splitting" a string, it's "matching" and extracting matches. Similar but different. – Andrew Koster Jul 02 '20 at 16:46

How to split a string by '.' in Haskell but if the period is in-between two numbers eg: (2.5) then don't split?

2 Answers2