What is the best way to split a string by a delimiter functionally?

Question

I tried to write a program in Haskell that takes a string of integers delimited by commas, converts it to a list of integers, and increments each number by 1.

For example "1,2,-5,-23,15" -> [2,3,-4,-22,16]

Below is the resulting program:

import Data.List

main :: IO ()
main = do
  n <- return 1
  putStrLn . show . map (+1) . map toInt . splitByDelimiter delimiter
    $ getList n

getList :: Int -> String
getList n = foldr (++) [] . intersperse [delimiter] $ replicate n inputStr

delimiter = ','

inputStr = "1,2,-5,-23,15"

splitByDelimiter :: Char -> String -> [String]
splitByDelimiter _ "" = []
splitByDelimiter delimiter list =
  map (takeWhile (/= delimiter) . tail)
    (filter (isPrefixOf [delimiter])
       (tails
           (delimiter : list)))

toInt :: String -> Int
toInt = read

The hardest part for me was writing the function splitByDelimiter, which takes a String and returns a list of Strings:

"1,2,-5,-23,15" -> ["1","2","-5","-23","15"]

Though it works, I am not happy with the way it is written. There are a lot of parentheses, so it looks Lisp-like. Also, the algorithm is somewhat artificial:

Prepend the delimiter to the beginning of the string: ",1,2,-5,-23,15"
Generate the list of all tails: [",1,2,-5,-23,15", "1,2,-5,-23,15", ",2,-5,-23,15", ...]
Filter, keeping only the strings that begin with the delimiter: [",1,2,-5,-23,15", ",2,-5,-23,15", ...]
Drop the leading delimiter and take characters until the next delimiter is met: ["1", "2", ...]

So the questions are:

How can I improve the function splitByDelimiter?

Can I remove the prepending and dropping of the delimiter and split the string directly?

How can I rewrite the function so that it has fewer parentheses?

Maybe I am missing something and there is already a standard function with this functionality?

`foldr (++) []` is otherwise known as `concat`, `putStrLn . show` is otherwise known as `print`. Also, `n <- return 1` is a little odd; you can just do `let n = 1` and avoid wrapping and unwrapping the monad. — pat, Feb 27 '13 at 02:36
possible duplicate of [How to split a string in Haskell?](http://stackoverflow.com/questions/4978578/how-to-split-a-string-in-haskell) — Norman Ramsey, Jul 02 '14 at 15:39

score 37 · Accepted Answer · answered Dec 21 '10 at 21:19

Doesn't Data.List.Split.splitOn do this?
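
For reference, here is a minimal sketch of how `splitOn` might be applied to the input from the question. It assumes the `split` package is installed (that is where `Data.List.Split` comes from), and `incrementAll` is just a name made up for this example.

```haskell
-- Requires the `split` package, which provides Data.List.Split.
import Data.List.Split (splitOn)

-- incrementAll is a hypothetical helper name, not part of any library.
-- Note: `read` throws an exception on a malformed field.
incrementAll :: String -> [Int]
incrementAll = map ((+ 1) . read) . splitOn ","
```

With this, `incrementAll "1,2,-5,-23,15"` should give `[2,3,-4,-22,16]`. Note that `splitOn` takes a list as its delimiter, so it is `","` here rather than `','`.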

Mikel

Because this package is not part of the basic install (Haskell Platform), I think it tends to get overlooked. – Daniel Pratt Dec 21 '10 at 21:27
`splitOneOf` is a generally more useful function, especially if you need to take arbitrary whitespace into account. – Andrew Koster Jul 02 '20 at 17:01

score 25 · Answer 2 · edited May 07 '17 at 05:15

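splitBy :: Eq a => a -> [a] -> [[a]]
splitBy delimiter = foldr f [[]]
  where
    f c l@(x:xs)  -- the accumulator is never empty, so this always matches
      | c == delimiter = [] : l         -- start a new, empty chunk
      | otherwise      = (c : x) : xs   -- prepend c to the current chunk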

Edit: not by the original author, but below is a more (overly?) verbose, and less flexible version (specific to Char/String) to help clarify how this works. Use the above version because it works on any list of a type with an Eq instance.

splitBy :: Char -> String -> [String]
splitBy _ "" = []
splitBy delimiterChar inputString = foldr f [""] inputString
  where f :: Char -> [String] -> [String]
        f currentChar allStrings@(partialString:handledStrings)
          | currentChar == delimiterChar = "":allStrings -- start a new partial string at the head of the list of all strings
          | otherwise = (currentChar:partialString):handledStrings -- add the current char to the partial string

-- input:       "a,b,c"
-- fold steps:
-- first step:  'c' -> [""] -> ["c"]
-- second step: ',' -> ["c"] -> ["","c"]
-- third step:  'b' -> ["","c"] -> ["b","c"]
-- fourth step: ',' -> ["b","c"] -> ["","b","c"]
-- fifth step:  'a' -> ["","b","c"] -> ["a","b","c"]
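
To complement the trace above, here is a small sketch of the generic version in use. The type signature is an addition (the original has none), and `splitByOrEmpty` is a hypothetical wrapper for anyone who prefers `[]` over `[[]]` for empty input.

```haskell
-- Generic version from the top of this answer, with a type signature added.
splitBy :: Eq a => a -> [a] -> [[a]]
splitBy delimiter = foldr f [[]]
  where
    -- The accumulator always holds at least one chunk, so (x:xs) matches.
    f c l@(x:xs)
      | c == delimiter = [] : l
      | otherwise      = (c : x) : xs

-- splitByOrEmpty is a hypothetical name: like splitBy, but returns []
-- for empty input instead of [[]].
splitByOrEmpty :: Eq a => a -> [a] -> [[a]]
splitByOrEmpty _ [] = []
splitByOrEmpty d xs = splitBy d xs
```

Because only `Eq` is required, it also works on non-string lists: `splitBy 0 [1,2,0,3,0,4]` gives `[[1,2],[3],[4]]`. Empty fields are preserved, so `splitBy ',' "a,,b,"` gives `["a","","b",""]`, and `splitBy ',' ""` gives `[""]`.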

edited May 07 '17 at 05:15 by Andy White

answered Sep 27 '11 at 12:42 by Satvik

This is brilliant; it took me way too long to understand how it works, but I love it. – ljedrz Nov 07 '13 at 19:25
Doesn't work for empty strings, though, i.e. it evaluates to `[""]` rather than `[]`. – fotNelton Jan 08 '14 at 07:15
I agree with @ljedrz - it took me way too long to understand, but it is brilliant! I hope you don't mind but I added a less flexible, but extremely verbose addendum to your answer to help other people understand what's happening. – Andy White May 07 '17 at 05:11
Minor nitpick, but this is the functionality I would expect for a `splitOn` function, not `splitBy`. For `splitBy` I would expect `splitBy fn = foldr f [[]] where f c l@(x:xs) = bool ((c:x):xs) ([]:l) $ fn c`, with the current `splitOn c` functionality recovered by `splitOn c = splitBy (==c)` – Steven Armstrong Jun 14 '17 at 21:07
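
Spelled out as a loadable sketch, the naming suggested in the last comment might look like this. Note that the `splitOn` defined here takes a single element as its delimiter, so it is not the same function as `Data.List.Split.splitOn`, which takes a list.

```haskell
import Data.Bool (bool)

-- splitBy takes a predicate that marks delimiter elements.
splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy isDelim = foldr f [[]]
  where
    -- bool returns its second argument when the condition is True,
    -- so a delimiter starts a new chunk and anything else is prepended.
    f c l@(x:xs) = bool ((c : x) : xs) ([] : l) (isDelim c)

-- splitOn recovers the single-delimiter behaviour of the answer above.
splitOn :: Eq a => a -> [a] -> [[a]]
splitOn c = splitBy (== c)
```

For example, ``splitBy (`elem` ",;") "a,b;c"`` gives `["a","b","c"]`, and `splitOn ',' "1,2"` gives `["1","2"]`.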

HaskellElephant · Answer 3 · 2011-09-27T09:35:17.023

12

This is a bit of a hack, but heck, it works.

yourFunc str = map (+1) $ read ("[" ++ str ++ "]")

Here is a non-hack version using unfoldr:

import Data.List
import Control.Arrow(second)

-- break' is like break but removes the
-- delimiter from the rest string
break' d = second (drop 1) . break d

split :: String -> Maybe (String,String)
split [] = Nothing
split xs = Just . break' (==',') $ xs

yourFunc :: String -> [Int]
yourFunc = map ((+1) . read) . unfoldr split

answered Dec 21 '10 at 21:44

HaskellElephant

Thank you. This is a good point of view. I like the way how unfoldr is used here. – sign Dec 21 '10 at 22:19
Your split is faster than splitOn by 43ns on my comp in ghci :) – CoR Jul 19 '12 at 07:10
This implementation of split function works differently than you would expect - it doesn't properly split strings with commas at the end - one "" is missing. If you want to make sure that a split function is 100% functional, it should be reversible by interspersing with the same delimiter for all permutations of a delimited string, eg. "a,b,c". – ljedrz Nov 07 '13 at 19:37
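
To make the last comment concrete, here is a sketch comparing the `unfoldr` approach with a variant that keeps the trailing empty field. Both function names are made up for this illustration, and the delimiter is passed as a parameter rather than hard-coded.

```haskell
import Data.List (intercalate, unfoldr)

-- The unfoldr-based splitter from this answer, generalised over the
-- delimiter. It stops as soon as the remaining input is empty, so a
-- trailing delimiter does not produce a final "".
splitDropping :: Char -> String -> [String]
splitDropping d = unfoldr step
  where
    step [] = Nothing
    step xs = let (field, rest) = break (== d) xs
              in Just (field, drop 1 rest)

-- splitKeeping is a hypothetical variant that keeps every field, so that
-- intercalate [d] (splitKeeping d s) == s holds for every s.
splitKeeping :: Char -> String -> [String]
splitKeeping d s = case break (== d) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitKeeping d rest
```

For the input `"a,b,"`, `splitDropping ','` gives `["a","b"]` while `splitKeeping ','` gives `["a","b",""]`, and only the latter round-trips through `intercalate ","`.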

Michael Steele · Answer 4 · 2013-02-26T20:06:15.810

Just for fun, here is how you could create a simple parser with Parsec:

module Main where

import Control.Applicative hiding (many)
import Text.Parsec
import Text.Parsec.String

line :: Parser [Int]
line = number `sepBy` (char ',' *> spaces)

number = read <$> many digit

One advantage is that it's easy to create a parser that is flexible in what it accepts:

*Main Text.Parsec Text.Parsec.Token> :load "/home/mikste/programming/Temp.hs"
[1 of 1] Compiling Main             ( /home/mikste/programming/Temp.hs, interpreted )
Ok, modules loaded: Main.
*Main Text.Parsec Text.Parsec.Token> parse line "" "1, 2, 3"
Right [1,2,3]
*Main Text.Parsec Text.Parsec.Token> parse line "" "10,2703,   5, 3"
Right [10,2703,5,3]
*Main Text.Parsec Text.Parsec.Token>

Minor, but could use `many1` as in `number = read <$> many1 digit` so that invalid input like "1,,2" results in a Left value instead of an exception from Prelude.read. — rob, Dec 19 '16 at 10:47
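
Putting that suggestion together with the negative numbers from the question, a sketch of the parser could look like the following. The handling of the minus sign and the trailing `eof` are additions for illustration, not part of the answer above, and it assumes GHC 7.10 or later so that the applicative operators come from the Prelude.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- An optional leading '-' followed by at least one digit.
-- many1 makes an empty field a parse error instead of a `read` exception.
number :: Parser Int
number = option id (negate <$ char '-') <*> (read <$> many1 digit)

-- Comma-separated numbers, allowing whitespace after each comma.
-- eof makes trailing garbage a parse error instead of being ignored.
line :: Parser [Int]
line = (number `sepBy` (char ',' *> spaces)) <* eof
```

With these definitions, `parse line "" "1,2,-5,-23,15"` should give `Right [1,2,-5,-23,15]`, and `parse line "" "1,,2"` should give a `Left` value.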

score 4 · Answer 5 · answered Dec 21 '10 at 22:47

This is an application of HaskellElephant's answer to the original question, with minor changes:

import Data.List (unfoldr)

splitByDelimiter :: Char -> String -> [String]
splitByDelimiter = unfoldr . splitSingle

splitSingle :: Char -> String -> Maybe (String,String)
splitSingle _ [] = Nothing
splitSingle delimiter xs =
  let (ys, zs) = break (== delimiter) xs in
  Just (ys, drop 1 zs)

Here the function splitSingle splits the list into two substrings at the first delimiter.

For example: "1,2,-5,-23,15" -> Just ("1", "2,-5,-23,15")

score 2 · Answer 6 · answered Mar 03 '13 at 02:31

splitBy :: Eq a => a -> [a] -> [[a]]
splitBy del str = helper str []
  where
    -- acc holds the current chunk in reverse order
    helper [] acc = [reverse acc]
    helper (x:xs) acc
      | x == del  = reverse acc : helper xs []
      | otherwise = helper xs (x : acc)

score 1 · Answer 7 · answered Mar 30 '15 at 12:22

This code works fine. Call it as split "Your string" [] and replace ',' with whichever delimiter you need:

split [] t = [t]
split (a:l) t = if a==',' then (t:split l []) else split l (t++[a])

techcomp

repeated singletons appending on the right is an anti-pattern (leads to quadratic behavior). – Will Ness Apr 14 '20 at 06:49

score 1 · Answer 8 · answered Jun 28 '15 at 19:43

import qualified Text.Regex as RegExp

myRegexSplit :: String -> String -> [String]
myRegexSplit regExp theString = 
  let result = RegExp.splitRegex (RegExp.mkRegex regExp) theString
  in filter (not . null) result

-- using regex has the advantage of making it easy to use a regular
-- expression instead of only normal strings as delimiters.

-- the splitRegex function tends to return a list with an empty string
-- as the last element, so the filter removes it

-- how to use in ghci to split a sentence
let timeParts = myRegexSplit " " "I love ponies a lot"

Lucas Moeskops · Answer 9 · 2020-02-23T23:16:22.687

1

Another version without imports:

splitBy :: Char -> String -> [String]
splitBy _ [] = []
splitBy c s  =
  let
    i = (length . takeWhile (/= c)) s
    (as, bs) = splitAt i s
  in as : splitBy c (if bs == [] then [] else tail bs)

answered Feb 23 '20 at 11:34

use of `length` is an anti-pattern (destroys laziness), use `span`/`break` instead; `(if bs == [] then [] else tail bs)` == `drop 1 bs`. – Will Ness Apr 14 '20 at 06:48
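
Applying both suggestions from the comment above, the function might be restructured as in this sketch. It is meant to behave the same as the answer's version, still without imports.

```haskell
-- break replaces the length/takeWhile/splitAt combination, and
-- drop 1 replaces the explicit empty check before tail.
splitBy :: Char -> String -> [String]
splitBy _ [] = []
splitBy c s =
  let (field, rest) = break (== c) s
  in field : splitBy c (drop 1 rest)
```

For example, `splitBy ',' "1,2,-5,-23,15"` gives `["1","2","-5","-23","15"]`. As with the original, a trailing delimiter does not yield a final empty string, so `splitBy ',' "a,b,"` gives `["a","b"]`.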
