194

Is there a standard way to split a string in Haskell?

lines and words work great for splitting on a space or newline, but surely there is a standard way to split on a comma?

I couldn't find it on Hoogle.

To be specific, I'm looking for something where split "," "my,comma,separated,list" returns ["my","comma","separated","list"].

Russia Must Remove Putin
Eric Wilson
  • 27
    I would really like to see such a function in a future release of `Data.List` or even `Prelude`. It's so common and nasty if not available for code-golf. – fuz Feb 12 '11 at 15:08

15 Answers

193

Remember that you can look up the definition of Prelude functions!

http://www.haskell.org/onlinereport/standard-prelude.html

Looking there, the definition of words is,

words   :: String -> [String]
words s =  case dropWhile Char.isSpace s of
                      "" -> []
                      s' -> w : words s''
                            where (w, s'') = break Char.isSpace s'

So, change it into a function that takes a predicate:

wordsWhen     :: (Char -> Bool) -> String -> [String]
wordsWhen p s =  case dropWhile p s of
                      "" -> []
                      s' -> w : wordsWhen p s''
                            where (w, s'') = break p s'

Then call it with whatever predicate you want!

main = print $ wordsWhen (==',') "break,this,string,at,commas"
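
One behaviour worth knowing before relying on this: because wordsWhen is modelled on words, it skips whole runs of separators, so empty fields are dropped. A minimal sketch of that, with wordsWhen repeated so the block loads on its own; the `examples` name and the inputs are only illustrative and are not part of the original answer:

```haskell
-- wordsWhen as defined above, repeated so this block is self-contained.
wordsWhen :: (Char -> Bool) -> String -> [String]
wordsWhen p s = case dropWhile p s of
  "" -> []
  s' -> w : wordsWhen p s''
    where (w, s'') = break p s'

-- Illustrative inputs (assumed for this sketch).
examples :: [[String]]
examples =
  [ wordsWhen (== ',') "a,b,c"  -- ["a","b","c"]
  , wordsWhen (== ',') "a,,b"   -- ["a","b"]: the empty field disappears
  , wordsWhen (== ',') ",a,b,"  -- ["a","b"]: leading/trailing separators ignored
  ]
```

If empty fields matter (CSV-like data, for instance), one of the break/span based versions further down the thread is probably a better fit.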
Steve
159

There is a package for this called split.

cabal install split

Use it like this:

ghci> import Data.List.Split
ghci> splitOn "," "my,comma,separated,list"
["my","comma","separated","list"]

It comes with a lot of other functions for splitting on matching delimiters or having several delimiters.

Alex
Jonno_FTW
  • 9
    Cool. I wasn't aware of this package. This is *the* ultimate split package as it gives much control over the operation (trim space in results, leave separators in result, remove consecutive separators, etc...). There are so many ways of splitting lists, it is not possible to have a single `split` function that will answer every need; you really need that kind of package. – gawi Feb 12 '11 at 20:37
  • 1
    otherwise if external packages are acceptable, MissingH also provides a split function: http://hackage.haskell.org/packages/archive/MissingH/1.2.0.0/doc/html/Data-List-Utils.html#v:split That package also provides plenty of other "nice-to-have" functions and I find that quite some packages depend on it. – Emmanuel Touzery Dec 13 '12 at 10:44
  • 47
    The split package is now a part of the Haskell Platform as of the most recent release. – The Internet Jul 06 '13 at 17:12
  • @dave How can I import and use the split package, then? – Anderson Green Sep 10 '13 at 03:05
  • 14
    import Data.List.Split (splitOn) and go to town. splitOn :: Eq a => [a] -> [a] -> [[a]] – The Internet Sep 10 '13 at 04:50
  • The link is dead, I tried updating it to what it reported the link should be but still returned the same page. – ChrisF May 26 '16 at 13:03
  • 2
    @RussAbbott the split package is included in the Haskell Platform when you download it (https://www.haskell.org/platform/contents.html), but it is not automatically loaded when building your project. Add `split` to the `build-depends` list in your cabal file, e.g. if your project is called hello, then in the `hello.cabal` file below the `executable hello` line put a line like ` build-depends: base, split` (note two space indent). Then build using the `cabal build` command. Cf. https://www.haskell.org/cabal/users-guide/developing-packages.html#example-a-package-containing-executable-programs – expz Dec 14 '19 at 18:54
46

If you use Data.Text, there is splitOn:

http://hackage.haskell.org/packages/archive/text/0.11.2.0/doc/html/Data-Text.html#v:splitOn

This is included in the Haskell Platform.

So for instance:

import qualified Data.Text as T
main = print $ T.splitOn (T.pack " ") (T.pack "this is a test")

or:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
main = print $ T.splitOn " " "this is a test"
Emmanuel Touzery
22

Use Data.List.Split from the split package:

[me@localhost]$ ghci
Prelude> import Data.List.Split
Prelude Data.List.Split> let l = splitOn "," "1,2,3,4"
Prelude Data.List.Split> :t l
l :: [[Char]]
Prelude Data.List.Split> l
["1","2","3","4"]
Prelude Data.List.Split> let { convert :: [String] -> [Integer]; convert = map read }
Prelude Data.List.Split> let l2 = convert l
Prelude Data.List.Split> :t l2
l2 :: [Integer]
Prelude Data.List.Split> l2
[1,2,3,4]
antimatter
20

Without importing anything, you can substitute a space for the separator character, since the separator that words splits on is whitespace. Something like:

words [if c == ',' then ' ' else c|c <- "my,comma,separated,list"]

or

words $ let f ',' = ' '; f c = c in map f "my,comma,separated,list"

You can make this into a function with parameters. You can eliminate the character-to-match parameter by matching many characters, as in:

words [if elem c ";,.:-+@!$#?" then ' ' else c | c <- "my,comma;separated!list"]
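
Packaged as a function, the idea might look like the sketch below. The name `splitViaWords` is hypothetical (it is not a library function), and the approach assumes the input contains no whitespace of its own, because words cannot tell a substituted space from an original one:

```haskell
-- Hypothetical helper: replace every character listed in `seps` with a
-- space, then let `words` do the splitting.
-- Caveat: whitespace already present in the input also acts as a separator.
splitViaWords :: [Char] -> String -> [String]
splitViaWords seps = words . map (\c -> if c `elem` seps then ' ' else c)
```

For example, splitViaWords ",;!" "my,comma;separated!list" gives ["my","comma","separated","list"], while splitViaWords "," "my,comma separated,list" gives four parts rather than three.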
fp_mora
  • That does not distinguish between newly added spaces and spaces that were there originally, so for `"my,comma separated,list"` it will see 4 parts instead of 3 as intended. – Yuri Kovalenko Jul 23 '21 at 08:15
  • @Yuri Kovalenko `words` does; try `words [if c == ',' then ' ' else c|c <- "my, comma, separated, list "]` – fp_mora Jul 23 '21 at 15:09
19

In the module Text.Regex (part of the Haskell Platform), there is a function:

splitRegex :: Regex -> String -> [String]

which splits a string based on a regular expression. The API can be found at Hackage.

evilcandybag
14

Try this one:

import Data.List (unfoldr)

separateBy :: Eq a => a -> [a] -> [[a]]
separateBy chr = unfoldr sep where
  sep [] = Nothing
  sep l  = Just . fmap (drop 1) . break (== chr) $ l

Only works for a single char, but should be easily extendable.
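
One possible extension to a multi-element delimiter, keeping the unfoldr structure. This is only a sketch: `separateOn` is a hypothetical name, and it assumes the delimiter is non-empty (an empty delimiter would never make progress):

```haskell
import Data.List (isPrefixOf, unfoldr)

-- Hypothetical generalisation of separateBy: the delimiter is itself a list.
-- Assumes a non-empty delimiter.
separateOn :: Eq a => [a] -> [a] -> [[a]]
separateOn delim = unfoldr sep
  where
    sep [] = Nothing
    sep l  = Just (go l)

    -- Collect elements until the delimiter is a prefix of the remaining
    -- input; return the collected chunk and whatever follows the delimiter.
    go [] = ([], [])
    go rest@(x : xs)
      | delim `isPrefixOf` rest = ([], drop (length delim) rest)
      | otherwise               = let (chunk, rest') = go xs
                                  in (x : chunk, rest')
```

With this, separateOn ", " "a, b, c" gives ["a","b","c"]. Like separateBy, it drops a single trailing delimiter and keeps empty fields in the middle.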

sshine
fuz
13
split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s

E.g.

split ';' "a;bb;ccc;;d"
> ["a","bb","ccc","","d"]

A single trailing delimiter will be dropped:

split ';' "a;bb;ccc;;d;"
> ["a","bb","ccc","","d"]
8

I find this simpler to understand:

split :: Char -> String -> [String]
split c xs = case break (==c) xs of 
  (ls, "") -> [ls]
  (ls, x:rs) -> ls : split c rs
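
Compared with the span-based split earlier in this thread, this version behaves differently at the edges: it never returns an empty list, and it keeps a trailing empty field. A small sketch to try that out; the function is the same as above, renamed `splitKeep` (a name chosen only for illustration) so it can be loaded next to the other versions:

```haskell
-- Same definition as above under an illustrative name.
splitKeep :: Char -> String -> [String]
splitKeep c xs = case break (== c) xs of
  (ls, "")     -> [ls]                   -- no delimiter left: last field
  (ls, _ : rs) -> ls : splitKeep c rs    -- skip the delimiter and continue
```

Here splitKeep ';' "" gives [""] and splitKeep ';' "a;bb;" gives ["a","bb",""], so joining the result with the delimiter restores the original string.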
mxs
6

I started learning Haskell yesterday, so correct me if I'm wrong but:

split :: Eq a => a -> [a] -> [[a]]
split x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if y==x then 
            func x ys ([]:(z:zs)) 
        else 
            func x ys ((y:z):zs)

gives:

*Main> split ' ' "this is a test"
["this","is","a","test"]

or maybe you wanted

*Main> splitWithStr  " and " "this and is and a and test"
["this","is","a","test"]

which would be:

splitWithStr :: Eq a => [a] -> [a] -> [[a]]
splitWithStr x y = func x y [[]]
    where
        func x [] z = reverse $ map (reverse) z
        func x (y:ys) (z:zs) = if (take (length x) (y:ys)) == x then
            func x (drop (length x) (y:ys)) ([]:(z:zs))
        else
            func x ys ((y:z):zs)
Robin Begbie
  • 1
    I was looking for a built-in `split`, being spoiled by languages with well-developed libraries. But thanks anyway. – Eric Wilson Jun 11 '12 at 09:46
  • 4
    You wrote this in June, so I assume you've moved on in your journey :) As an exercise, try rewriting this function without reverse or length, as use of these functions incurs an algorithmic complexity penalty and also prevents application to an infinite list. Have fun! – Tony Morris Oct 22 '12 at 02:37
5

I don’t know how to add a comment onto Steve’s answer, but I would like to recommend the
  GHC libraries documentation,
and in there specifically the
  Sublist functions in Data.List

These are much better as a reference than just reading the plain Haskell report.

Generically, a fold with a rule for when to start a new sublist should solve it too.
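
A minimal sketch of that fold idea, assuming a single-element delimiter. The name `splitFold` is hypothetical; the rule is simply "start a new sublist whenever the current element equals the delimiter":

```haskell
-- Hypothetical name; splits on every element equal to the delimiter.
-- The accumulator always holds at least one (possibly empty) sublist.
splitFold :: Eq a => a -> [a] -> [[a]]
splitFold d = foldr step [[]]
  where
    step x acc@(cur : rest)
      | x == d    = [] : acc           -- delimiter: start a new sublist
      | otherwise = (x : cur) : rest   -- otherwise: extend the current one
    step _ []     = [[]]               -- unreachable: acc is never empty
```

For example, splitFold ',' "my,comma,separated,list" gives ["my","comma","separated","list"], and empty fields are kept (splitFold ',' "a,,b" gives ["a","","b"]). Because step pattern-matches on the accumulator, this version has to consume the whole input before producing anything, so it is not suitable for infinite lists.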

Evi1M4chine
4

Example in the ghci:

>  import qualified Text.Regex as R
>  R.splitRegex (R.mkRegex "x") "2x3x777"
["2","3","777"]
Andrew
  • 1
    Please, don’t use regular expressions to split strings. Thank you. – kirelagin Mar 28 '18 at 13:06
  • @kirelagin, why this comment? I'm learning Haskell, and I'd like to know the rationale behind your comment. – Enlico Jan 19 '20 at 17:26
  • @Andrey, is there a reason why I cannot even run the first line in my `ghci`? – Enlico Jan 19 '20 at 17:26
  • 1
    @EnricoMariaDeAngelis Regular expressions are a powerful tool for string matching. It makes sense to use them when you are matching something non-trivial. If you just want to split a string on something as trivial as another fixed string, there is absolutely no need to use regular expressions – it will only make the code more complex and, likely, slower. – kirelagin Jan 21 '20 at 01:43
  • 3
    "Please, don’t use regular expressions to split strings." WTF, why not??? Splitting a string with a regular expression is a perfectly reasonable thing to do. There are lots of trivial cases where a string needs to be split but the delimiter isn't always exactly the same. – Andrew Koster Jul 02 '20 at 16:01
3

In addition to the efficient, pre-built functions given in other answers, I'll add my own, which are simply part of a repertoire of Haskell functions I wrote to learn the language in my own time:

-- Correct but inefficient implementation
wordsBy :: String -> Char -> [String]
wordsBy s c = reverse (go s []) where
    go s' ws = case dropWhile (== c) s' of
        "" -> ws
        rest -> go (dropWhile (/= c) rest) (takeWhile (/= c) rest : ws)

-- Breaks up by predicate function to allow for more complex conditions (\c -> c == ',' || c == ';')
wordsByF :: String -> (Char -> Bool) -> [String]
wordsByF s f = reverse (go s []) where
    go s' ws = case dropWhile f s' of
        "" -> ws
        rest -> go (dropWhile (not . f) rest) (takeWhile (not . f) rest : ws)

The solutions are at least tail-recursive so they won't incur a stack overflow.

0

I'm very late, but I'd like to add this for anyone interested in a simple solution that doesn't rely on any bloated packages:

split :: String -> String -> [String]
split _ "" = []
split delim str =
  split' "" str []
  where
    dl = length delim

    split' :: String -> String -> [String] -> [String]
    split' h t f
      | dl > length t = f ++ [h ++ t]
      | delim == take dl t = split' "" (drop dl t) (f ++ [h])
      | otherwise = split' (h ++ take 1 t) (drop 1 t) f
Microtribute
  • 1
    Oh come on... Ultimately what matters is not that something is liked by thousands of people. I am NOT forcing you to use it. It's ONLY there for those interested. Sounds like you're none of them. – Microtribute Jul 09 '21 at 20:26
  • You say "liked by" -- I say "battle tested". It's fine if you enjoy sharing it. My question was for the standard way to do it, and that has been answered. – Eric Wilson Jul 09 '21 at 21:32
  • 2
    Haskell does not come with the split function out of the box. Remember you asked for a function that splits a string by a string (String -> String -> [String]), not by a char (Char->String->[String]). You have to install the `split` package, which is NOT a standard way EITHER. Installing the `split` package will also include a bunch of redundant functions. You only asked for a `split` function, and I gave exactly that to you and NO MORE. – Microtribute Sep 07 '21 at 19:40
0

So many answers, but I don't like any of them. I don't actually know Haskell, but I wrote a much shorter and (I think) cleaner version in 5 minutes:

splitString :: Char -> [Char] -> [[Char]]
splitString _ [] = []
splitString sep str = 
    let (left, right) = break (==sep) str 
    in left : splitString sep (drop 1 right)
Pavel.Zh