A quick question that may be more of a rant (but I hope to be enlightened instead).

In F# a string is compatible with Seq such that "abcd" |> Seq.map f will work on a string.

This is a brilliant facility for working with strings, for example to take the first 5 chars from a string:

"abcdef01234567" |> Seq.take 5

Or removing duplicate characters:

"abcdeeeeeee" |> Seq.distinct

The problem is that once you have the char seq result, it becomes extremely awkward to convert it back to a string. String.concat "" requires that the members are strings, so I end up doing this a lot:

"abcdef01234567" 
|> Seq.take 5
|> Seq.map string
|> String.concat ""

So much so that I have a function I use in 90% of my projects:

let toString : char seq -> string = Seq.map string >> String.concat ""

I feel this is over the top, but everywhere I look to find an alternative I am met with heinous things like StringBuilder or inlining a lambda and using new:

"abcdef01234567" 
|> Seq.take 5
|> Seq.toArray 
|> fun cs -> new string (cs) (* note you cannot just |> string *)

My (perhaps crazy) expectation is that when Seq is used on a string, the resulting expression should have the type signature string -> string. Meaning, what goes in is what comes out: "abcd" |> Seq.take 3 = "abc".

Is there a reason my expectations of high-level string manipulation are mistaken in this case?

Does anyone have a recommendation for approaching this in a nice manner? I feel like I must be missing something.

  • One minor improvement - you can do `System.String("aa" |> Seq.take 1 |> Seq.toArray)` which is slightly better - using `System.String` gets an implicit `new` for free – John Palmer Feb 03 '13 at 00:34
  • Nice, but I really don't like breaking the workflow and placing the last expression in a function at the start. |> fun cs -> new ... feels like the only possible compromise (because I can't do let take n = Seq.take n >> Seq.toArray >> string >:( ) –  Feb 06 '13 at 08:05
  • Why not just `"abcdef01234567".Substring(0, 5)`? I would imagine that the F# String module lacks the `take` function precisely because this instance method exists in the framework. – phoog Feb 09 '13 at 06:41
  • Unfortunately I don't get the composition I want with Substring. It often requires what I perceive to be unnecessary type signatures and use of tuples and dots; it just feels unclean. –  Feb 12 '13 at 08:37

4 Answers

I was just researching this myself. I found that System.String.Concat works pretty well, e.g.

"abcdef01234567" |> Seq.take 5 |> String.Concat;;

assuming that you've opened System.
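
As a rough sketch of how this compares with the F# module function from the question (the binding names here are only illustrative), the .NET method takes the char sequence directly, while String.concat needs each char converted to a string first:

```fsharp
open System

// .NET static method String.Concat: accepts a seq<char> directly.
let firstFive = "abcdef01234567" |> Seq.take 5 |> String.Concat
let unique = "abcdeeeeeee" |> Seq.distinct |> String.Concat

// F# module function String.concat: expects seq<string>,
// so every char has to be mapped to a string beforehand.
let firstFiveAlt =
    "abcdef01234567"
    |> Seq.take 5
    |> Seq.map string
    |> String.concat ""
```

Both routes should produce the same string; the first simply skips the intermediate Seq.map string step.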

Johann Hibschman
  • Excellent, I completely looked past this one! Sneaky! –  Jul 24 '14 at 08:31
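  • Just to make this really clear: there are two concatenation functions in play here, and the names are case-sensitive. `String.Concat` is the static .NET method on System.String; `String.concat` is the function in the F# String module. – Craig.C Feb 26 '19 at 09:19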

The functions in the Seq module only deal with sequences -- i.e., when you call them with a string, they only "see" a Seq<char> and operate on it accordingly. Even if they made a special check to see if the argument was a string and took some special action (e.g., an optimized version of the function just for strings), they'd still have to return it as a Seq<char> to appease the F# type system -- in which case, you'd need to check the return value everywhere to see if it was actually a string.
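
To make the typing point concrete, here is a small sketch (the binding names are just for illustration). The result of Seq.take is statically a seq<char>, so getting a string back takes an explicit conversion step:

```fsharp
// Seq.take only knows it was given a sequence of chars,
// so the static type of the result is seq<char>, not string.
let chars : seq<char> = "abcd" |> Seq.take 3

// An explicit conversion is needed to get back to a string.
let text : string = chars |> System.String.Concat
```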

The good news is that F# has built-in shortcuts for some of the code you're writing. For example:

"abcdef01234567" |> Seq.take 5

can be shortened to:

"abcdef01234567".[..4]  // Returns the first _5_ characters (indices 0-4).

For some of the others you'll still have to use Seq, though, or write your own optimized implementation that operates on strings.
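
String slices are inclusive at both ends. As a minimal sketch (firstFive is a hypothetical helper name), a slice can be wrapped in a small function so it sits in a pipeline; the annotation is there so the compiler knows s is a string:

```fsharp
// Hypothetical helper: the (s: string) annotation lets the slice type-check.
let firstFive (s: string) = s.[..4]

let viaPipeline = "abcdef01234567" |> firstFive

// Both bounds are inclusive: indices 2, 3 and 4.
let middle = "abcdef01234567".[2..4]

// Open-ended slice: everything from index 5 onwards.
let rest = "abcdef01234567".[5..]
```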

Here's a function to get the distinct characters in a string:

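open System.Collections.Generic

// Collects the distinct characters of a string into a HashSet<char>.
// Note that the result is a set of chars, not a string.
let distinctChars str =
    let chars = HashSet ()
    // String.length also fixes the type of str as string.
    let len = String.length str
    for i = 0 to len - 1 do
        // Add returns false for a duplicate; the result is ignored here.
        chars.Add str.[i] |> ignore
    chars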
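
If a string is wanted back rather than a set, one possible variant (a sketch only; distinctString is a hypothetical name, not a library function) uses the boolean returned by HashSet.Add together with a StringBuilder, so that characters are kept in first-occurrence order:

```fsharp
open System.Collections.Generic
open System.Text

// Hypothetical variant that returns a string instead of a HashSet<char>.
let distinctString (str: string) : string =
    let seen = HashSet<char>()
    let sb = StringBuilder()
    for c in str do
        // Add returns true only the first time a character is seen.
        if seen.Add(c) then
            sb.Append(c) |> ignore
    sb.ToString()

let example = distinctString "abcdeeeeeee"
```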
Jack P.
  • Your answer makes sense, and I do like the slicing operation .[..4], but unfortunately it lacks smooth composition and usually requires a type annotation to say it is either a string or an array. I think you are right in saying that the best option is to create a specific function for the use, and perhaps module String = let distinct s = s |> Seq.distinct |> toString might make for a nice extension, although one has to wonder why it isn't already included! –  Feb 03 '13 at 00:40
  • @DavidK Many functional languages tend to lean towards minimalism in their library design since it's easy enough to stitch some little helper functions together as you need them. The idea is to keep the libraries simple and fast instead of providing a built-in function for every possible scenario. – Jack P. Feb 04 '13 at 13:05
  • That does make sense, and for an ML language I can understand the F# way for this, but I guess the whole situation makes me really jealous of how Haskell handles this (type classes, I guess). take 4 "test string" will return "test". I won't argue that F# should have this feature though; that can be a discussion for the language designers to have! –  Feb 06 '13 at 07:57
  • I would prefer Seq.take 5 over the .[..4] in terms of readability. – Luca Fülbier Aug 09 '16 at 22:08
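
The "write your own small helpers" idea from the comments above could look roughly like this. It is a sketch under assumptions: the module name StringEx and its functions are hypothetical, and Seq.truncate is swapped in for Seq.take on purpose, because Seq.take throws when the input is shorter than the requested length:

```fsharp
open System

// Hypothetical helper module; none of these names come from FSharp.Core.
module StringEx =
    // Convert any char sequence back to a string.
    let ofSeq (chars: seq<char>) : string = chars |> String.Concat

    // At most the first n characters; shorter inputs are returned whole.
    let take n (s: string) : string = s |> Seq.truncate n |> ofSeq

    // Distinct characters in first-occurrence order.
    let distinct (s: string) : string = s |> Seq.distinct |> ofSeq

// The helpers are string -> string, so they compose directly.
let distinctThenTake3 = StringEx.distinct >> StringEx.take 3
let sample = "aabbccdd" |> distinctThenTake3
```

Since every helper goes from string to string, the composition the question asks for falls out without any conversions at the call site.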

F# has a String module which contains some of the Seq module functionality specialised for strings.
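
A few of those functions, as a quick sketch (the binding names are illustrative). Each takes a string and, where it returns text, gives a string back rather than a seq<char>:

```fsharp
// String.map applies a char -> char function and returns a string.
let upper = "abcdef" |> String.map (fun c -> System.Char.ToUpper c)

// String.filter keeps the characters matching a predicate.
let digits = "abcdef01234567" |> String.filter (fun c -> System.Char.IsDigit c)

// String.exists tests whether any character matches a predicate.
let hasDigit = "abc1" |> String.exists (fun c -> System.Char.IsDigit c)
```

The module does not cover everything in Seq; there is no take or distinct, for example, which is where the other approaches on this page come in.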

Lee

F# has gained the ability to use constructors as functions since this question was asked 5 years ago. I would use the String(Char[]) constructor to convert characters to a string. You can convert to and from an F# sequence or an F# list, but I'd probably just stay in the F# Array module, using the String.ToCharArray method to get there. (Both examples assume System has been opened.)

printfn "%s" ("abcdef01234567".ToCharArray() |> Array.take 5 |> String)

If you really wanted to use a char seq then you can pipe it to a String like so:

printfn "%s" ("abcdef01234567" |> Seq.take 5 |> Array.ofSeq |> String)
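
If this conversion comes up often, it can be given a name once and reused at the end of a pipeline. A minimal sketch, assuming the hypothetical helper name ofChars:

```fsharp
open System

// Hypothetical helper: char sequence -> string via the String(char[]) constructor.
let ofChars (cs: seq<char>) : string = cs |> Array.ofSeq |> String

let firstFive = "abcdef01234567" |> Seq.take 5 |> ofChars
let unique = "abcdeeeeeee" |> Seq.distinct |> ofChars
```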
Cameron Taggart