I have a CSV file with two columns, text and count. The goal is to transform the file from this:
some text once,1
some text twice,2
some text thrice,3
To this:
some text once,1
some text twice,1
some text twice,1
some text thrice,1
some text thrice,1
some text thrice,1
In other words, each line is repeated count times, with the count spread across the copies so that each one gets 1.
This seems like a good candidate for Seq.unfold, which can generate the additional lines as the file is read. I have the following generator function:
let expandRows (text:string, number:int32) =
    if number <= 0   // stop at zero; "<=" also keeps a negative count from looping forever
    then None
    else
        let element = text                   // "element" will be in the generated sequence
        let nextState = (element, number-1)  // threaded state replacing looping
        Some (element, nextState)
FSI reports the following function signature:
val expandRows : text:string * number:int32 -> (string * (string * int32)) option
Executing the following in FSI:
let expandedRows = Seq.unfold expandRows ("some text thrice", 3)
yields the expected result:
val it : seq<string> = seq ["some text thrice"; "some text thrice"; "some text thrice"]
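For reference, Seq.unfold has the type ('State -> ('T * 'State) option) -> 'State -> seq<'T>, so with expandRows the state is string * int32 and the elements are strings. The sketch below repeats expandRows (with a `<= 0` guard) so that it compiles on its own, and shows that partially applying Seq.unfold to the generator leaves a function from one (text, count) row to a seq<string>. The names expandOne and twice are illustrative only.

```fsharp
// Seq.unfold : ('State -> ('T * 'State) option) -> 'State -> seq<'T>
// Here 'State = string * int32 and 'T = string.
let expandRows (text: string, number: int32) =
    if number <= 0 then None                  // no copies left: end the sequence
    else Some (text, (text, number - 1))      // emit text, thread the decremented count

// Partial application: the initial state has not been supplied yet,
// so this is a function from a single (text, count) row to its copies.
let expandOne : string * int32 -> seq<string> = Seq.unfold expandRows

let twice = expandOne ("some text twice", 2) |> List.ofSeq
// twice = ["some text twice"; "some text twice"]
```

Note that this function works on one row at a time, whereas piping a whole sequence into Seq.unfold would make that entire sequence the initial state.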
The question is: how do I plug this into the context of a larger ETL pipeline? For example:
File.ReadLines(inFile)
|> Seq.map createTupleWithCount
|> Seq.unfold expandRows // type mismatch here
|> Seq.iter outFile.WriteLine
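The definition of createTupleWithCount isn't shown here (it is presumably in the linked gist). A hypothetical version consistent with the sample data might look like the following. It assumes there is no header row and no quoted fields, and that the count follows the last comma on each line, so the text itself may contain commas.

```fsharp
open System

// Hypothetical parser for lines such as "some text twice,2".
// Assumes: no header row, no CSV quoting, count after the LAST comma.
let createTupleWithCount (line: string) : string * int32 =
    let i = line.LastIndexOf(',')
    if i < 0 then failwithf "No comma in line: %s" line
    let text = line.Substring(0, i)
    let count = Int32.Parse(line.Substring(i + 1).Trim())  // Trim tolerates stray whitespace or '\r'
    (text, count)
```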
The compiler reports the following error on expandRows in the pipeline:
Type mismatch.
Expecting a 'seq<string * int32> -> ('a * seq<string * int32>) option'
but given a 'string * int32 -> (string * (string * int32)) option'
The type 'seq<string * int32>' does not match the type 'string * int32'
I expected the expandRows step to produce a seq<string>, as it did in my isolated test. Since that matches neither the "Expecting" nor the "given" type, I'm confused. Can someone point me in the right direction?
A gist for the code is here: https://gist.github.com/akucheck/e0ff316e516063e6db224ab116501498
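One possible way to reconcile the types, offered as a sketch rather than a definitive fix: in `xs |> Seq.unfold expandRows`, the piped sequence becomes Seq.unfold's initial state, which is why the compiler expects a generator whose state is seq<string * int32>. Seq.collect instead applies a function to each element and concatenates the resulting sequences, so `Seq.collect (Seq.unfold expandRows)` runs the unfold once per parsed row. To stay self-contained, the example reads from an in-memory sequence rather than a file, uses a simplified hypothetical createTupleWithCount, and assumes the output format `text,1`. The name expandLines is illustrative.

```fsharp
open System

let expandRows (text: string, number: int32) =
    if number <= 0 then None
    else Some (text, (text, number - 1))

// Hypothetical stand-in for the question's createTupleWithCount.
let createTupleWithCount (line: string) : string * int32 =
    let i = line.LastIndexOf(',')
    (line.Substring(0, i), Int32.Parse(line.Substring(i + 1).Trim()))

// Seq.collect runs Seq.unfold once per row (each row is its initial state)
// and lazily concatenates the per-row sequences.
let expandLines (lines: seq<string>) : seq<string> =
    lines
    |> Seq.map createTupleWithCount
    |> Seq.collect (Seq.unfold expandRows)
    |> Seq.map (fun text -> text + ",1")   // assumed output format: text,1
```

With the file handles from the question, the same stage could slot in as `File.ReadLines(inFile) |> expandLines |> Seq.iter outFile.WriteLine`. Because File.ReadLines and these Seq functions are lazy, rows would still be expanded as the file is read.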