4

I have been struggling with F# and especially working with Deedle.Frame and Deedle.Series.

  1. I need to create two new columns with values that depend on the other columns.
  2. For some rows, I need to insert a new row.

My current solution is very ugly, basically filter the Frame according to each if-statement and return a new Frame with the new columns only. Then I merge this into the original Frame. This works since the cases or mutually exclusive. The case when I need to insert a new row is made such that I will reindex the new frame so that the during a merge it simply be appended at the bottom (after last row) of the original Frame.

However, since the different if-cases/patterns are many, I have to work with a lot of merging of frames and make sure that the complement of the IF-statements are handled (otherwise there will be missing values).

Input Frame -> Target Frame (row order not important)

// ****** PSEUDO CODE ******
// Create new columns as copies.
Add column ["NEWTYPE"] = ["TYPE"] 
Add column ["NEWVALUE"] = ["VALUE"]
// Iterate over rows.
foreach row in table
    IF row["GROUP"] == A
        typeParts = row["TYPE"].Split('/')
        // Swap order X/Y -> Y/X.
        IF typeParts[0] == "X" && typeParts[1] != "X"
          row["NEWTYPE"] = typeParts[1] + "/" + "X"
          row["NEWVALUE"] = row["VALUE"] / 10.0
        // Split row into two rows Z/Y -> {Z/X, Y/X}.
        ElseIF typeParts[0] != "X" && typeParts[1] != "X"
            Insert extraRow = row
            extraRow["NEWTYPE"] = typeParts[0] + "/" + "X"
            extraRow.["NEWVALUE"] = row["VALUE"] * 2.0
            row["NEWTYPE"] = typeParts[1] + "/" + "X"
            row["NEWVALUE"] = row["VALUE"] * 3.0
        ELSE
            // Do nothing, new columns already copied.
    ELSE
          // Do nothing, new columns already copied.

A Gold star for anyone able to come up with a good solution. I guess it could be solved by returning a List of Frames (since a new row might be created) and then flatten and mergeAll?

***** Here is my current ugly F# code: *****

let subA =
    inputFrame
    |> Frame.filterRowValues(fun row -> row.GetAs<string>("GROUP") = "A")

let grpSwap =
    subA 
    |> Frame.filterRowValues(fun row -> 
        let typeParts = row.GetAs<string>("TYPE").Split('/')
        typeParts.[0] = "X" && typeParts.[1] <> "X")
    |> Frame.mapRowValues(fun r -> 
        let typeParts = r.GetAs<string>("TYPE").Split('/')
        series ["NEWTYPE" => box (typeParts.[1] + "/" + typeParts.[0]); "NEWVALUE" => box (r.GetAs<float>("VALUE") / 10.0)])
    |> Frame.ofRows

let grpCopy =
    subA 
    |> Frame.filterRowValues(fun row -> 
        let typeParts = row.GetAs<string>("TYPE").Split('/')
        typeParts.[0] <> "X" && typeParts.[1] = "X")
    |> Frame.mapRowValues(fun r -> 
        series ["NEWTYPE" => box (r.GetAs<string>("TYPE")); "NEWVALUE" => box (r.GetAs<float>("VALUE"))])
    |> Frame.ofRows


let rowsToSplit =
    subA 
    |> Frame.filterRowValues(fun row -> 
        let typeParts = row.GetAs<string>("TYPE").Split('/')
        typeParts.[0] <> "X" && typeParts.[1] <> "X")

let grpSplit1 =
    rowsToSplit 
    |> Frame.mapRowValues(fun r -> 
        let typeParts = r.GetAs<string>("TYPE").Split('/')
        series ["NEWTYPE" => box (typeParts.[0] + "/" + "X"); "NEWVALUE" => box (r.GetAs<float>("VALUE") * 2.0)])
    |> Frame.ofRows

let grpSplit2 =
    rowsToSplit
    |> Frame.mapRowValues(fun r -> 
        let typeParts = r.GetAs<string>("TYPE").Split('/')
        series ["NEWTYPE" => box (typeParts.[1] + "/" + "X"); "NEWVALUE" => box (r.GetAs<float>("VALUE") * 3.0)])
    |> Frame.ofRows

let grpAComplement =
    inputFrame 
    |> Frame.filterRowValues(fun row -> 
        row.GetAs<string>("GROUP") <> "A")
    |> Frame.mapRowValues(fun r -> 
        series ["NEWTYPE" => box (r.GetAs<string>("TYPE")); "NEWVALUE" => box (r.GetAs<float>("VALUE"))])
    |> Frame.ofRows


let outputFrame = 
    let final0 = Frame.mergeAll([inputFrame; grpSwap; grpCopy; grpSplit1; grpAComplement])
    let appendFromIndex = (final0.RowCount)
    let appendToIndex = appendFromIndex + (grpSplit2.RowCount-1)
    let newRow = grpSplit2 |> Frame.merge rowsToSplit
    newRow |> Frame.indexRowsWith [appendFromIndex..appendToIndex] |> Frame.merge final0
  • write a `fromFrame :: Frame -> Source` function and a `toFrame :: Target -> Frame` function. and put them on the start of the process I described below and at the end of it `xs |> List.map fromFrame >> transform |> List.concat |> List.map toFrame` – robkuz Jan 18 '17 at 14:01

1 Answers1

4

I would try somthing like this

first define a few types to make the stuff easier to handle

type Group = 
    | A
    | B
    | C

type Source = {
    Group: Group
    Typ: string
    Value: float
}          

type Target = {
    Group: Group
    Typ: string
    Value: float
    NType: string
    NValue: float
}          

create initialize your initial list

let xs : List<Source> = createFromWhereEver()

define the transformation function.
The trick is that this function returns a list of Target objects. Either with one item or with 2 items in it.

let transform (x:Source) : List<Target> =
    if x.Group = A then
        let init x ntype nvalue =
            {
                Group = x.Group
                Typ = x.Typ
                Value = x.Value
                NType = ntype
                NValue = nvalue
            }
        let tp0 :: tp1 :: _ = x.Typ.split('/')
        // Swap order X/Y -> Y/X.
        if tp0 = "X" && tp1 <> "X" then
            [init x (tp1 + "/X") (x.value / 10)]
        // Split row into two rows Z/Y -> {Z/X, Y/X}.
        elif tp0 <> "X" && tp1 <> "X"
            [
                init x (tp0 + "/X") (x.value * 2)
                init x (tp1 + "/X") (x.value * 3)
            ]
        else
            [x]
    else
        [x]

and finally pump your list and sources thru map and finally concat those lists

xs
|> List.map transform
//will give you a List<List<Target>>
|> List.concat
robkuz
  • 9,488
  • 5
  • 29
  • 50
  • Nice! It might be that I don't know the exact number/names of columns, except the ones used above. It would be nice if those could simply follow along. I suppose this would change the method quite a lot. I suppose this would make the solution much uglier? :) – Pontus Hultkrantz Jan 18 '17 at 14:02
  • 1
    Why would that be? of course you could work directly with `Frame` objects (if there is such a thing - I dont know Deedle). Or create some sort of Adapter type to layer upon `frame` objects – robkuz Jan 18 '17 at 14:10
  • 1
    I think the real trick on my solution is simly this: instead in place editing use some variant of `map` and a processing function that instead of returning a `TYPE` always returns a `List`. The list contaning either 1 element (the original one) or more elements depending on your checks. and later concat that list. – robkuz Jan 18 '17 at 14:18