I'm working on an ETL pipeline, and for one particular step I need to take a table of data and turn each column into a set, that is, an array of that column's unique values.
I'm struggling to wrap my head around how I would accomplish this within the Kiba framework.
Here's the essence of what I'm trying to achieve:
Source:
[
  { dairy: "Milk", protein: "Steak", carb: "Potatoes" },
  { dairy: "Milk", protein: "Eggs", carb: "Potatoes" },
  { dairy: "Cheese", protein: "Steak", carb: "Potatoes" },
  { dairy: "Cream", protein: "Chicken", carb: "Potatoes" },
  { dairy: "Milk", protein: "Chicken", carb: "Pasta" },
]
Destination:
{
  dairy: ["Milk", "Cheese", "Cream"],
  protein: ["Steak", "Eggs", "Chicken"],
  carb: ["Potatoes", "Pasta"],
}
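To pin down the transformation itself, independent of Kiba, here is a minimal plain-Ruby sketch of what I mean. The names `SAMPLE_ROWS` and `column_sets` are made up for illustration, and it assumes all rows fit in memory, which may not hold for a real pipeline.

```ruby
require 'set'

# Sample rows copied from the source table above.
SAMPLE_ROWS = [
  { dairy: "Milk", protein: "Steak", carb: "Potatoes" },
  { dairy: "Milk", protein: "Eggs", carb: "Potatoes" },
  { dairy: "Cheese", protein: "Steak", carb: "Potatoes" },
  { dairy: "Cream", protein: "Chicken", carb: "Potatoes" },
  { dairy: "Milk", protein: "Chicken", carb: "Pasta" },
].freeze

# Hypothetical helper: folds rows into { column => array of unique values }.
def column_sets(rows)
  # The block form gives every column its own Set on first access.
  sets = rows.each_with_object(Hash.new { |hash, col| hash[col] = Set.new }) do |row, acc|
    row.each { |col, val| acc[col] << val }
  end
  # Set keeps insertion order, so values appear in first-seen order.
  sets.transform_values(&:to_a)
end
```

Calling `column_sets(SAMPLE_ROWS)` gives the destination hash shown above. The open question is how to express the same fold with Kiba's row-by-row model.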
Is something like this a) doable in Kiba, and b) even advisable to do in Kiba?
Any help would be greatly appreciated.
Update - partially solved.
I've found a partial solution. This transformer class turns a table of rows into a hash of sets, but I'm stuck on how to get that data out through an ETL destination. I suspect I'm using Kiba in a way it wasn't intended to be used.
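require 'set'

class ColumnSetTransformer
  def initialize
    # Block form gives each column its own Set rather than one shared default.
    @col_set = Hash.new { |hash, col| hash[col] = Set.new }
  end

  # Called once per row; adds each value to the set for its column.
  def process(row)
    row.each do |col, col_val|
      @col_set[col] << col_val
    end
    # Whatever process returns becomes the row passed downstream,
    # so every row is replaced by the running hash of sets.
    @col_set
  end
end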
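One direction I'm considering is to move the accumulation out of the transformer and into the destination, since a Kiba destination gets a `write(row)` call per row and a `close` call once the rows are exhausted. The following is only a sketch under that assumption: `ColumnSetDestination` and its `on_close` callback are hypothetical names of mine, not part of Kiba, and the last few lines drive the class by hand instead of running a real job.

```ruby
require 'set'

# Hypothetical destination: collects each column's unique values while rows
# stream in, then hands the finished hash to a callback when the job closes.
class ColumnSetDestination
  # on_close is an assumed hook (any callable), not something Kiba provides.
  def initialize(on_close: ->(result) { p result })
    @col_sets = Hash.new { |hash, col| hash[col] = Set.new }
    @on_close = on_close
  end

  # Called once per row that reaches the destination.
  def write(row)
    row.each { |col, val| @col_sets[col] << val }
  end

  # Called once after the last row; emits { column => array of unique values }.
  def close
    @on_close.call(@col_sets.transform_values(&:to_a))
  end
end

# Driving it by hand, the way I assume a job run would call it.
destination = ColumnSetDestination.new
destination.write({ dairy: "Milk", carb: "Potatoes" })
destination.write({ dairy: "Cheese", carb: "Potatoes" })
destination.close
```

With this shape the transformer would not be needed at all, because the rows would reach the destination unchanged. I'm not sure whether holding state in a destination like this is considered good practice in Kiba, which is really the heart of my question.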