2

With the following schema (defined below). I can use map reduce to aggregate the delivered_count field for all days (which is an embedded array inside the campaign document).

  {
    campaign_id: 1,
    status: 'running',
    dates: {
      '20130926' => {
        delivered: 1,
        failed: 1,
        queued: 1,
        clicked: 1,
        males_count: 1,
        females_count: 1,
        pacific_region: { clicked_count: 10 },
        america_region: { clicked_count: 10 },
        atlantic_region: { clicked_count: 10 },
        europe_region: { clicked_count: 10 },
        africa_region: { clicked_count: 10 },
        etc_region: { clicked_count: 10 },
        asia_region: { clicked_count: 10 },
        australia_region: { clicked_count: 10 }
      },
      '20130927' => {
        delivered: 1,
        failed: 1,
        queued: 1,
        clicked: 1,
        males_count: 1,
        females_count: 1,
        pacific_region: { clicked_count: 10 },
        america_region: { clicked_count: 10 },
        atlantic_region: { clicked_count: 10 },
        europe_region: { clicked_count: 10 },
        africa_region: { clicked_count: 10 },
        etc_region: { clicked_count: 10 },
        asia_region: { clicked_count: 10 },
        australia_region: { clicked_count: 10 }
      },
      '20130928' => {
        delivered: 1,
        failed: 1,
        queued: 1,
        clicked: 1,
        males_count: 1,
        females_count: 1,
        pacific_region: { clicked_count: 10 },
        america_region: { clicked_count: 10 },
        atlantic_region: { clicked_count: 10 },
        europe_region: { clicked_count: 10 },
        africa_region: { clicked_count: 10 },
        etc_region: { clicked_count: 10 },
        asia_region: { clicked_count: 10 },
        australia_region: { clicked_count: 10 }
      }
    }
  }

The code below parses through field asia_regions outputs the value of field clicked_count => 30 (combined value of all data)

$rethinkdb.table(:daily_stat_campaigns).filter { |daily_stat_campaign| daily_stat_campaign[:campaign_id].eq 1 }[0][:dates].do { |doc|
  doc.keys.map { |key|
    doc.get_field(key)[:asia_region][:clicked_count].default(0)
  }.reduce { |left, right|
    left+right
  }
}.run

Is it possible to run the code above but against multiple regions? This way I can run one query which will return multiple sums. The output i'm trying to achieve is something similar to the pseudo result below.

[{ asia_region: {clicked_count: 30}}, {america_region: {clicked_count: 30} }]
levelone
  • 2,289
  • 3
  • 16
  • 17

2 Answers2

1

I'm a bit confused about the code that you posted. Why is everything inside of a filter? To output something like what you want do the following:

regions = [:pacific_region, :america_region, ...]
reg_clicks = r.table(:daily_stat_campaigns).concat_map { |row|
                 row[:dates]
                 .coerce_to("ARRAY")
                 .map{ |date| date[0] }
                 .pluck(regions)
                 .coerce_to("ARRAY")
              }

You can now run reg_clicks and it should look something like this:

$ reg_clicks.run()
[[:asia_region, {clicked_count: 30}], [:etc_region, {clicked_count: 30}], ...]

Now we need to do one last transformation to aggregate it:

$ aggregate = reg_clicks.map{ |reg|
                  {reg: reg[0], clicked_count: reg[0][:clicked_count]}
              }
              .group_by(:reg, r.sum(:clicked_count))

this will give you output that looks like:

[{group: :asia_region, reduction: 150} ...]

if you want it too look exactly like what you want then you can apply a final transformation:

aggregate.map{ |row|
    [row[:group], row[:reduction]]
}
.coerce_to("OBJECT")

These queries would definitely be a bit nicer if you normalized the data a bit. Breaking things out into a 2 more tables called :dates and :region_clicks that looked like so:

#dates
{
    id: 0
    campaign_id: 1
    date: '20130927'
    delivered: 1,
    failed: 1,
    queued: 1,
    clicked: 1,
    males_count: 1
}

#region_clicks
{
    region: "asia_region",
    click_count: 30,
    date_id: 0
}

Then your query would be as simple as:

r.table(:region_clicks).group_by(:region, r.sum(:click_count)).run()
Joe Doliner
  • 2,058
  • 2
  • 15
  • 19
  • 1
    Hi Joe, I believe Marc is using `filter` so that he can aggregate the days of only one campaign. Second, he is wanting to embed, rather than reference. Unless you think the embedded array of dates might be too large and cause some kind of a performance impact. The intention here is only to store no more than 90 days per campaign. What are your thoughts? – Christian Fazzini Sep 28 '13 at 02:13
  • Joe, the code you suggested. Returns the following error: `RethinkDB::RqlRuntimeError: Cannot perform pluck on a non-object non-sequence "20130926".` – Christian Fazzini Sep 28 '13 at 11:44
1

This seems to work:

require 'awesome_print' # For better readability on output

regions = [:pacific_region, :america_region]
reg_clicks = $rethinkdb.table(:daily_stat_campaigns).filter { |daily_stat_campaign| daily_stat_campaign[:campaign_id].eq 1 }[0][:dates].do { |doc|
  doc.keys.concat_map { |key|
    doc
    .get_field(key)
    .pluck(regions)
    .coerce_to("ARRAY")
  }
}
ap reg_clicks.run

Will output something like: [["america_region", {"clicked_count"=>10}], ["pacific_region", {"clicked_count"=>10}], ["america_region", {"clicked_count"=>10}], ["pacific_region", {"clicked_count"=>10}], ["america_region", {"clicked_count"=>10}], ["pacific_region", {"clicked_count"=>10}]]

aggregate = reg_clicks.map { |reg|
  { reg: reg[0], clicked_count: reg[1][:clicked_count] }
}
ap aggregate.run

Will output: [{"reg"=>"america_region", "clicked_count"=>10}, {"reg"=>"pacific_region", "clicked_count"=>10}, {"reg"=>"america_region", "clicked_count"=>10}, {"reg"=>"pacific_region", "clicked_count"=>10}, {"reg"=>"america_region", "clicked_count"=>10}, {"reg"=>"pacific_region", "clicked_count"=>10}]

ap aggregate.group_by(:reg, $rethinkdb_rql.sum(:clicked_count)).run

Outputs: [{"reduction"=>30, "group"=>{"reg"=>"america_region"}}, {"reduction"=>30, "group"=>{"reg"=>"pacific_region"}}]

Christian Fazzini
  • 19,613
  • 21
  • 110
  • 215