3

I have this data:

members = {"total"=>3, "data"=>[
  {"email"=>"foo@example.org", "timestamp"=>"2013-03-16 01:11:01"},
  {"email"=>"bar@example.org", "timestamp"=>"2013-03-16 02:07:30"},
  {"email"=>"exx@example.org", "timestamp"=>"2013-03-16 03:06:24"}
]}

And want to generate an array like:

["foo@example.org", "bar@example.org", "exx@example.org"]

Currently I'm using:

members['data'].collect { |h| h['email'] }
  1. Is there a more efficient way to achieve it in regards to performance?
  2. Is there an even shorter way to achieve it?

I have Rails available.

the Tin Man
user569825

3 Answers

4

In addition to the other answers: if you're able to construct the Hash using symbols as keys, you can get a performance gain when collecting the values. For instance:

require 'benchmark'

members_without_sym = {"total"=>3, "data"=>[
  {"email"=>"foo@example.org", "timestamp"=>"2013-03-16 01:11:01"},
  {"email"=>"bar@example.org", "timestamp"=>"2013-03-16 02:07:30"},
  {"email"=>"exx@example.org", "timestamp"=>"2013-03-16 03:06:24"}
]}

members_with_sym = {:total=>3, :data=>[
  {:email=>"foo@example.org", :timestamp=>"2013-03-16 01:11:01"},
  {:email=>"bar@example.org", :timestamp=>"2013-03-16 02:07:30"},
  {:email=>"exx@example.org", :timestamp=>"2013-03-16 03:06:24"}
]}

# A label width of 14 fits the longest label ("Without symbol") so the columns line up.
Benchmark.bm(14) do |algo|
  algo.report("Without symbol") do
    2_000_000.times do
      members_without_sym['data'].collect { |h| h['email'] }
    end
  end
  algo.report("With symbol") do
    2_000_000.times do
      members_with_sym[:data].collect { |h| h[:email] }
    end
  end
end

Results:

                     user     system      total        real
Without symbol   2.260000   0.000000   2.260000 (  2.254277)
With symbol      0.880000   0.000000   0.880000 (  0.878603)
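Timing jitter can be reduced a little further with Benchmark.bmbm, which runs a rehearsal pass before the measured pass. The following is only a sketch: the iteration count N and the two-record sample data are arbitrary choices for illustration, not the figures used above.

```ruby
require 'benchmark'

# Arbitrary iteration count for illustration; raise it until each
# report takes on the order of a second on your machine.
N = 100_000

with_strings = {"data"=>[
  {"email"=>"foo@example.org"},
  {"email"=>"bar@example.org"}
]}

with_symbols = {:data=>[
  {:email=>"foo@example.org"},
  {:email=>"bar@example.org"}
]}

# bmbm runs every block twice (rehearsal, then the real measurement)
# and returns one Benchmark::Tms object per report.
results = Benchmark.bmbm do |x|
  x.report("Without symbol") { N.times { with_strings['data'].collect { |h| h['email'] } } }
  x.report("With symbol")    { N.times { with_symbols[:data].collect { |h| h[:email] } } }
end
```

Only the second set of timings (after the rehearsal) should be compared.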
fmendez
  • The data comes from the MailChimp API, so I guess we're unable to take advantage of **symbols** in that case. Thanks also for pointing out the **Benchmark** class - will be useful in the future! +1 – user569825 Mar 16 '13 at 14:35
  • 2
    While lookup by symbol vs lookup by string is indeed significantly faster, your benchmark doesn't prove it. There is significant jitter in time measurement and a single iteration of something you're trying to benchmark like this is unlikely to be meaningful (notice how your user times are identical and 0). As a general rule you should try timing N iterations of the computations under benchmark where N is large enough to bring the overall timing over a second. On my machine, that's on the order of 2M iterations: `2_000_000.times{ members_without_sym['data'].collect { |h| h['email'] } }` – dbenhur Mar 17 '13 at 00:46
3

Other than moving the h['email'] lookup into a native extension, I cannot see how you could make the above example more efficient. The gain from doing so would be tiny for a data set of this size, and I suspect much smaller than what you would get from optimising the I/O of fetching and parsing the data in the first place.

Depending on your data source, using symbols rather than strings as hash keys is a common Ruby idiom, and it is also more efficient in terms of memory use. This is potentially a larger gain in efficiency, and might be worth it provided you don't have to put a large amount of effort into converting the data (e.g. you can change the data structure at the source, rather than converting the hash just to query it once).
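If the response is available as raw JSON text (an assumption; the question only says the data comes from the MailChimp API), the standard library's JSON.parse can produce symbol keys at parse time via symbolize_names: true, so no separate conversion pass is needed. The json_text below is a made-up stand-in for the real payload:

```ruby
require 'json'

# Hypothetical stand-in for the raw API response body.
json_text = <<~JSON
  {"total": 2, "data": [
    {"email": "foo@example.org", "timestamp": "2013-03-16 01:11:01"},
    {"email": "bar@example.org", "timestamp": "2013-03-16 02:07:30"}
  ]}
JSON

# symbolize_names: true turns every object key into a Symbol while parsing.
members = JSON.parse(json_text, symbolize_names: true)

emails = members[:data].map { |h| h[:email] }
p emails
```

If a client library hands back already-parsed, string-keyed hashes, this option is not available and the keys would have to be converted after the fact.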

the Tin Man
Neil Slater
  • The data comes from the MailChimp API, so I guess we're unable to take advantage of **symbols** in that case. Anyway +1 – user569825 Mar 16 '13 at 14:34
2
members = {"total"=>3, "data"=>[
  {"email"=>"foo@example.org", "timestamp"=>"2013-03-16 01:11:01"},
  {"email"=>"bar@example.org", "timestamp"=>"2013-03-16 02:07:30"},
  {"email"=>"exx@example.org", "timestamp"=>"2013-03-16 03:06:24"}
]}

temp = members["data"].map { |x| x["email"] }

gives you ["foo@example.org", "bar@example.org", "exx@example.org"]

Note that map and collect are aliases, so this is equivalent to the code in the question. See: Difference between map and collect in Ruby?

--

Structs might also improve performance:

Record = Struct.new(:email, :timestamp)
members = {"total"=>3, "data"=>[
  Record.new("foo@example.org","2013-03-16 01:11:01"),
  Record.new("bar@example.org","2013-03-16 02:07:30"),
  Record.new("exx@example.org","2013-03-16 03:06:24")
]}

temp = members["data"].map(&:email)

http://blog.rubybestpractices.com/posts/rklemme/017-Struct.html
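When the data already exists as string-keyed hashes, as in the question, it has to be converted before the Struct approach applies. A minimal sketch of that conversion, using the same Record struct and a shortened copy of the sample data; the variable names raw and records are illustrative only:

```ruby
Record = Struct.new(:email, :timestamp)

raw = [
  {"email"=>"foo@example.org", "timestamp"=>"2013-03-16 01:11:01"},
  {"email"=>"bar@example.org", "timestamp"=>"2013-03-16 02:07:30"}
]

# values_at returns the values in the order the keys are listed,
# which matches the order of the Struct members.
records = raw.map { |h| Record.new(*h.values_at("email", "timestamp")) }

emails = records.map(&:email)
p emails
```

The conversion walks every hash once, so it is only likely to pay off if the records are read many times afterwards.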

Community
ajt