2

Short Version: What is the source of [global]#[no method] in the Ruby profiler output below, and is there any way to eliminate it or reduce its time? It's taking up over 50% of the total execution time for my method and it's the only part of the profiler results that I can't account for.

Long Version: I'm using the ruby-prof gem to profile the Ruby code below. The method find_median_sorted_arrays takes two sorted arrays, and returns their median as a float (hence the name). The problem comes from a coding challenge website, and I decided to run a profiler on my solution because apparently 80% of the submitted Ruby solutions run faster than mine.

Note that the Ruby version is ruby 2.3.3p222 (2016-11-21 revision 56859) [x86_64-darwin15]:

def find_median_sorted_arrays(nums1, nums2)
  sorted_array = [nil] * (nums1.length + nums2.length)
  nums1_counter, nums2_counter = 0, 0
  sorted_array.each_with_index do |num, index|
    if nums2_counter >= nums2.length || (nums1[nums1_counter] && nums1[nums1_counter] < nums2[nums2_counter])
      sorted_array[index] = nums1[nums1_counter]
      nums1_counter += 1
    else
      sorted_array[index] = nums2[nums2_counter]
      nums2_counter += 1
    end
  end

  return median(sorted_array)
end

def median(array)
  len = array.length
  (array[(len - 1) / 2] + array[len / 2]) / 2.0
end

nums1 = [1,2]
nums2 = [3,4]

RubyProf.start
find_median_sorted_arrays(nums1, nums2)
result = RubyProf.stop

printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)

The output is as follows:

 %self      total      self      wait     child     calls  name
 58.56      0.000     0.000     0.000     0.000        1   [global]#[no method]
 11.60      0.000     0.000     0.000     0.000        1   Array#each
 11.60      0.000     0.000     0.000     0.000        1   Object#find_median_sorted_arrays
  6.63      0.000     0.000     0.000     0.000        1   Array#*
  4.42      0.000     0.000     0.000     0.000        1   Fixnum#/
  4.42      0.000     0.000     0.000     0.000        1   Object#median
  2.76      0.000     0.000     0.000     0.000        1   Enumerable#each_with_index

As I mentioned in the short version, [global]#[no method] is taking up over 50% of the total execution time for my method and it's the only part of the profiler results that I can't account for. Where does this come from, and is there any way to eliminate it or reduce its time?

Richie Thomas
  • 3,073
  • 4
  • 32
  • 55
  • Tip: You can allocate an array of size *n* `nil` values with `Array.new(n, nil)` – tadman Feb 16 '18 at 05:59
  • @tadman Is that somehow better? – Stefan Pochmann Feb 16 '18 at 10:37
  • @StefanPochmann Might be in terms of performance, which seems to be the goal here. – tadman Feb 16 '18 at 17:42
  • 1
    @tadman In my tests, `[nil] * n` was faster... – Stefan Pochmann Feb 16 '18 at 17:43
  • @StefanPochmann That's really odd, but good to know. – tadman Feb 16 '18 at 17:46
  • To be clear, my goal is twofold- to improve performance and to understand what `[global]#[no method]` is. If I had to pick one, I'd say understanding `[global]#[no method]` is my priority. That will (hopefully) have the side effect of telling me whether it's something that can be eliminated. – Richie Thomas Feb 16 '18 at 18:13
  • 1
    @tadman I tested some more... for n=1000 and smaller, `Array.new(n, nil)` was faster. For n=10000, both were about the same speed. For n=100000, `[nil] * n` was faster. And overall, `Array.new(n)` seemed to be slightly faster than `Array.new(n, nil)`, but I'm not sure. – Stefan Pochmann Feb 16 '18 at 18:30
  • @RichieThomas I'd say the main reason your code is slow is because you're doing it wrong. You take linear time, but this problem should be solved in logarithmic time. Also, try it with larger arrays, then perhaps the percentage of `[global]#[no method]` will drop to insignificance. – Stefan Pochmann Feb 16 '18 at 18:31
  • @StefanPochmann I'll try with larger sample sizes, but how is it possible to combine the two arrays in faster than O(n+m) time, i.e. without looking at every item in both arrays? – Richie Thomas Feb 16 '18 at 18:51
  • As a follow-up, I see from the following link that others have asked a similar question and have been told it's not possible: https://stackoverflow.com/questions/35662619/merging-two-sorted-arrays-with-olognm-worst-case – Richie Thomas Feb 16 '18 at 18:57
  • @RichieThomas You shouldn't "combine" them. Do a binary search using both arrays. – Stefan Pochmann Feb 16 '18 at 18:58
  • But the goal is to find the median of the two sorted arrays, not each individual one. Here's the problem description if it'll help: "There are two sorted arrays nums1 and nums2 of size m and n respectively. Find the median of the two sorted arrays. The overall run time complexity should be O(log (m+n))." – Richie Thomas Feb 16 '18 at 18:59
  • I know. And for individual medians, it wouldn't be logarithmic time but *constant* time. – Stefan Pochmann Feb 16 '18 at 19:01
  • It seems there's something I don't understand about the solution you proposed. I know you're saying I should use binary search to get the median of each individual array, right? And then do what with them? I ask because the medians of the 2 smaller arrays aren't necessarily related to the median of the entire collection. Example- 2 arrays [1,1,1,2,3,5] and [1,2,4,6,9]. Median #1 = 1.5, median #2 = 4. The median of the whole collection is 2 when the arrays are combined ([1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 9]). If I'm misunderstanding you somewhere, please let me know. – Richie Thomas Feb 16 '18 at 22:31
  • As a side note, you were right about larger sample sizes- when adding 10,000 numbers to one array and 1 million to another, the time significance of `[global]#[no method]` dropped down to .01% of total. – Richie Thomas Feb 16 '18 at 22:48
  • No, I said do **a** binary search using both arrays. One. Not two. To find the one overall median, not to find the two medians of the two arrays. If you want more explanation, you can find one I wrote here: https://leetcode.com/problems/median-of-two-sorted-arrays/discuss/2579/6-lines-O(log(min(mn)))-Ruby. Good to see that `[global]#[no method]` indeed became insignificant with larger arrays. Really your case was so tiny that I'd say the result was pretty much meaningless. – Stefan Pochmann Feb 17 '18 at 09:53

0 Answers0