Find the year with the highest population (most efficient solution)

Question

Given two arrays: $births, containing a list of birth years indicating when someone was born, and $deaths, containing a list of death years indicating when someone died, how can we find the year in which the population was highest?

For example given the following arrays:

$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];

The year on which the population was highest should be 1996, because 3 people were alive during that year, which was the highest population count of all those years.

Here's the running math on that:

| Birth | Death | Population |
|-------|-------|------------|
| 1981  |       | 1          |
| 1984  |       | 2          |
| 1984  | 1984  | 2          |
| 1991  | 1991  | 2          |
| 1996  |       | 3          |

Assumptions

We can safely assume that the year on which someone is born the population can increase by one and the year on which someone died the population can decrease by one. So in this example, 2 people were born on 1984 and 1 person died on 1984, meaning the population increased by 1 on that year.

We can also safely assume that the number of deaths will never exceed the number of births and that no death can occur when the population is at 0.

We can also safely assume that the years in both $deaths and $births will never be negative or floating point values (they're always positive integers greater than 0).

We cannot assume that the arrays will be sorted or that there won't be duplicate values, however.

Requirements

We must write a function to return the year on which the highest population occurred, given these two arrays as input. The function may return 0, false, "", or NULL (any falsey value is acceptable) if the input arrays are empty or if the population was always at 0 throughout. If the highest population occurred on multiple years the function may return the first year on which the highest population was reached or any subsequent year.

For example:

$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];

/* The highest population was 3 on 1997, 1998 and 1999, either answer is correct */

Additionally, including the Big O of the solution would be helpful.

My best attempt at doing this would be the following:

function highestPopulationYear(Array $births, Array $deaths): Int {

    sort($births);
    sort($deaths);

    $nextBirthYear = reset($births);
    $nextDeathYear = reset($deaths);

    $years = [];
    if ($nextBirthYear) {
        $years[] = $nextBirthYear;
    }
    if ($nextDeathYear) {
        $years[] = $nextDeathYear;
    }

    if ($years) {
        $currentYear = min($years); // start at the earliest event year; max() would apply a death before earlier births
    } else {
        $currentYear = 0;
    }

    $maxYear = $maxPopulation = $currentPopulation = 0;

    while(current($births) !== false || current($deaths) !== false || $years) {

        while($currentYear === $nextBirthYear) {
            $currentPopulation++;
            $nextBirthYear = next($births);
        }

        while($currentYear === $nextDeathYear) {
            $currentPopulation--;
            $nextDeathYear = next($deaths);
        }

        if ($currentPopulation >= $maxPopulation) {
            $maxPopulation = $currentPopulation;
            $maxYear = $currentYear;
        }

        $years = [];

        if ($nextBirthYear) {
            $years[] = $nextBirthYear;
        }
        if ($nextDeathYear) {
            $years[] = $nextDeathYear;
        }
        if ($years) {
            $currentYear = min($years);
        } else {
            $currentYear = 0;
        }
    }

    return $maxYear;
}

The algorithm above should run in polynomial time: at worst it is O(((n log n) * 2) + k), where n is the number of elements to be sorted in each array and k is the number of birth years (we know that k >= y always holds, where y is the number of death years). However, I'm not sure if there is a more efficient solution.

My interests are purely in an improved Big O of computational complexity upon the existing algorithm. Memory complexity is of no concern. Nor is the runtime optimization. At least it's not a primary concern. Any minor/major runtime optimizations are welcome, but not the key factor here.

As you have a working solution, would this be better fitted to https://codereview.stackexchange.com/? — Nigel Ren, Feb 23 '20 at 18:17
The question is seeking the most efficient solution, not necessarily *any* working solution. I think that's perfectly valid on SO. — Sherif, Feb 23 '20 at 18:19
I'm not saying it's not valid on SO (I would have voted to close in that case), I am just wondering if you may get more of a response on CR. — Nigel Ren, Feb 23 '20 at 18:20
@NigelRen I don't see the harm in trying. Though I would like to leave this open for a few days. If it doesn't get an answer I will put a bounty on it. — Sherif, Feb 23 '20 at 18:21
SO itself has many questions about this problem if you search for the birth and death keywords. A cheap improvement would be to improve the sort: make an array whose length is the span of birth/death years (each cell is a year, holding 0 by default). Add 1 or subtract 1 to the cell for each birth and death, then cumulatively sum and keep the max sum found — grodzi, Feb 23 '20 at 18:30
@grodzi I searched SO, but did not find an answer to my question. If you have found an answer that can lead me to a more efficient solution, please do share. I also don't see how your suggestion would improve upon the current solution. What would I do to "*improve the sort*" here? The current sorting uses **quicksort** which is, as far as I know, the **fastest possible sorting algorithm**. If you know how to improve upon quicksort, please share. — Sherif, Feb 23 '20 at 18:35
I don't know php, but there is the possibility to get the max population year in `O(k + y)` by using a hash map. Are you interested in such a simple solution? I guess you already had a look at such possibilities. If yes, I could post some pseudo-code — Damien, Feb 23 '20 at 18:36
You tagged php. I don't post php. I also gave you the sort. Let me insist with a clearer example. Take 24310. You want to sort it as 01234. Then make an array of size 5, and assign digit i to v[i]. 2 is assigned to v[2], 4 to v[4]... kind of like a counting sort. This is O(n). — grodzi, Feb 23 '20 at 18:44
Note: the availability of a hash map in php was discussed here for example: https://stackoverflow.com/questions/6841379/is-there-java-hashmap-equivalent-in-php — Damien, Feb 23 '20 at 18:44
The counting sort proposed by @grodzi is interesting indeed and has a complexity `O(max_year - min_year)` — Damien, Feb 23 '20 at 18:49
@grodzi I don't think that counting sort is an improvement upon quicksort in this case as the range of `k` can be huge even if the size of `n` is tiny. In the average case it would be worse than quicksort, not better. — Sherif, Feb 23 '20 at 18:57
@Sherif At least in the dum challenges, the order is about 1e5 which ever the dimension n or M(max_year-min_year) ... "average" is quite subjective. You should define what is the average case because optimization __may__ concern the sort being used (not on comparison but on keys). — grodzi, Feb 25 '20 at 06:49
This is a duplicate question. The problem with finding the duplicates is that the questions have no consistent tags. e.g. [What is an efficient way to get the max concurrency in a list of tuples?](https://stackoverflow.com/q/60127612/1243762) — Guy Coder, Mar 06 '20 at 12:55
I voted to close this but can not because it currently has a bounty. — Guy Coder, Mar 06 '20 at 12:55
@GuyCoder I don't see how that answers my question at all. It's looking for overlapping time periods. I'm not interested in what time periods overlap, but rather the maximum contiguous positive sum of deltas. Those are two completely different problems. — Sherif, Mar 07 '20 at 04:15

Dave · Answer 1 · 2020-03-03T02:16:11.830

4

We can solve this in linear time with bucket sort. Let's say the size of the input is n, and the range of years is m.

O(n): Find the min and max year across births and deaths.
O(m): Create an array of size max_yr - min_yr + 1, ints initialized to zero. 
      Treat the first cell of the array as min_yr, the next as min_yr+1, etc...
O(n): Parse the births array, incrementing the appropriate index of the array. 
      arr[birth_yr - min_yr] += 1
O(n): Ditto for deaths, decrementing the appropriate index of the array.
      arr[death_yr - min_yr] -= 1
O(m): Parse your array, keeping track of the cumulative sum and its max value.

The year at which the cumulative sum peaks is your answer.

The running time is O(n+m), and the additional space needed is O(m).

This is a linear solution in n if m is O(n); i.e., if the range of years isn't growing more quickly than the number of births and deaths. This is almost certainly true for real world data.
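As a minimal sketch of the steps above, the following JavaScript walks through the same O(n), O(m), O(n), O(n), O(m) passes. The function name `peakYearByBuckets` and the choice to return 0 for empty input are assumptions made for illustration; they are not part of the original answer.

```javascript
// Sketch of the bucket (difference-array) approach.
// peakYearByBuckets is a hypothetical name used only for this example.
function peakYearByBuckets(births, deaths) {
  if (births.length === 0) return 0; // falsy result for empty input

  // O(n): find the min and max year across births and deaths.
  let minYear = Infinity;
  let maxYear = -Infinity;
  for (const y of births) {
    if (y < minYear) minYear = y;
    if (y > maxYear) maxYear = y;
  }
  for (const y of deaths) {
    if (y < minYear) minYear = y;
    if (y > maxYear) maxYear = y;
  }

  // O(m): one cell per year in the range, initialized to zero.
  const delta = new Array(maxYear - minYear + 1).fill(0);

  // O(n): births increment their year's cell, deaths decrement it.
  for (const y of births) delta[y - minYear] += 1;
  for (const y of deaths) delta[y - minYear] -= 1;

  // O(m): running sum over the cells, remembering the first year of the peak.
  let current = 0;
  let best = 0;
  let bestYear = 0;
  for (let i = 0; i < delta.length; i++) {
    current += delta[i];
    if (current > best) {
      best = current;
      bestYear = minYear + i;
    }
  }
  return bestYear; // stays 0 if the population never rose above zero
}

console.log(peakYearByBuckets([1984, 1981, 1984, 1991, 1996], [1991, 1984]));
```

Because the comparison is strict, ties resolve to the first year on which the peak was reached, which the question allows.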

edited Mar 03 '20 at 02:16

answered Mar 02 '20 at 16:38

@Sherif Implementation is left as an exercise for the reader... It's trivial anyway. Is anything not clear? – Dave Mar 02 '20 at 17:52
I'll note that because your granularity is year, there is some ambiguity. in that we're effectively measuring population as of the year-end, and there may be some other point of time mid-year where the population is higher due to the timing of births and deaths. – Dave Mar 02 '20 at 18:05
There is no ambiguity there. As clearly stated in the question you can safely assume that the year on which someone is born or dead that it counts as of that year. – Sherif Mar 02 '20 at 22:46
How is this linear time if we have to parse an "array of size max_yr - min_yr + 1" ? (cc @Sherif) – גלעד ברקן Mar 02 '20 at 23:09
That's a good question. Please try to include the Big O break down of your algorithm in the answer as this is the primary focus of the question. – Sherif Mar 02 '20 at 23:35
@גלעדברקן I stated my assumption about the range of years being small relative to the number of births and deaths. Even if it were equal, with an average of one birth or death every year, it would be linear. We'd need a very sparse set of years for this to be sublinear. – Dave Mar 03 '20 at 01:53
@Sherif It's ambiguous only in the sense that the year you pick may never have had the max population, depending on the order of deaths and births within that year. It will have had the max end-of-year population, which is what you're asking for. – Dave Mar 03 '20 at 02:02
@Dave Right, that is all we're after here. The only requirement is the max end of year total population. – Sherif Mar 03 '20 at 02:27
I already stated this idea in my answer on February 25, "If the year range, m, is on the order of n, we could store the counts for each year in the range and have O(n) time complexity." (cc @Sherif) – גלעד ברקן Mar 03 '20 at 02:45
@גלעדברקן Correct, you did. – Sherif Mar 03 '20 at 03:05
@Dave: is the complexity not O(2n) for points 1 and 2? **1.** iterate once through all births+death: `O(n): Find the min and max year across births and deaths` **2.** iterate again through all births+death: `O(n): Parse the births+death array, incrementing the appropriate index of the array` then you do: O(m): Parse your array, keeping track of the cumulative sum and its max value. *(you don't need to parse this array - you can keep track of MAX while incrementing the indices in 2)* – Antony Mar 05 '20 at 06:36
@Antony Yes, but O(2n) = O(n). One writes f ( x ) = O ( g ( x ) ) as x → ∞ if and only if for all sufficiently large values of x, the absolute value of f(x) is at most a positive constant multiple of g(x). – Dave Mar 06 '20 at 11:37
Your last statement about *O(2n) = O(n)* as *x → ∞*, I would have thought that the numbers we are dealing with here are much more limited than that and as such the difference between 2n and n is significant. – Nigel Ren Mar 07 '20 at 08:43
@NigelRen Constant multiples are ignored in O notation. In practice these can be significant, but regardless O(2n) = O(n). – Dave Mar 08 '20 at 22:07
So is the notation wrong for scanning the two lists twice then, should it be something like O(n+n). It just looks odd that doing the same operation twice is the same as doing it once. – Nigel Ren Mar 09 '20 at 07:26
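
For reference, the definition Dave quotes in the comments can be written out to show why the constant factor disappears. This is the standard definition of Big O, applied here to f(n) = 2n purely as an illustration:

```latex
f(x) = O\bigl(g(x)\bigr)
\iff
\exists\, M > 0,\ \exists\, x_0 :\quad |f(x)| \le M\,|g(x)| \quad \text{for all } x \ge x_0 .

\text{With } f(n) = n + n = 2n,\ g(n) = n,\ M = 2,\ x_0 = 1:\quad
|2n| \le 2\,|n| \ \text{for all } n \ge 1
\ \Longrightarrow\ 2n = O(n).
```

So O(n + n), O(2n) and O(n) all name the same class; the factor of 2 still matters for wall-clock time, just not for the notation.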

גלעד ברקן · Accepted Answer · 2020-02-25T12:31:25.197

3

I think we can have O(n log n) time with O(1) additional space by first sorting, then maintaining a current population and global maximum as we iterate. I tried to use the current year as a reference point but the logic still seemed a bit tricky so I'm not sure it's completely worked out. Hopefully, it can give an idea of the approach.

JavaScript code (counterexamples/bugs welcome)

function f(births, deaths){
  births.sort((a, b) => a - b);
  deaths.sort((a, b) => a - b);

  console.log(JSON.stringify(births));
  console.log(JSON.stringify(deaths));
  
  let i = 0;
  let j = 0;
  let year = births[i];
  let curr = 0;
  let max = curr;

  while (deaths[j] < births[0])
    j++;

  while (i < births.length || j < deaths.length){
    while (year == births[i]){
      curr = curr + 1;
      i = i + 1;
    }
    
    if (j == deaths.length || year < deaths[j]){
      max = Math.max(max, curr);
      console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
    
    } else if (j < deaths.length && deaths[j] == year){
      while (deaths[j] == year){
        curr = curr - 1;
        j = j + 1;
      }
      max = Math.max(max, curr);
      console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
    }

    if (j < deaths.length && deaths[j] > year && (i == births.length || deaths[j] < births[i])){
      year = deaths[j];
      while (deaths[j] == year){
        curr = curr - 1;
        j = j + 1;
      }
      console.log(`year: ${ year }, max: ${ max }, curr: ${ curr }`);
    }

    year = births[i];
  }
  
  return max;
}

var input = [
  [[1997, 1997, 1997, 1998, 1999],
  [1998, 1999]],
  [[1, 2, 2, 3, 4],
  [1, 2, 2, 5]],
  [[1984, 1981, 1984, 1991, 1996],
  [1991, 1984, 1997]],
  [[1984, 1981, 1984, 1991, 1996],
  [1991, 1982, 1984, 1997]]
]

for (let [births, deaths] of input)
  console.log(f(births, deaths));

If the year range, m, is on the order of n, we could store the counts for each year in the range and have O(n) time complexity. If we wanted to get fancy, we could also have O(n * log log m) time complexity, by using a Y-fast trie that allows successor lookup in O(log log m) time.
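
The sort-then-sweep idea can also be sketched more compactly. The version below is an illustration rather than a rewrite of the code above: the name `peakYearBySweep` is hypothetical, it returns the year instead of the population count, and it copies the inputs before sorting (sorting in place instead would keep the O(1) additional space mentioned earlier). It relies on the observation that the end-of-year population can only reach a new maximum in a year that has at least one birth, so only birth years need to be visited.

```javascript
// Sketch: sort both arrays, then sweep the birth years with two indices.
// peakYearBySweep is a hypothetical name used only for this example.
function peakYearBySweep(births, deaths) {
  // Copies keep the caller's arrays untouched; O(n log n) for the sorts.
  const b = [...births].sort((x, y) => x - y);
  const d = [...deaths].sort((x, y) => x - y);

  let i = 0; // next unprocessed birth
  let j = 0; // next unprocessed death
  let current = 0;
  let best = 0;
  let bestYear = 0; // falsy when the population never rises above zero

  while (i < b.length) {
    const year = b[i];

    // Deaths from years strictly before this birth year.
    while (j < d.length && d[j] < year) {
      current--;
      j++;
    }
    // All births in this year.
    while (i < b.length && b[i] === year) {
      current++;
      i++;
    }
    // Deaths in the same year count against the same year's total.
    while (j < d.length && d[j] === year) {
      current--;
      j++;
    }

    if (current > best) {
      best = current;
      bestYear = year;
    }
  }
  // Deaths after the last birth year cannot create a new maximum, so they are skipped.
  return bestYear;
}

console.log(peakYearBySweep([1997, 1997, 1997, 1998, 1999], [1998, 1999]));
```

Each element is visited once after sorting, so the sweep itself is O(n) and the sorts dominate at O(n log n).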

edited Feb 25 '20 at 12:31

answered Feb 24 '20 at 04:01

1. thx for teaching me existence of Y-fast trie. Regarding algo: no need to check the max after decreasing. Only after incrementing. Last while block is unnecesary: consider sorting two sorted list: you just need the head of both (i,j), pick the head of each, and advance the smaller one. ```if(birth_i < death_j){//increment stuff + check max} else{//decrement}; birth_i||=infty; death_j||=infty```. Also you can iterate up to ```min(birthSize, deathSize)```. if min is birth, stop. if min is death (suspicious..), stop and check ```(max + birth.length-i)``` – grodzi Feb 25 '20 at 07:12
@grodzi I did start out considering merge sort but concluded this needs extra handling because of how duplicates as well as the order of birth vs death affects the count. The last while loop seems necessary to me when there are death years unmatched by birth years. You are correct that the max in that loop is unnecessary. – גלעד ברקן Feb 25 '20 at 12:21
@גלעדברקן Use bucket sort for linear time. – Dave Mar 02 '20 at 18:07
I already stated this idea in my answer, "If the year range, m, is on the order of n, we could store the counts for each year in the range and have O(n) time complexity." – גלעד ברקן Mar 03 '20 at 02:50
this is not efficiency, I don't know why give you the reward hahaha – Emiliano Mar 08 '20 at 22:58
@Emiliano my answer contains a description of three ways to answer the question, only one of which the answer provides code for. Which method do you consider most efficient and is there an answer on this page describing it? – גלעד ברקן Mar 09 '20 at 02:27
@Emiliano First, it's my bounty and I'm free to award it to whomever I please. Second, I chose the answer I felt had the most well-rounded response to all of my requirements. Third, I did not see any improvement upon the answer awarded in any of the other answers (they were either the same idea or a haphazard implementation with no address to the over-arching algorithm question). Any answer that was similar but posted after the selected answer I commend, but it would seem unfair to award them the bounty since this one was first. – Sherif Mar 09 '20 at 12:58
@Emiliano Finally, I down voted your answer specifically because it does not work. I even outlined in great detail in the comments on your answer where it is broken. It fails more than half of the unit tests. An answer that doesn't even work can't possibly be efficient plus you failed to address the algorithm question and the Big O was miscalculated. I've down voted answers for less. – Sherif Mar 09 '20 at 13:00

score 3 · Answer 3 · answered Mar 03 '20 at 07:50

First aggregate the births and deaths into a map (year => population change), sort that by key, and calculate the running population over that.

This should be approximately O(2n + n log n), where n is the number of births.

$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];

function highestPopulationYear(array $births, array $deaths): ?int
{
    $indexed = [];

    foreach ($births as $birth) {
        $indexed[$birth] = ($indexed[$birth] ?? 0) + 1;
    }

    foreach ($deaths as $death) {
        $indexed[$death] = ($indexed[$death] ?? 0) - 1;
    }

    ksort($indexed);

    $maxYear = null;
    $max = $current = 0;

    foreach ($indexed as $year => $change) {
        $current += $change;
        if ($current >= $max) {
            $max = $current;
            $maxYear = $year;
        }
    }

    return $maxYear;
}

var_dump(highestPopulationYear($births, $deaths));

As I see: With *n* = number of events (births + deaths) and *m* = number of event years (years with births or deaths) this would be actually *O(n + m log m)*. If *n >> m* - this can be considered as *O(n)*. If you have billions of births and deaths in a period of (say) 100 years - sorting an array with 100 elements (`ksort($indexed)`) becomes irrelevant. — Paul Spiegel, Mar 03 '20 at 18:55
You could process the births with `$indexed = array_count_values($births);`. — Nigel Ren, Mar 06 '20 at 17:09

User_67128 · Answer 4 · 2020-03-03T19:06:04.630

I solved this problem with a memory requirement of O(n+m) in the worst case (O(n) in the best case)

and a time complexity of O(n log n).

Here, n and m are the lengths of the births and deaths arrays.

I don't know PHP or JavaScript. I've implemented it in Java, and the logic is very simple. I believe my idea can be implemented in those languages as well.

Technique Details:

I used Java's TreeMap structure to store the birth and death records.

TreeMap keeps its (key, value) pairs sorted by key; here the key is the year and the value is the net sum of births and deaths for that year (deaths count as negative).

We don't need to insert deaths that happened after the highest birth year.

Once the TreeMap is populated with the birth and death records, the cumulative sums are updated in key order, and the maximum population and its year are stored as the scan progresses.

Sample input & output: 1

Births: [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906]

Deaths: [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915]

Year counts Births: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1911=2, 1914=1, 1919=2}

Year counts Birth-Deaths combined: {1900=2, 1901=1, 1903=1, 1904=1, 1906=1, 1908=3, 1909=1, 1910=-1, 1911=0, 1912=-1, 1913=-1, 1914=-2, 1915=-2, 1919=2}

Yearwise population: {1900=2, 1901=3, 1903=4, 1904=5, 1906=6, 1908=9, 1909=10, 1910=9, 1911=9, 1912=8, 1913=7, 1914=5, 1915=3, 1919=5}

maxPopulation: 10
yearOfMaxPopulation: 1909

Sample input & output: 2

Births: [1906, 1901, 1911, 1902, 1905, 1911, 1902, 1905, 1910, 1912, 1900, 1900, 1904, 1913, 1904]

Deaths: [1917, 1908, 1918, 1915, 1907, 1907, 1917, 1917, 1912, 1913, 1905, 1914]

Year counts Births: {1900=2, 1901=1, 1902=2, 1904=2, 1905=2, 1906=1, 1910=1, 1911=2, 1912=1, 1913=1}

Year counts Birth-Deaths combined: {1900=2, 1901=1, 1902=2, 1904=2, 1905=1, 1906=1, 1907=-2, 1908=-1, 1910=1, 1911=2, 1912=0, 1913=0}

Yearwise population: {1900=2, 1901=3, 1902=5, 1904=7, 1905=8, 1906=9, 1907=7, 1908=6, 1910=7, 1911=9, 1912=9, 1913=9}

maxPopulation: 9
yearOfMaxPopulation: 1906

Here, the deaths that occurred after the last birth year, 1913 (that is, 1914 and later), were not counted at all, which avoids unnecessary computation.

For a total of 10 million data (births & deaths combined) and over 1000 years range, the program took about 3 sec. to finish.

If same size data with 100 years range, it took 1.3 sec.

All the inputs are randomly taken.
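
The Java source is not shown, so the following is only an approximation of the technique described, written in JavaScript. JavaScript has no built-in TreeMap, so this sketch swaps in a plain `Map` whose keys are sorted once before the scan; the function name `peakYearBySortedMap` and the returned object shape are assumptions made for illustration.

```javascript
// Sketch: net change per year in a Map, keys sorted once, then a running sum.
// peakYearBySortedMap is a hypothetical name used only for this example.
function peakYearBySortedMap(births, deaths) {
  const net = new Map(); // year -> births minus deaths in that year
  let lastBirth = -Infinity;

  for (const y of births) {
    net.set(y, (net.get(y) ?? 0) + 1);
    if (y > lastBirth) lastBirth = y;
  }
  for (const y of deaths) {
    // Deaths after the highest birth year cannot raise the maximum, so skip them.
    if (y > lastBirth) continue;
    net.set(y, (net.get(y) ?? 0) - 1);
  }

  // A TreeMap would keep keys ordered on insert; here they are sorted once.
  const years = [...net.keys()].sort((a, b) => a - b);

  let current = 0;
  let maxPopulation = 0;
  let yearOfMaxPopulation = 0;
  for (const y of years) {
    current += net.get(y);
    if (current > maxPopulation) {
      maxPopulation = current;
      yearOfMaxPopulation = y;
    }
  }
  return { maxPopulation, yearOfMaxPopulation };
}

const sampleBirths = [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906];
const sampleDeaths = [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915];
console.log(peakYearBySortedMap(sampleBirths, sampleDeaths));
// → { maxPopulation: 10, yearOfMaxPopulation: 1909 }
```

With the first sample input above, this reproduces the maxPopulation of 10 in 1909. Sorting only the distinct years makes the cost O(n + u log u) for u distinct years, which is the same bound Paul Spiegel gives for the PHP `ksort` version.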

kmuenkel · Answer 5 · 2020-03-02T20:26:35.527

$births = [1984, 1981, 1984, 1991, 1996];
$deaths = [1991, 1984];
$years = array_unique(array_merge($births, $deaths));
sort($years);

$increaseByYear = array_count_values($births);
$decreaseByYear = array_count_values($deaths);
$populationByYear = array();

foreach ($years as $year) {
    $increase = $increaseByYear[$year] ?? 0;
    $decrease = $decreaseByYear[$year] ?? 0;
    $previousPopulationTally = end($populationByYear);
    $populationByYear[$year] = $previousPopulationTally + $increase - $decrease;
}

$maxPopulation = max($populationByYear);
$maxPopulationYears = array_keys($populationByYear, $maxPopulation);

$maxPopulationByYear = array_fill_keys($maxPopulationYears, $maxPopulation);
print_r($maxPopulationByYear);

This will account for the possibility of a tied year, as well as if a year of someone's death does not correspond to someone's birth.

This answer makes no attempt to provide the academic Big O explanation that is requested by the OP. — mickmackusa, Mar 07 '20 at 13:31

yergo · Answer 6 · 2020-03-05T11:46:24.100

Memory-wise, it is enough to keep currentPopulation and currentYear. Starting by sorting both the $births and $deaths arrays is a very good point, because sorting is not that heavy a task, yet it allows us to cut some corners:

<?php

$births = [1997, 1999, 2000];
$deaths = [2000, 2001, 2001];

function highestPopulationYear(array $births, array $deaths): Int {

    // sort takes time, but is necessary for further optimizations
    sort($births);
    sort($deaths);

    // first death year is the first year where population might decrease
    // from the max population so far
    $currentYearComputing = $deaths[0];

    // year before first death has potential of having the biggest population
    $maxY = $currentYearComputing-1;

    // calculating population at the beginning of the year of first death, start maxPopulation.
    // Births are counted with a loop because array_search() returns false when
    // no birth falls in the first death year.
    $bornBefore = 0;
    while ($bornBefore < count($births) && $births[$bornBefore] < $currentYearComputing) {
        $bornBefore++;
    }
    $population = $maxPop = count(array_splice($births, 0, $bornBefore));

    // instead of every time empty checks: `while(!empty($deaths) || !empty($births))`
    // we can control a target time. It reserves a memory, but this slot is decreased
    // every iteration.
    $iterations = count($deaths) + count($births);

    while($iterations > 0) {
        while(current($births) === $currentYearComputing) {
            $population++;
            $iterations--;
            array_shift($births); // decreasing memory usage
        }

        while(current($deaths) === $currentYearComputing) {
            $population--;
            $iterations--;
            array_shift($deaths); // decreasing memory usage
        }

        if ($population > $maxPop) {
            $maxPop = $population;
            $maxY = $currentYearComputing;
        }

        // In $iterations we have a sum of birth/death events left. Assuming all 
        // are births, if this number added to currentPopulation will never exceed
        // current maxPoint, we can break the loop and save some time at cost of
        // some memory.
        if ($maxPop >= ($population+$iterations)) {
            break;
        }

        $currentYearComputing++;
    }

    return $maxY;
}

echo highestPopulationYear($births, $deaths);

Not really keen on diving into the Big O thing; I left it to you.

Also, if you rediscover currentYearComputing every loop, you can change loops into if statements and leave with just one loop.

    while($iterations > 0) {

        $changed = false;

        if(current($births) === $currentYearComputing) {
            // ...
            $changed = array_shift($births); // decreasing memory usage
        }

        if(current($deaths) === $currentYearComputing) {
            // ...
            $changed = array_shift($deaths); // decreasing memory usage
        }

        if ($changed === false) {
            $currentYearComputing++;
            continue;
        }

array shift is a good option for the memory but not for performance, check this https://cmljnelson.blog/2018/10/16/phps-array_shift-performance/ — Emiliano, Mar 07 '20 at 03:37
You can always sort descending, go with decrementation instead of incrementation, and use pop instead of shift. — yergo, Mar 07 '20 at 20:21

Emiliano · Answer 7 · 2020-03-07T23:52:21.977

0

I feel very comfortable with this solution; its Big O complexity is n + m

<?php
function getHighestPopulation($births, $deaths){
    $max = [];
    $currentMax = 0;
    $tmpArray = [];

    foreach($deaths as $key => $death){
        if(!isset($tmpArray[$death])){
            $tmpArray[$death] = 0;    
        }
        $tmpArray[$death]--;
    }
    foreach($births as $k => $birth){
        if(!isset($tmpArray[$birth])){
            $tmpArray[$birth] = 0;
        }
        $tmpArray[$birth]++;
        if($tmpArray[$birth] > $currentMax){
            $max = [$birth];
            $currentMax = $tmpArray[$birth];
        } else if ($tmpArray[$birth] == $currentMax) {
            $max[] = $birth;
        }
    }

    return [$currentMax, $max];
}

$births = [1997, 1997, 1997, 1998, 1999];
$deaths = [1998, 1999];

print_r (getHighestPopulation($births, $deaths));
?>

edited Mar 07 '20 at 23:52

answered Mar 07 '20 at 03:32

Shouldn't `$tmpArray--` be `$tmpArray[$death]--`? Also please test with `$births=[1997,1997,1998]; $deaths=[];` - Does it return `1998` as it should? – Paul Spiegel Mar 07 '20 at 12:06
This code not only fails in the complex edge cases, but it even fails in the simplest of cases like given the input arrays `$births = [3,1,2,1,3,3,2]` and `$deaths = [2,3,2,3,3,3]` I would expect to get back `2` as the highest population year, yet your code returns `1`. In fact ***your code failed 9 out of 15 of my unit tests***. I not only can't accept this as **the most** efficient answer, but I can't even accept it *an* efficient answer since it doesn't work at all. – Sherif Mar 08 '20 at 23:32
You failed to read the question carefully and thus failed to provide a good answer. You make the assumption here that I told you not to make (*that the arrays are sorted*). So please remove your offensive comment in the question about how I awarded the bounty to a non-efficient answer and this is somehow a "*fix*". – Sherif Mar 08 '20 at 23:33

Ronak Dhoot · Answer 8 · 2020-03-08T06:47:11.340

One of the simplest and clearest approaches to your problem.

$births = [1909, 1919, 1904, 1911, 1908, 1908, 1903, 1901, 1914, 1911, 1900, 1919, 1900, 1908, 1906];
$deaths = [1910, 1911, 1912, 1911, 1914, 1914, 1913, 1915, 1914, 1915];

/* for generating 1 million records

for($i=1;$i<=1000000;$i++) {
    $births[] = rand(1900, 2020);
    $deaths[] = rand(1900, 2020);
}
*/

function highestPopulationYear(Array $births, Array $deaths): Int {
    $start_time = microtime(true); 
    $population = array_count_values($births);
    $deaths = array_count_values($deaths);

    foreach ($deaths as $year => $death) {
        $population[$year] = ($population[$year] ?? 0) - $death;
    }
    ksort($population, SORT_NUMERIC);
    $cumulativeSum = $maxPopulation = $maxYear = 0;
    foreach ($population as $year => &$number) {
        $cumulativeSum += $number;
        if($maxPopulation < $cumulativeSum) {
            $maxPopulation = $cumulativeSum;
            $maxYear = $year;
        }
    }
    print " Execution time of function = ".((microtime(true) - $start_time)*1000)." milliseconds"; 
    return $maxYear;
}

print highestPopulationYear($births, $deaths);

output:

complexity:

O(m + log(n))

for 1 million records execution time is just `29.64 milliseconds` — Ronak Dhoot, Mar 07 '20 at 22:43
As stated in the question I'm not after runtime optimizations, but it should be noted your Big O calculation is slightly off here. Also, your code is slightly broken. It fails in a number of edge cases. — Sherif, Mar 08 '20 at 23:40
