1

I am quite new in Python programming. What's an efficient and Pyhtonic way to find the most frequent progressive digit from a list of 4-digits numbers?

Let's say I have the following list: [6111, 7111, 6112, 6121, 6115, 6123].

The logic is to observe that the for the first digit the 6 is the most frequent. I can eliminate the number 7111 for the next considerations.

For the second digit I consider the new candidates [6111, 6112, 6121, 6115, 6123] and I observe that the 1 is the most frequent digit and so on.

At the end of the algorithm I'll have just 1 number of the list left.

If there are 2 or more number with the same occurrences for a digit I can either pick the smaller one on a random one between all of them.

A simple approach could be to convert the list into a Nx4 matrix and consider for each column the most frequent digit. This could work but I find a very stupid and inefficient way to solve this problem. Can anyone help?

EDIT: my code for this solution (NOTE: THIS CODE DOES NOT ALWAYS WORK, SOMETHING IS WRONG. FOR THE SOLUTION TO THIS PROBLEM PLEASE REFER TO @MadPhysicist ANSWER)

import numpy as np
import pandas as pd
from collections import Counter



numbers_list = [6111, 7111, 6112, 6121, 6115, 6123]

my_list = []

for number in numbers_list:
    digit_list = []
    for c in str(number):
       digit_list.append(c)
    my_list.append(digit_list)


matrix = np.array(my_list)

matrix0 = matrix

my_counter = Counter(matrix.T[0]).most_common(1)
i=0
for digit0 in matrix.T[0]:
    if digit0 != my_counter[0][0]:
        matrix0 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix0

matrix1 = matrix
my_counter = Counter(matrix.T[1]).most_common(1)
i=0
for digit1 in matrix.T[1]:
    if digit1 != my_counter[0][0]:
        matrix1 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix1

matrix2 = matrix
my_counter = Counter(matrix.T[2]).most_common(1)
i=0
for digit2 in matrix.T[2]:
    if digit2 != my_counter[0][0]:
        matrix2 = np.delete(matrix, i, 0)
    i += 1

matrix = matrix2

matrix3 = matrix
my_counter = Counter(matrix.T[3]).most_common(1)
i=0
for digit3 in matrix.T[3]:
    if digit3 != my_counter[0][0]:
        matrix3 = np.delete(matrix, i, 0)
    i += 1
matrix = matrix3

print (matrix[0])
G. Guidi
  • 25
  • 4

1 Answers1

3

Your idea of converting to a numpy array is solid. You don't need to split it up-front. A series of masks and histograms will pare down the array fairly quickly.

z = np.array([6111, 7111, 6112, 6121, 6115, 6123])

The nth digits (zero-based) can be obtained with something like

nth = (z // 10**n) % 10

Counting the most frequent one can be accomplished quickly with np.bincount as shown here:

frequentest = np.argmax(np.bincount(nth))

You can select the elements that have that digit in the nth place with simply

mask = nth == frequentest

So now run this in a loop over n (going backwards):

# Input array
z = np.array([6111, 7111, 6112, 6121, 6115, 6123])

# Compute the maximum number of decimal digits in the list.
# You can just manually set this to 4 if you prefer
n = int(np.ceil(np.log10(z + 1).max()))

# Empty output array
output = np.empty(n, dtype=int)

# Loop over the number of digits in reverse.
# In this case, i will be 3, 2, 1, 0.
for i in range(n - 1, -1, -1):

    # Get the ith digit from each element of z
    # The operators //, ** and % are vectorized: they operate
    # on each element of an array to return an array
    ith = (z // 10**i) % 10

    # Count the number of occurrences of each number 0-9 in the ith digit
    # Bincount returns an array of 10 elements. counts[0] is the number of 0s,
    # counts[1] is the number of 1s, ..., counts[9] is the number of 9s
    counts = np.bincount(ith)

    # argmax finds the index of the maximum element: the digit with the
    # highest count
    output[i] = np.argmax(counts)

    # Trim down the array to numbers that have the requested digit in the
    # right place. ith == output[i] is a boolean mask. It is 1 where ith
    # is the most common digit and 0 where it is not. Indexing with such a
    # mask selects the elements at locations that are non-zero.
    z = z[ith == output[i]]

As it happens, np.argmax will return the index of the first maximum count if there are multiple available, meaning that it will always select the smallest number.

You can recover the number from output with something like

>>> output
array([1, 1, 1, 6])
>>> (output * 10**np.arange(output.size)).sum()
6111

You can also just get the remaining element of z:

>>> z[0]
6111
Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
  • I understand your solution but it looks very complex to read and use for me. I think I'm going to try to optimise my code. Anyway I'm happy you found another method to solve it and thank you a lot for your time :)) – G. Guidi May 08 '20 at 20:36
  • @G.Guidi If you tell me what you think is complicated, I'll be happy to explain. In my opinion this is much simpler than yours, but that's probably because I've been thinking in numpy for a long time :) – Mad Physicist May 08 '20 at 20:38
  • I'm under the impression that you don't need to loop. Just do once with `i=1`. – Quang Hoang May 08 '20 at 20:49
  • @QuangHoang. I don't follow. OP specifically talks about building the number with the most common digits. – Mad Physicist May 08 '20 at 20:51
  • Never mind that, your solution is correct. Also you can stop early by checking the max bincount == 1. – Quang Hoang May 08 '20 at 20:57
  • 1
    @QuangHoang. True. I think OP is having enough problems with this solution as it is, so I expanded it with detailed commentary instead of more code :) – Mad Physicist May 08 '20 at 20:58
  • 1
    @G.Guidi. I've added very detailed comments to help you walk through the code and understand how it works. – Mad Physicist May 08 '20 at 21:00
  • @G.Guidi. I appreciate that. Best of luck in your studies! – Mad Physicist May 08 '20 at 21:33
  • @MadPhysicist I reviewed your code and it looks very clean and simple with the comments!! Thank you again, it's a wonderful solution :))) – G. Guidi May 12 '20 at 14:04