
So I made this function to find the max value of a given column in my 2D list, and it returns that max value. But the max value actually appears twice, so how do I return the indexes/rows where it appears?

Cheers

def find_max(column):
    maxVal = 0
    for i in range(1, len(lst)):   # start at 1 to skip the title row
        maxVal = max(maxVal, int(lst[i][column]))
    return maxVal

I feel so lost, but I've been trying something like this (obviously not working at the moment, just brainstorming):


def test(column):
    maxVal = 0
    year = []
    for i in range(1, len(lst)):
        if maxVal == int(lst[i][column]):
            year.append(lst[i][0])
        else:
            maxVal = max(maxVal, int(lst[i][column]))
            year = (lst[i][0])
        year.extend(maxVal)
    return year

So column 0 holds the years, and I want to save the years where column X has its biggest value(s).
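A minimal corrected sketch of the attempt above, keeping its single-pass idea: restart the list of years when a strictly larger value appears, and append to it on a tie. It assumes, as in the question, that the first row holds titles, column 0 holds the year, and the values are non-negative integers stored as strings. Passing `lst` in as a parameter (instead of using a global), the name `years_with_max`, and the sample rows are illustrative choices, not from the question.

```python
def years_with_max(lst, column):
    """Return (max_value, years) for `column`, skipping the title row lst[0]."""
    max_val = 0       # safe starting point only because the data has no negative numbers
    years = []
    for row in lst[1:]:
        val = int(row[column])
        if val > max_val:
            # New maximum: forget the years collected for the old one
            max_val = val
            years = [row[0]]
        elif val == max_val:
            # Tie with the current maximum: keep this year as well
            years.append(row[0])
    return max_val, years


# Hypothetical sample data in the same shape as the question's list
lst = [
    ['', 'A', 'B', 'C'],
    ['2003', '5', '20', '1'],
    ['2004', '7', '27', '2'],
    ['2005', '9', '12', '3'],
    ['2006', '4', '27', '4'],
]
print(years_with_max(lst, 2))  # (27, ['2004', '2006'])
```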

Edit: My list looks like this

And let's say the column I'm looking for is the third one, where the max value is 27, on rows 36 and 38. How do I return these indexes? (What I'm actually looking for is the value in the first column of those rows: 2004 and 2006.)

Fribbe
  • Do you have any expected inputs or test data? – Tsubasa Nov 21 '20 at 05:39
  • added example @Ava – Fribbe Nov 21 '20 at 05:44
  • Oh I was not talking about the raw data, but the one that you have already processed. Anyway let me clarify. Is your processed data something like this? `[ [2001, 86, 26, 163, 9, 8, 214, 8], [2002, 91, 26, 174, 9, 9, 201, 8], ... ]` – Tsubasa Nov 21 '20 at 05:50
  • Yes, but as strings (with quotes): `[['2001', '86', '26', '163', '9', '8', '214', '8', '79', '0', '', '-7'], ['2002', '91', '26', '174', '9', '9', '201', '8', '66', '1', '', '5'], ... ]`. Also, the first row consists of titles explaining the numbers in each column, if this changes anything. – Fribbe Nov 21 '20 at 05:53
  • that doesn't matter. Can your 3rd column be negative values? – Tsubasa Nov 21 '20 at 05:56
  • There are fortunately no negative numbers in my data =) – Fribbe Nov 21 '20 at 05:57
  • this may answer your question https://stackoverflow.com/questions/6294179/how-to-find-all-occurrences-of-an-element-in-a-list – Shadowcoder Nov 21 '20 at 06:00
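The linked question boils down to two passes: first compute the maximum with the built-in `max()`, then collect every row that holds it with a list comprehension over `enumerate`. This is a sketch under the same assumptions as the question (title row first, year in column 0, numbers stored as strings); the function name `rows_and_years_of_max` and the sample rows are hypothetical.

```python
def rows_and_years_of_max(lst, column):
    """Return (max_value, row_indexes, years) for `column`, skipping the title row."""
    rows = lst[1:]
    # Pass 1: the largest value in the column
    max_val = max(int(row[column]) for row in rows)
    # Pass 2: every index into lst (title row = 0) whose value equals the maximum
    indexes = [i for i, row in enumerate(rows, start=1) if int(row[column]) == max_val]
    years = [lst[i][0] for i in indexes]
    return max_val, indexes, years


lst = [
    ['', 'A', 'B'],
    ['2004', '86', '27'],
    ['2005', '91', '26'],
    ['2006', '90', '27'],
]
print(rows_and_years_of_max(lst, 2))  # (27, [1, 3], ['2004', '2006'])
```

The two-pass version reads the column twice, but each pass is simple; the single-pass version trades that for the reset-on-new-maximum bookkeeping.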

3 Answers


Try this one.

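data = [
    ['foo', 'bar', 'foo', 'bruh', 'test', 'foo', 'bar', 'bar'],
    [2001, 86, 26, 163, 9, 8, 214, 8],
    [2002, 91, 27, 174, 9, 9, 201, 8]
]

def get_max(data, col=2):
    """
        Arg     :   `data` -> Type: List (first row holds the titles)
                    `col`  -> Type: int  (column to search, default: the third)
        Returns : `result` -> Type: List (indexes into `data` of the max rows)
    """

    max_val = 0      # fine as a start value because the data has no negatives
    result = []

    # enumerate gives each row's index directly; start=1 because data[0] is skipped
    for i, l in enumerate(data[1:], start=1):
        val = int(l[col])    # int() also converts the strings in the real data
        if val > max_val:
            max_val = val
            result.clear()
            # result.append(l)  # use this instead to collect the rows themselves
            result.append(i)
        elif val == max_val:
            # result.append(l)
            result.append(i)

    return result

print(get_max(data))  # [2]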
Tsubasa
  • @Fribbe this now returns the list of indexes of where the maximum value appeared in your data – Tsubasa Nov 21 '20 at 06:16
  • TypeError: '>' not supported between instances of 'str' and 'int' https://i.imgur.com/A6mE8Rz.png – Fribbe Nov 21 '20 at 06:17
  • @Fribbe just added the (int) type casting in line 17 also, so now it should work. – Tsubasa Nov 21 '20 at 06:27
  • Seems like it doesn't like my titles being strings? `ValueError: invalid literal for int() with base 10: 'Kol och koks'` – Fribbe Nov 21 '20 at 06:31
  • Oh also, try with removing the first row. Because that's where your titles are. – Tsubasa Nov 21 '20 at 06:34
  • @Fribbe after removing, you can do `data.index(l) + 1` – Tsubasa Nov 21 '20 at 06:36
  • if i try to iterate through the list with skipping the first row: `for l in range(1, len(lst)): ` I get this error: "TypeError: 'int' object is not subscriptable" – Fribbe Nov 21 '20 at 06:36
  • The first list/row is made up of titles: `[['', 'Biobränslen', 'Kol och koks', 'Råolja och petroleumprodukter', 'Natur- och stadsgas', 'Övriga bränslen', 'Kärnbrnnsle', 'Primär värme', 'Vattenkraft', 'Vindkraft', 'Solkraft', 'Elimport minus elexport'], ['1970', '43', '18', '336', '0', '0', '0', '0', '41', '0', '', '4'], ['1971', '40', '17' .. ` – Fribbe Nov 21 '20 at 06:37
  • According to my code, if you want to exclude the first row, then it will be `for l in data[1:]: ...` @Fribbe – Tsubasa Nov 21 '20 at 06:39
  • Edit: yes, it works now, I'm just really stupid. Thanks a lot for bearing with me, you're a lifesaver and I really need to sleep lol :D (it's 07:52 AM here and I haven't slept yet) Just started trying to learn Python :p – Fribbe Nov 21 '20 at 06:50
  • @Fribbe if my answer helped you then you can upvote and accept it. :) – Tsubasa Nov 21 '20 at 06:58
  • Thanks for teaching me Python AND Stack Overflow lol :D – Fribbe Nov 21 '20 at 07:07
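The `TypeError: 'int' object is not subscriptable` mentioned in the comments comes from looping over `range(...)`: the loop variable is then an integer index, not a row, so it has to be used to look the row up in the list. A small sketch of three equivalent ways to skip the title row; `lst` here is a made-up example.

```python
lst = [['', 'Kol och koks'], ['1970', '18'], ['1971', '17']]

# 1. range() yields integer indexes, so index into lst to get each row
by_index = [lst[i][1] for i in range(1, len(lst))]

# 2. Slicing yields the rows themselves
by_slice = [row[1] for row in lst[1:]]

# 3. enumerate() yields both the index and the row
by_enumerate = [(i, row[1]) for i, row in enumerate(lst) if i > 0]

print(by_index)      # ['18', '17']
print(by_slice)      # ['18', '17']
print(by_enumerate)  # [(1, '18'), (2, '17')]
```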
import numpy as np

b = np.array([
        [1, 2, 0],
        [1, 3, 9]
    ])

index = np.unravel_index(b.argmax(), b.shape)   # max num index

print(index)

(1, 2)  # indexes start at zero: row 1, column 2
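`argmax` reports only the first position of the maximum, so on its own it cannot return both rows when the maximum is tied. A sketch of one way to get every matching row for a single column with NumPy, assuming the title row has already been dropped and every remaining cell is an integer string (blank cells, like the ones in the question's data, would have to be handled before `astype(int)`); the sample rows are made up.

```python
import numpy as np

rows = [
    ['2004', '86', '27'],
    ['2005', '91', '26'],
    ['2006', '90', '27'],
]
arr = np.array(rows).astype(int)      # strings -> integers

col = arr[:, 2]                       # the third column
max_rows = np.flatnonzero(col == col.max())   # every row index holding the maximum
years = arr[max_rows, 0]              # matching values from the year column

print(max_rows.tolist())  # [0, 2]
print(years.tolist())     # [2004, 2006]
```

The row indexes here are relative to `rows`, so add 1 to map them back to the original list that still has the title row.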

xiaoqiao

To return the column index, the max value, and the years, the final output is a list of tuples, one per column. See the printed output at the bottom.

I generated sample data and then built one tuple per column for the output; the tuple is easy to change to a different output type. Note that the output skips the first column of the array, since that column holds the years and needs no max of its own. The penultimate column is blank, so extra logic handles blank cells; it should work for blanks in any column, even though in this data they only occur in one. Change the `data_colunns_less_2` value to increase the number of generated columns.

As with most engineering problems, the first step is to state the problem clearly. Once the problem is clearly stated, it sometimes becomes trivial to solve:

Given an array of rows and columns, stored as a list of rows in which each row is a list of strings, where the first column is a year, the remaining columns are data, and some columns contain blanks,

return an output list of tuples, one for each data column.
So if the original array has n columns, the output list will have n-1 tuples, since the year column is not needed.

Further, each tuple shall consist of a column index into the original array, the max value for that column, and a list of the years containing the max value.

import random  # to create the test list

def generate_data():
    """Create a sample list: year, random data columns, a blank column, a signed column."""
    data = []
    data_colunns_less_2 = 8   # number of random columns between the year and the blank column
    for year in range(2000, 2006):
        row = [str(random.randint(0, 10)) for _ in range(data_colunns_less_2)]
        row.insert(0, str(year))
        row.append('')                               # blank column
        row.append(str(random.randint(-10, 10)))     # column that can be negative
        data.append(row)
    return data

def print_data(data):
    for row in data:
        print(row)

def check_int(s):  # from https://stackoverflow.com/a/1265696/4983398
    # I like to avoid exceptions
    if len(s) > 0 and s[0] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()

def get_max_tuple_for_column(data, col_index):
    """Return (col_index, max value, [years with that value]) for one column."""
    max_val = data[0][col_index]
    if check_int(max_val):
        max_is_non_digit = False
        max_val = int(max_val)
    else:
        max_is_non_digit = True

    years = []
    for row in data:
        test_val = row[col_index]
        if not check_int(test_val):
            # Blank cell: only record the year while no number has been seen yet
            if max_is_non_digit:
                years.append(row[0])
            continue
        val = int(test_val)

        if max_is_non_digit:
            # First number in a column that started with blanks
            max_val = val
            max_is_non_digit = False
            years = [int(row[0])]
        elif val > max_val:
            max_val = val
            years = [int(row[0])]  # (re)start a list of years with this max value
        elif val == max_val:
            years.append(int(row[0]))  # save an extra year for this column
    return (col_index, max_val, years)

if __name__ == '__main__':
    data = generate_data()
    out_list = [get_max_tuple_for_column(data, col) for col in range(1, len(data[0]))]
    print("Generated Random Dataset:")
    print_data(data)
    print("Output: (col_index, max_value for column, [year1, year2, ...])")
    for row in out_list:
        print(row)

Output:

Generated Random Dataset:
['2000', '1', '2', '2', '9', '1', '9', '4', '8', '', '-9']
['2001', '9', '2', '9', '10', '6', '3', '10', '2', '', '0']
['2002', '4', '2', '2', '1', '4', '2', '9', '7', '', '-1']
['2003', '8', '4', '0', '9', '4', '10', '6', '4', '', '10']
['2004', '7', '10', '6', '5', '2', '1', '6', '1', '', '3']
['2005', '1', '4', '5', '8', '1', '2', '5', '2', '', '5']
Output: (col_index, max_value for column, [year1, year2, ...])
(1, 9, [2001])
(2, 10, [2004])
(3, 9, [2001])
(4, 10, [2001])
(5, 6, [2001])
(6, 10, [2003])
(7, 10, [2001])
(8, 8, [2000])
(9, '', ['2000', '2001', '2002', '2003', '2004', '2005'])
(10, 10, [2003])
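For comparison, a more compact sketch of the same specification: for each data column, drop the blank cells, take the maximum of what is left, and collect the matching years. It assumes the same layout as the generated data above (no title row, year in column 0, integer strings or blanks elsewhere). The name `max_years_per_column` is hypothetical, and unlike the code above it reports `None` and an empty year list for a column with no numbers; that is a design choice of this sketch.

```python
def max_years_per_column(data):
    """Return [(col_index, max_value, [years])] for every column after the year."""
    out = []
    for col in range(1, len(data[0])):
        # (value, year) pairs for the non-blank cells of this column
        cells = [(int(row[col]), int(row[0])) for row in data if row[col].strip() != '']
        if not cells:
            out.append((col, None, []))
            continue
        max_val = max(val for val, _ in cells)
        out.append((col, max_val, [year for val, year in cells if val == max_val]))
    return out


data = [
    ['2000', '1', '', '-9'],
    ['2001', '9', '', '0'],
    ['2002', '9', '', '10'],
]
for entry in max_years_per_column(data):
    print(entry)
# (1, 9, [2001, 2002])
# (2, None, [])
# (3, 10, [2002])
```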
Gardener