The final output is a list of tuples, one per data column, each holding the column index, the max value, and the years in which that max occurs. See the output printout at the bottom.
I created sample data and then built a tuple for each column. The tuple can easily be changed to a different output type. Note that the output skips the first column of the array, since that is the year and no max year is needed. Also, the penultimate column is blank, so extra logic was added to handle blanks. The code should handle blanks in any column, even though they usually occur in only one. The data_colunns_less_2 value can be modified to change the number of columns.
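Since the tuple can be swapped for another output type, here is one possible sketch of that conversion. The names ColumnMax, to_records and to_dict are hypothetical and not part of the code in this answer; the two sample tuples are copied from the printout at the bottom.

```python
from typing import NamedTuple, Union


class ColumnMax(NamedTuple):
    """Hypothetical named version of the (col_index, max, years) tuple."""
    col_index: int
    max_value: Union[int, str]  # '' is used when a column is entirely blank
    years: list


def to_records(out_list):
    # Wrap each plain tuple so fields can be read by name.
    return [ColumnMax(*t) for t in out_list]


def to_dict(out_list):
    # Key the results by column index instead of list position.
    return {col: {"max": m, "years": y} for col, m, y in out_list}


sample = [(1, 9, [2001]), (2, 10, [2004])]
records = to_records(sample)
by_column = to_dict(sample)
print(records[0].max_value)
print(by_column[2]["years"])
```

A NamedTuple still unpacks like the original tuple, so existing code that does `col, m, years = record` keeps working.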
As with most engineering problems, the first step is to state the problem clearly. Once the problem is stated clearly, it sometimes becomes trivial to solve:
Given an array stored as a list of rows, where each row is a list of strings, and where
the first column is a year and the remaining columns are data, and
some columns contain blanks,
return an output list of tuples, one for each of the data columns.
So, if the original array has n columns, the output list will have n-1 tuples, since the year column is not needed.
Further, each tuple shall consist of
a column index into the original array,
the max value for the column, and
a list of the years containing the max value.
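To make the statement concrete, here is a small hand-made input together with the output the statement asks for. The values are made up for illustration. How an all-blank column should be reported is an assumption here: it is shown with an empty-string max and the years left as strings, which mirrors the printout at the bottom of this answer.

```python
# Hypothetical 3-column input: year, one data column, one all-blank column.
rows = [
    ["2000", "3", ""],
    ["2001", "7", ""],
    ["2002", "7", ""],
]

# One tuple per data column: (column index, max value, years with that max).
expected = [
    (1, 7, [2001, 2002]),               # 7 occurs in two years
    (2, "", ["2000", "2001", "2002"]),  # all-blank column (assumed convention)
]

# n columns in, n-1 tuples out, because the year column is skipped.
assert len(expected) == len(rows[0]) - 1
```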
import random  # to create test list


def generate_data():
    # create sample list
    # random.seed(365)  # uncomment for reproducible data
    data = []
    data_colunns_less_2 = 8
    for year in range(2000, 2006):
        row = [str(random.randint(0, 10)) for _ in range(data_colunns_less_2)]
        row.insert(0, str(year))
        row.append('')  # penultimate column is always blank
        row.append(str(random.randint(-10, 10)))
        data.append(row)
    return data


def print_data(data):
    for row in data:
        print(row)


def check_int(s):  # from https://stackoverflow.com/a/1265696/4983398
    # I like to avoid exceptions
    if len(s) > 0 and s[0] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()


def get_max_tuple_for_column(data, col_index):
    # max_val starts as the first row's value; it stays a string only
    # while no integer has been seen in this column
    max_val = data[0][col_index]
    if check_int(max_val):
        max_is_non_digit = False
        max_val = int(max_val)
    else:
        max_is_non_digit = True
    years = []
    for row in data:
        test_val = row[col_index]
        if not check_int(test_val):
            # blank: record the year only while the column has no integer yet
            if max_is_non_digit:
                years.append(row[0])
            continue
        val = int(test_val)
        if max_is_non_digit:
            # first integer in the column replaces the blank placeholder
            max_val = val
            max_is_non_digit = False
            years = [int(row[0])]
        elif val > max_val:
            max_val = val
            years = [int(row[0])]  # (re)start a list of years with this max value
        elif val == max_val:
            years.append(int(row[0]))  # save an extra year for this column
    return (col_index, max_val, years)


if __name__ == '__main__':
    data = generate_data()
    out_list = [get_max_tuple_for_column(data, col) for col in range(1, len(data[0]))]
    print("Generated Random Dataset:")
    for row in data:
        print(row)
    print("Output: (col_index, max_value for column, [year1, year2, ..])")
    for row in out_list:
        print(row)
Output:
Generated Random Dataset:
['2000', '1', '2', '2', '9', '1', '9', '4', '8', '', '-9']
['2001', '9', '2', '9', '10', '6', '3', '10', '2', '', '0']
['2002', '4', '2', '2', '1', '4', '2', '9', '7', '', '-1']
['2003', '8', '4', '0', '9', '4', '10', '6', '4', '', '10']
['2004', '7', '10', '6', '5', '2', '1', '6', '1', '', '3']
['2005', '1', '4', '5', '8', '1', '2', '5', '2', '', '5']
Output: (col_index, max_value for column, [year1, year2, ..])
(1, 9, [2001])
(2, 10, [2004])
(3, 9, [2001])
(4, 10, [2001])
(5, 6, [2001])
(6, 10, [2003])
(7, 10, [2001])
(8, 8, [2000])
(9, '', ['2000', '2001', '2002', '2003', '2004', '2005'])
(10, 10, [2003])
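As a cross-check on the loop in get_max_tuple_for_column, the same result can be sketched by transposing the rows into columns with zip and taking the max of each column. This is an alternative formulation, not the code that produced the printout above; is_int and column_maxima are hypothetical names, and the all-blank case is handled the same way as in the printout (empty-string max, years kept as strings).

```python
def is_int(s):
    """Same test as check_int: an optional sign followed by digits."""
    body = s[1:] if s[:1] in ("-", "+") else s
    return body.isdigit()


def column_maxima(data):
    columns = list(zip(*data))  # transpose: rows -> columns
    years = columns[0]          # first column holds the years (as strings)
    results = []
    for col_index, column in enumerate(columns[1:], start=1):
        # Keep only the cells that hold integers, paired with their year.
        pairs = [(int(v), int(y)) for v, y in zip(column, years) if is_int(v)]
        if not pairs:
            # Entirely blank column: mirror the printout's convention.
            results.append((col_index, "", list(years)))
            continue
        best = max(v for v, _ in pairs)
        results.append((col_index, best, [y for v, y in pairs if v == best]))
    return results


# Small made-up dataset: a tie, an all-blank column, and negative values.
sample = [
    ["2000", "1", "", "-9"],
    ["2001", "9", "", "0"],
    ["2002", "9", "", "-1"],
]
result = column_maxima(sample)
for t in result:
    print(t)
```

Two passes per column (one for the max, one for the years) cost a little more than the single loop, but each step is easy to check by hand.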