
I have two lists containing float values:

mean_fall_1 = [statistics.mean(d) for d in fall_1_gpa]
stdev_fall_1 = [statistics.stdev(d) for d in fall_1_gpa]

where:

fall_1_gpa = [[mean(sub_list) for sub_list in list] for list in fall1_grades]

Furthermore, I have a list of course combinations (each one a list of strings), shown here together with sample data for the other lists:

combination_fall_1 = [['CS105','MATH101','ENG101','GER'],['CS105','MATH101','GER','GER']]
fall1_grades = [[[4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0]],[[4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0], [4.0, 3.33, 3.33, 4.0]]]
mean_fall_1 = [2.9687393162393163,3.419960107803423]
stdev_fall_1 = [0.33945301919611576,0.2821718924791329]

What I am trying to do is rank the combinations by mean_fall_1 and stdev_fall_1: the highest possible mean with the lowest possible stdev should come first, and the rest should follow in that order. What I do is:

mean_fall_1, stdev_fall_1 = sorted(
    list(zip(*(zip(mean_fall_1, stdev_fall_1)))))
mean_fall_1, stdev_fall_1 = (list(t) for t in sorted(list(zip(*(zip(mean_fall_1, stdev_fall_1))))))

When I print the stdev list and then the mean list, I get this result:

[0.2821718924791329, 0.33945301919611576]
[3.419960107803423, 2.9687393162393163]

However, I also want the combination_fall_1 list to be sorted in the same order, so that I can show the user the combinations of courses and not only the mean and stdev. I tried doing this:

mean_fall_1, stdev_fall_1, combination_fall_1 = sorted(
    list(zip(*(zip(mean_fall_1, stdev_fall_1, combination_fall_1)))))
mean_fall_1, stdev_fall_1 = (list(t) for t in sorted(list(zip(*(zip(mean_fall_1, stdev_fall_1, combination_fall_1))))))

But I keep getting this error:

TypeError: '<' not supported between instances of 'list' and 'float'

Is there another way to sort the combination_fall_1 list according to the other two lists, or am I missing something?

The desired output:

[['CS105','MATH101','GER','GER'],['CS105','MATH101','ENG101','GER']]

This is because ['CS105','MATH101','GER','GER'] has mean 3.419960107803423 and stdev 0.2821718924791329, which makes it a better combination than ['CS105','MATH101','ENG101','GER'], which has mean 2.9687393162393163 and stdev 0.33945301919611576.

piggy
  • Your question is very unclear; since we don't have `fall1_grades`, `mean_fall_1` and `stdev_fall_1` are meaningless. Also, what does "find the best combination" really mean? I would recommend you edit your question to include **simple** samples of these two, just like `combination_fall_1` and show your expected output. – Jack Fleeting May 17 '20 at 15:53
  • @JackFleeting thank you for your comment. I provided a sample of the list now. best combination is to rank the mean and stdev list based on if mean being the maximum possible and stdev the lowest possible. So if there were 2 means with same value, then I want 1st to be shown the one with the lowest respective stdev – piggy May 17 '20 at 16:12
  • Almost there. Now please edit the question to show **exactly** the expected output from `print(mean_fall_1, stdev_fall_1,combination_fall_1)`. – Jack Fleeting May 17 '20 at 16:25
  • @JackFleeting I just now edited it to show the desired output from the combination_fall_1. Thanks for your help hope now it is okay – piggy May 17 '20 at 16:31
  • Yes, now the question is clear! But there's a logical question - you say you are looking for the "highest mean possible with the lowest stdev possible"; is it possible that one of these (two, in this example) combinations will have the " highest mean possible" but NOT have "the lowest stdev possible"? Or do they always go together? – Jack Fleeting May 17 '20 at 16:39
  • `mean` is not defined. Is that `statistics.mean`? – wjandrea May 17 '20 at 16:42
  • Yes @wjandrea it is statistics.mean . I have it like this in my code above – piggy May 17 '20 at 16:43
  • @JackFleeting yes there might be times for example that mean is 3.6 and st.dev 1.3 and another mean is 3.3 and st.dev 0.1 so the code should provide first the mean 3.3. The means in general are the average GPAs of the combinations of the courses so I want to provide the user with what is better for them to take based on the outcome of this "best combination" ranking. There might be also times that mean is 3.5 for 2 different combinations and st.dev once is 0.1 and the other 0.6 – piggy May 17 '20 at 16:45
  • @wjandrea from statistics import mean , now I saw for which part you meant! – piggy May 17 '20 at 16:48
  • Is that `fall1_grades` correct? I ran your code but got `mean_fall_1 == [3.665, 3.665]` and `stdev_fall_1 == [0.0, 0.0]`. You should really provide a [mre]. – wjandrea May 17 '20 at 16:51
  • Actually, on second thought, `fall1_grades` isn't even relevant to the problem really, nor how you calculate `mean_fall_1` and `stdev_fall_1`. You just need to provide the values for `mean_fall_1` and `stdev_fall_1`, which I guess you did, but they're in the wrong order??? – wjandrea May 17 '20 at 16:55
  • @wjandrea actually I did provide those and I can provide the full list of fall1_grades but it is huge, but if needed I can – piggy May 17 '20 at 16:57
  • Another issue: in your edit you say that "mean of ['CS105','MATH101','ENG101','GER'] is 3.419960107803423", but unless I'm missing something, that mean belongs to the other combination, judging by the order of the two lists. – Jack Fleeting May 17 '20 at 16:57
  • @JackFleeting you are completely right, I am fixing it now. I did not see that I did this mistake – piggy May 17 '20 at 16:58
  • @piggy No, don't provide the full data. Just focus on what you're trying to accomplish with the mean and stddev, and get rid of the irrelevant stuff. See [mre] for reference. – wjandrea May 17 '20 at 16:59
  • @wjandrea I provided it because previously they asked for it, at the beginning I did not have it. But thank you for the suggestion!! – piggy May 17 '20 at 17:01
  • @piggy Here's one way you could simplify your example data: [gist](https://gist.github.com/wjandrea/584f0570ba656b35991d72855c0ca275) – wjandrea May 17 '20 at 17:07
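
The TypeError from the question can be reproduced in isolation. The sketch below uses shortened, made-up stand-in values (the names `mean`, `stdev`, and `names` are assumptions for illustration, not the asker's data). It suggests that `zip(*zip(...))` simply hands back the original columns, so `sorted` ends up comparing a column of floats with a column of lists:

```python
# Illustrative stand-in data (assumption: shortened values, not the real lists).
mean = [2.96, 3.41]
stdev = [0.33, 0.28]
names = [['x'], ['y']]

# zip(*zip(...)) transposes twice, so the result is the original columns as tuples.
columns = list(zip(*zip(mean, stdev, names)))
print(columns)
# -> [(2.96, 3.41), (0.33, 0.28), (['x'], ['y'])]

# sorted() now compares whole columns with each other, which means
# comparing a float from one column against a list from another.
try:
    sorted(columns)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

In other words, the attempt orders the three lists relative to each other instead of reordering the entries inside them.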

1 Answer


Zip your strings, means, and stdevs together. The problem then boils down to sorting by one field descending (mean) and another ascending (stdev) while ignoring the strings. Afterwards, you just need to pull the strings back out of the sorted result.

Here's a simplified example:

names = ['a', 'b']
mean = [2.96, 3.41]
stdev = [0.33, 0.28]

groups = list(zip(names, mean, stdev))
groups.sort(key=lambda x: (-x[1], x[2]))
# [('b', 3.41, 0.28), ('a', 2.96, 0.33)]

print([x[0] for x in groups])
# -> ['b', 'a']
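
Applied to the lists from the question, the same idea might look like the sketch below. It assumes the three lists are parallel (entry i of each list describes the same combination). The names `rows`, `combination_sorted`, `mean_sorted`, and `stdev_sorted` are introduced here for illustration only.

```python
combination_fall_1 = [
    ['CS105', 'MATH101', 'ENG101', 'GER'],
    ['CS105', 'MATH101', 'GER', 'GER'],
]
mean_fall_1 = [2.9687393162393163, 3.419960107803423]
stdev_fall_1 = [0.33945301919611576, 0.2821718924791329]

# Bundle each combination with its own mean and stdev so they move together.
# The key only looks at the two numbers, so the lists of strings are never compared.
rows = sorted(
    zip(combination_fall_1, mean_fall_1, stdev_fall_1),
    key=lambda row: (-row[1], row[2]),  # highest mean first, ties broken by lowest stdev
)

# Unpack the sorted rows back into three parallel lists.
combination_sorted, mean_sorted, stdev_sorted = (list(col) for col in zip(*rows))

print(combination_sorted)
# -> [['CS105', 'MATH101', 'GER', 'GER'], ['CS105', 'MATH101', 'ENG101', 'GER']]
```

Sorting whole rows rather than separate columns is what keeps each combination attached to its own statistics.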
wjandrea
  • Thank you @wjandrea ! Just a question. So based on your code, if one has mean = 3.4 and st.dev= 1.9 and another one is with mean = 3 and st.dev = 0.0 , will the one with mean = 3 appear first since it is better as combination of both? – piggy May 17 '20 at 17:52
  • @piggy No, this ranks by mean *then* stdev, like you wrote in the question: *"first the highest mean possible with the lowest stdev possible"*. If you want to change that, just flip the key function: `lambda x: (x[2], -x[1])` – wjandrea May 17 '20 at 17:55
  • if I flip it it just means I will first sort by st.dev. and then mean? or like how you show it will do what I commented on your reply? I just got confused! @wjandrea – piggy May 17 '20 at 18:05
  • @piggy Yes, the flipped version will first sort by stdev then by mean. If you want to make sure we're on the same page, post a bigger data set (names, mean, stdev) and what you want, and I'll show you how to do it. – wjandrea May 17 '20 at 18:11
  • `names = ['a','b','c','d']`, `mean = [2.6, 2.6, 3.5, 3.9]`, `stdev = [0.33, 0.28, 0.0, 1.7]`. If these are the combinations, then I want them ranked as: 3.5, 0.0, 'c', then 3.9, 1.7, 'd', then 2.6, 0.28, 'b', then 2.6, 0.33, 'a'. (I am not sure if 3.9 with 1.7 is a better combination than the two after it, because that probably needs computations that I cannot do.) @wjandrea does this make sense to you? – piggy May 17 '20 at 18:21
  • @piggy Oh, that's not what I was expecting. What's the algorithm for determining whether a combination is "better"? In any case you would just need to put that algorithm in the key function. – wjandrea May 17 '20 at 18:26
  • It was this one: `mean_fall_1, stdev_fall_1 = sorted(list(zip(*(zip(mean_fall_1, stdev_fall_1)))))` @wjandrea at least I believe this works as I wish, based on my data and the results I got – piggy May 17 '20 at 18:28
  • @piggy No, that doesn't make any sense. `zip` is its own inverse, so `zip(*zip(...))` does nothing. The end result is the same as `sorted([mean, stdev])`. – wjandrea May 17 '20 at 18:39
  • So if, for example, I have `names = ['a', 'b']`, `mean = [3.0, 3.0]`, `stdev = [0.33, 0.28]` and I need 'b' to be shown first, then the code you uploaded, sorting first by mean and then by stdev, would work? @wjandrea Sorry for all these comments, I am just new to this and I want to understand – piggy May 17 '20 at 18:44
  • @piggy Have you tried it? Either of my solutions would work because the means are the same. But ultimately you need to figure out your sorting strategy. – wjandrea May 17 '20 at 18:47
  • @piggy @wjandrea's answer seems to work in the context of the statements in your question. But you changed things with your comments; so what's more important - high mean or low stdev? For an extreme example, if `a` has mean 10 and stdev 10, and `b` has mean 0 and stdev 0 - which one comes first, and why? – Jack Fleeting May 17 '20 at 21:17
  • @JackFleeting to be quite honest, I thought my algorithm for doing it was working, but it is not. For my problem the mean is the average GPA of a course combination, so I want to rank them based on the stdev while at the same time taking the mean into consideration. I don't know if the way I say it makes sense. For your example, 0 mean and 0 stdev cannot happen, because the mean can be max 4.0 and min 2.0, but if that were the case then I believe 10 should be first. As I said, I am still trying to find an algorithm to do so, because even I as a person cannot categorize them just like that – piggy May 17 '20 at 23:03
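
As the comments note, once a rule for "better" is decided, it only needs to go into the key function. The sketch below is one hypothetical rule, not something anyone in the thread settled on: it scores each combination as mean minus a fraction of stdev. The weight `STDEV_WEIGHT = 0.5` and the function `score` are assumptions made up for illustration; with the four-item sample from the comments, this particular weight happens to reproduce the ordering the asker described.

```python
names = ['a', 'b', 'c', 'd']
mean = [2.6, 2.6, 3.5, 3.9]
stdev = [0.33, 0.28, 0.0, 1.7]

# ASSUMPTION: hypothetical penalty weight, chosen only for this illustration.
STDEV_WEIGHT = 0.5


def score(row):
    """Hypothetical score: the mean, penalised by a fraction of the stdev."""
    _name, m, s = row
    return m - STDEV_WEIGHT * s


# Highest score first; each name stays attached to its own mean and stdev.
ranked = sorted(zip(names, mean, stdev), key=score, reverse=True)

print([name for name, _m, _s in ranked])
# -> ['c', 'd', 'b', 'a']
```

A different weight gives a different ranking (for example, a weight of 1.0 would move 'd' to the end), so the weight itself encodes how much a low stdev matters relative to a high mean.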