
I am trying to get the standard deviation of each 'column' in a list of lists, but not for all columns: only the middle columns contain numbers, so I want to skip the others.

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

Expected result:

params = [std.dev(2, 6, 5),
          std.dev(3, 7, 6),
          std.dev(6, 8, 8),
          std.dev(7, 2, 1)]

Note: I have not evaluated the standard deviations because the values are not relevant to the question; std.dev(...) just marks what would be computed. I tried using zip(*param_data) but cannot figure out how to zip only columns 1-4.

Kelly Bundy
chepox
  • Do you have data that specifies which columns to use? If so, what is it? If not, how shall they be detected? – Kelly Bundy Feb 01 '23 at 01:51
  • As with most problems in programming, this becomes easier to solve if you break it up into smaller parts. You have a list of lists where each inner list is a row (row-major). You want to [traverse the "2d list" in column-major order](/q/70209294/843953), [check that they are all integers](/q/8964191/843953), and [find the standard deviation](/q/15389768/843953) – Pranav Hosangadi Feb 01 '23 at 02:43
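The three steps from the comment above could be sketched roughly as follows. This is only an illustration, not code from the thread: the function name `integer_column_stdevs` is made up, and it assumes a column counts as numeric only when every entry is an `int`.

```python
from statistics import stdev

def integer_column_stdevs(rows):
    """Hypothetical helper: stdev of each all-integer column."""
    # Step 1: zip(*rows) walks the row-major data column by column.
    columns = zip(*rows)
    # Step 2: keep only columns in which every entry is an int
    # (note that bool is a subclass of int and would pass this check).
    int_columns = [c for c in columns if all(isinstance(x, int) for x in c)]
    # Step 3: compute the sample standard deviation of each kept column.
    return [stdev(c) for c in int_columns]

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

params = integer_column_stdevs(param_data)
print(params)
```

If the data may contain floats, the `isinstance` check would need to accept them as well.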

3 Answers


Try this: transpose the rows with zip(*param_data) and skip any column for which stdev raises a TypeError:

from statistics import stdev

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

params = []
for column in zip(*param_data):
    try:
        params.append(stdev(column))
    except TypeError:
        pass

print(params)
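
One caveat, sketched here under the assumption that the input might have fewer than two rows: `stdev` raises `statistics.StatisticsError` when it gets fewer than two values, and that exception is not a `TypeError`, so the loop above would not catch it. A possible variant, wrapped in a hypothetical helper named `column_stdevs`, catches both:

```python
from statistics import StatisticsError, stdev

def column_stdevs(rows):
    """Hypothetical helper: stdev of every column that stdev accepts."""
    results = []
    for column in zip(*rows):
        try:
            results.append(stdev(column))
        except (TypeError, StatisticsError):
            # TypeError: the column holds non-numeric values.
            # StatisticsError: the column has fewer than two values.
            continue
    return results

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

print(column_stdevs(param_data))      # four values, one per numeric column
print(column_stdevs(param_data[:1]))  # single row: every column is skipped
```

Whether silently skipping such columns is the right behavior depends on the application; raising may be preferable if a short input indicates a bug.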
Kelly Bundy

Use a list comprehension to slice the numeric columns out of each row, then transpose the result with zip.

from statistics import stdev

param_data = [["a", 2, 3, 6, 7, "b"], ["c", 6, 7, 8, 2, "d"], ["e", 5, 6, 8, 1, "f"]]

elements = [x[1:5] for x in param_data]
print([stdev(x) for x in zip(*elements)])

This assumes that the numbers are always in that 1:5 slice; if that's not the case, we'll need more information.

Rahul
  • We already have the information about stripping off first and last, just use negative indexing, i.e. `x[1:-1]`, see [thread](https://stackoverflow.com/questions/60497971/how-can-i-slice-out-the-last-and-first-characters-of-a-string-in-python). – metatoaster Feb 01 '23 at 01:48
  • @metatoaster I don't see that stated. I only see "middle columns" and "columns 1-4". – Kelly Bundy Feb 01 '23 at 01:54
  • @KellyBundy eh, "middle columns" with the context implied that. Though I can see why others might think I read too much into those two words. Nonetheless providing both did make the requirement ambiguous, doesn't hurt to provide both, OP did clarify the specific limitation of their solution. – metatoaster Feb 01 '23 at 02:10
  • Sorry for the ambiguity. Columns are fixed for my application. So the answer from metatoaster is spot on. – chepox Feb 01 '23 at 02:53
  • Application has fixed column count and positions. So metatoaster's idea works for my case. – chepox Feb 01 '23 at 02:55
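
For reference, the `x[1:-1]` idea from the comments might look like the sketch below. It assumes, as the OP confirmed for their application, that the non-numeric columns are exactly the first and the last one; the variable name `inner` is only illustrative.

```python
from statistics import stdev

param_data = [["a", 2, 3, 6, 7, "b"],
              ["c", 6, 7, 8, 2, "d"],
              ["e", 5, 6, 8, 1, "f"]]

# Drop the first and last entry of each row, whatever the row length.
inner = [row[1:-1] for row in param_data]

# Transpose the remaining values so that each tuple is one column.
params = [stdev(column) for column in zip(*inner)]
print(params)
```

Compared with `x[1:5]`, the negative index keeps working if more numeric columns are added between the two string columns.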

Here is a solution with better error handling:

from statistics import stdev
from numbers import Number

def resilient_stdevs(columns):
    cols = list(zip(*columns))
    for c in cols:
        if isinstance(c[0], Number):
            if not all(isinstance(x, Number) for x in c):
                raise TypeError("Cannot compute stdev of mixed types in a single column: got " + str(c))
        if not isinstance(c[0], Number):
            if not all(not isinstance(x, Number) for x in c):
                raise TypeError("Cannot compute stdev of mixed types in a single column: got " + str(c))
    return [stdev(xs) for xs in cols if isinstance(xs[0], (int, float))]

param_data = [["a", 2, "s", 6, 7, "b"], ["c", 6, 4, 8, 2, "d"], ["e", 5, 6, 8, 1, "f"]]

print(resilient_stdevs(param_data))

This crashes (as it should) with a clear error message:

line 12, in resilient_stdevs
    raise TypeError("Cannot compute stdev of mixed types in a single column: got " + str(c))
TypeError: Cannot compute stdev of mixed types in a single column: got ('s', 4, 6)

You can use list(zip(*param_data)) to transpose the data and isinstance to check the type of each column's first element. This works wherever the string columns are, even in the middle, and however many of them there are:

from statistics import stdev

param_data = [["a", 2, 3, 6, 7, "b"], ["c", 6, 7, 8, 2, "d"], ["e", 5, 6, 8, 1, "f"]]

cols = list(zip(*param_data))

params = [stdev(xs) for xs in cols if isinstance(xs[0], (int, float))]

print(params)

Output:

[2.0816659994661326, 2.0816659994661326, 1.1547005383792515, 3.2145502536643185]
Caridorc