I have some data which is either one- or two-dimensional. I want to iterate through every pattern in the data set and perform `foo()` on it. If the data is 1D, I append the result of `foo()` to a list; if it's 2D, I apply `foo()` to each element of the inner list and append the mean of those results. I saw this question, and decided to implement it by checking whether each row is an instance of `list`. I can't use numpy for this application.

    outputs = []
    for row in data:
        if isinstance(row, list):
            vals = [foo(window) for window in row]
            outputs.append(sum(vals)/float(len(vals)))
        else:
            outputs.append(foo(row))

Is there a neater way of doing this? On each run, every pattern will have the same dimensionality, so I could make a separate class for 1D/2D but that will add a lot of classes to my code. The datasets can get quite large so a quick solution is preferable.

Stuart Lacy
  • I would check for `hasattr(row[0], '__iter__')` instead of `isinstance()`, but I don't think there's a much quicker or more elegant way. – knitti Feb 24 '14 at 23:01
  • Looks pretty neat to me – ForgetfulFellow Feb 24 '14 at 23:01
  • See also http://stackoverflow.com/questions/1952464/in-python-how-do-i-determine-if-an-object-is-iterable. – alecxe Feb 24 '14 at 23:04
  • Python3.4 (if you are lucky enough to be using it) introduced a [statistics module](http://docs.python.org/3.4/library/statistics.html) – John La Rooy Feb 24 '14 at 23:13
  • @knitti `collections.Iterable` should be used instead, `__iter__` fails for strings. – simonzack Aug 13 '14 at 04:11
  • @simonzack I was under the impression that only real containers were relevant – knitti Aug 13 '14 at 08:20
  • @knitti of course, but I just think that's the intended way, see [this question](http://stackoverflow.com/questions/1952464/in-python-how-do-i-determine-if-an-object-is-iterable) – simonzack Aug 13 '14 at 16:32
  • @simonzack If you look at the code in the question above it is clear to *exclude* strings. Since strings *are* iterable one has to look for a better discriminator - which happens to be `__iter__` – knitti Aug 13 '14 at 20:25
  • @knitti I misunderstood, but when I checked, `__iter__` actually does work for strings in python 3, so perhaps a different discriminator is needed. – simonzack Aug 13 '14 at 21:49
  • @simonzack Hey, I didn't notice that one yet. Thanks a lot, that will save me a lot of headaches after some 2to3 migrations. – knitti Aug 14 '14 at 06:57

1 Answer

Your code is already about as neat and fast as it can be. The only slight improvement is replacing `[foo(window) for window in row]` with `map(foo, row)`, as these benchmarks show:

    > python -m timeit "foo = lambda x: x+1; list(map(foo, range(1000)))"
    10000 loops, best of 3: 132 usec per loop
    > python -m timeit "foo = lambda x: x+1; [foo(a) for a in range(1000)]"
    10000 loops, best of 3: 140 usec per loop
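
The same comparison can be reproduced from a script with the standard `timeit` module. This is only a sketch: `foo` here is a stand-in for the real function, the loop counts are arbitrary, and the absolute timings (and even which variant wins) may differ between Python versions and machines.

```python
import timeit


def foo(x):
    # Stand-in for the real per-element function (assumption).
    return x + 1


data = range(1000)


def with_map():
    # list() forces evaluation, since map() is lazy in Python 3.
    return list(map(foo, data))


def with_comprehension():
    return [foo(a) for a in data]


# Both variants must produce the same values before timing means anything.
assert with_map() == with_comprehension()

# Take the best of three runs, as `python -m timeit` does.
map_time = min(timeit.repeat(with_map, number=200, repeat=3))
comp_time = min(timeit.repeat(with_comprehension, number=200, repeat=3))

print("map:           %.1f usec per loop" % (map_time / 200 * 1e6))
print("comprehension: %.1f usec per loop" % (comp_time / 200 * 1e6))
```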

`isinstance()` already seems faster than its counterparts `hasattr()` and `type() ==`:

    > python -m timeit "[isinstance(i, int) for i in range(1000)]"
    10000 loops, best of 3: 117 usec per loop
    > python -m timeit "[hasattr(i, '__iter__') for i in range(1000)]"
    1000 loops, best of 3: 470 usec per loop
    > python -m timeit "[type(i) == int for i in range(1000)]"
    10000 loops, best of 3: 130 usec per loop
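
Speed aside, the three checks do not classify values identically, which may matter more than the timings. The sketch below runs each check on a few sample values; the `Row` subclass and the sample data are made up purely for illustration. Note that in Python 3 a string also has `__iter__`, as came up in the comments on the question.

```python
class Row(list):
    """Hypothetical list subclass, used only to show how the checks differ."""


samples = {
    "list": [1.0, 2.0],
    "list subclass": Row([1.0, 2.0]),
    "float": 3.5,
    "string": "abc",
}


def classify(value):
    # Run the three candidate checks on a single value.
    return {
        "isinstance": isinstance(value, list),
        "type ==": type(value) == list,
        "has __iter__": hasattr(value, "__iter__"),
    }


results = {name: classify(value) for name, value in samples.items()}
for name, checks in results.items():
    print(name, checks)
```

`isinstance()` accepts subclasses of `list`, `type() ==` rejects them, and `hasattr(..., '__iter__')` accepts any iterable, including strings on Python 3.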


However, if you count short as neat, you can also simplify your code (after switching to `map`) to:

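    def mean(xs):  # or `from statistics import mean` in Python 3.4+
        xs = list(xs)  # map() returns an iterator in Python 3, which has no len()
        return sum(xs) / float(len(xs))

    # Test for list, as in the question, so 1D patterns need not be ints
    output = [mean(map(foo, r)) if isinstance(r, list) else foo(r) for r in data]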
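
Since the question says every pattern in a run has the same dimensionality, the type check could also be done once, on the first pattern, rather than once per row. The following is a minimal sketch under that assumption; `foo`, `process`, and the sample data are placeholders, not part of the original code.

```python
def foo(x):
    # Placeholder for the real per-pattern function (assumption).
    return x * 2.0


def mean(values):
    # Materialize first: map() returns an iterator in Python 3, which has no len().
    values = list(values)
    return sum(values) / float(len(values))


def process(data):
    """Apply foo to every pattern, averaging over inner lists for 2D data."""
    if not data:
        return []
    # Assumes all patterns share one dimensionality, so only the first is inspected.
    if isinstance(data[0], list):
        return [mean(map(foo, row)) for row in data]
    return [foo(row) for row in data]


flat = [1.0, 2.0, 3.0]
nested = [[1.0, 3.0], [2.0, 4.0, 6.0]]

print(process(flat))    # [2.0, 4.0, 6.0]
print(process(nested))  # [4.0, 8.0]
```

This keeps a single function for both cases without adding classes, at the cost of silently misbehaving if a data set ever mixes 1D and 2D patterns.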
dwitvliet