Given a np.array A and a function f, I would like to create a tuple containing f applied to each element of A. I simply did:

x = tuple(map(f,A))

Now I would like to improve the overall performance of my program, and I think this part could be faster. Intuitively, map passes over the array once and tuple passes over it a second time, so perhaps there is a way to do it in one pass? Also, I want x to be a tuple so that I can use it as an array index, so if you know a faster way to index, feel free to share.
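On the "two passes" intuition: in Python 3, map is lazy, so tuple(map(f, A)) already walks the array only once. The sketch below illustrates this with a stand-in f that records each call; the function body and the small array are assumptions made for the example, not the real f or A.

```python
import numpy as np

calls = []  # records every argument that f receives


def f(v):
    # Hypothetical stand-in for the real f: logs the call, returns v + 1.
    calls.append(v)
    return int(v) + 1


A = np.arange(5)

m = map(f, A)                   # lazy: f has not been called yet
calls_after_map = len(calls)

x = tuple(m)                    # tuple() pulls each result from m exactly once
calls_after_tuple = len(calls)

print(calls_after_map, calls_after_tuple, x)
```

Here f runs once per element, and only while tuple() consumes the map object, so there is no second pass to remove.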

I would like to point out that this isn't a duplicate of "Most efficient way to map function over numpy array", since I want the final result to be a tuple and not a numpy array. So using the accepted method there and then converting to a tuple isn't necessarily the fastest way.

Some time data:

  • x = tuple(map(f,A)) takes ~13 secs
  • x = tuple(np.fromiter((f(x) for x in A), int)) takes ~17 secs
  • vf = np.vectorize(f); x = tuple(vf(A)) takes ~42 secs
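Timings like these could be reproduced with a small timeit harness along the following lines. This is only a sketch: f is a hypothetical ceil-based stand-in, the 0.5 interval width and the array size are assumptions, and the last variant (converting A to a list first) follows the suggestion in the comments rather than anything measured in the question.

```python
import math
import timeit

import numpy as np


def f(v):
    # Hypothetical scalar discretizer: index of the 0.5-wide interval holding v.
    return math.ceil(v / 0.5)


# Assumed test data; the real A is not shown in the question.
A = np.random.default_rng(0).uniform(0.0, 10.0, size=1000)
vf = np.vectorize(f)


def with_map():
    return tuple(map(f, A))


def with_fromiter():
    return tuple(np.fromiter((f(v) for v in A), int))


def with_vectorize():
    return tuple(vf(A))


def with_list():
    # Iterating over a plain list avoids creating a NumPy scalar per element.
    return tuple(map(f, A.tolist()))


for fn in (with_map, with_fromiter, with_vectorize, with_list):
    seconds = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

All four variants produce equal tuples, so any difference in the printed times comes from per-element overhead rather than from the work done by f.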
leonheess
cgss
  • "I want x to be a tuple to use it as an array index." - after doing the appropriate Numpy things to transform into another Numpy array, simply convert to `tuple`. – Karl Knechtel Sep 09 '20 at 21:30
  • A well-known quote in programming is "Premature optimization is the root of all evil." Do you have any evidence that this is what's slowing your program down? What you have written is clean, clear, and makes it obvious to the reader exactly what it is you're trying to do. I really wouldn't "fix" it unless you're really sure it's broken. – Frank Yellin Sep 09 '20 at 21:31
  • @Karl I don't get what you are saying. – cgss Sep 09 '20 at 21:39
  • @Frank I was actually thinking about adding this to the actual question: I don't care about readability at all. I would be the only reader. Also I wouldn't say that it's premature since my program is completed. I am not sure that it's the fastest possible though. – cgss Sep 09 '20 at 21:41
  • If you are using Python3, then map(f, A) creates a generator. The code, as you've already written it, won't be creating a new numpy array and won't be creating an intermediate tuple. As you can see from the graph in the previous stackoverflow page mentioned above, the difference between various approaches is essentially noise. The expense of calling "f" swamps everything else. – Frank Yellin Sep 09 '20 at 21:58
  • `map` just sets things up; it doesn't 'run' anything. I prefer the expressiveness of a list comprehension, `tuple([f(y) for y in A])` does the same thing. If you want to run `f` on each 'element' of `A` there's no short cut (unless `f` can work on the whole array). – hpaulj Sep 09 '20 at 21:59
  • @Frank I am indeed using python3. I don't mind switching to python2 if it's faster. `f` is really cheap but it's called lots of times. (~10 million). Also there's the indexing part. – cgss Sep 09 '20 at 22:05
  • @hpaulj What do you mean by the whole array? – cgss Sep 09 '20 at 22:06
  • You haven't told us anything meaningful about `f`. But it sounds like it takes a scalar value, one number, as argument. In that case, it would be faster if `A` was a list rather than a numpy array. – hpaulj Sep 09 '20 at 22:39
  • @hpaulj `f` is a discretizer if I may say. It simply groups values into intervals using ceil. – cgss Sep 09 '20 at 22:54
  • "I don't get what you are saying" See the duplicate that I linked, to see how to use Numpy tools to do what you want with each element and get a Numpy array of the results back. To get a tuple, you can pass that Numpy array directly to the `tuple` constructor. – Karl Knechtel Sep 10 '20 at 02:01
  • @KarlKnechtel What guarantees that this will be faster? It seems to me that having two numpy arrays will make it slower. – cgss Sep 10 '20 at 19:54
  • Using Numpy stuff to operate on the Numpy array is expected to be faster - otherwise there would be no reason to use Numpy for the problem. Numpy stuff operating on Numpy arrays tends to produce more Numpy arrays. Getting a tuple at the end is OP's *requirement*. – Karl Knechtel Sep 10 '20 at 19:56
  • @KarlKnechtel If I understand correctly, in the linked post the OP's result is a new np.array, not a tuple. Did I get it wrong? – cgss Sep 10 '20 at 20:03
  • Right; like I *just finished telling you*, using Numpy stuff - like in the post I linked - on Numpy arrays, tends to give you Numpy array results. But **this** OP needs a tuple. Which is why I am telling **this OP** how to convert the result from the duplicate I linked, in order to meet the requirement. – Karl Knechtel Sep 10 '20 at 20:05
  • @KarlKnechtel I am the OP of this question and you are saying this OP and that OP and you are confusing me a bit. No offence. I will try what you are saying but I think it will be slower. I will edit the post when done to add the times of both methods. – cgss Sep 10 '20 at 20:19
  • @KarlKnechtel I have added some time data if you would like to take a look. The linked question didn't help. – cgss Sep 11 '20 at 09:36
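What the comments mean by having f work "on the whole array" can be sketched as follows, assuming f really is a ceil-based discretizer as described. The interval width, the sample values, and the grid being indexed are all hypothetical and chosen only for illustration.

```python
import numpy as np

WIDTH = 0.5  # hypothetical interval width


def f(v):
    # Hypothetical scalar discretizer: groups v into intervals using ceil.
    return int(np.ceil(v / WIDTH))


A = np.array([0.1, 0.5, 0.74, 1.2, 2.0])

# Element by element, as in the question.
x_loop = tuple(map(f, A))

# Whole array at once: one vectorized ceil, then a single conversion to a tuple.
x_vec = tuple(np.ceil(A / WIDTH).astype(int).tolist())

# A tuple of ints selects a single cell of an N-dimensional array.
grid = np.zeros((5,) * len(x_vec))
grid[x_vec] = 1.0

print(x_loop, x_vec)
```

Whether this is faster in practice depends on the size of A and on how often the conversion to a tuple is needed; for very short arrays the fixed overhead of the NumPy calls may outweigh the savings.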

0 Answers