vectorize
is not a substitute for iteration. It's a way of giving you the full power of numpy
broadcasting. The resulting function takes one or more arrays, broadcasts them together, and then feeds a simple tuple of values (i.e. scalars, one from each array) to the wrapped function.
In your code f
is a list of lines - all the lines from the file.
You could do something like:
N = len(f)
for i in range(0,N,1000):
a = np.loadtxt(f[i:i+1000], delimiter=',')
<process array a>
In other words, feed the lines of f
to loadtxt
in blocks.
Actually you don't need to read all of the lines at once. You could write a generator that reads the file line by line, and returns blocks of lines.
The use of generators to feed loadtxt
(or genfromtxt
) has been discussed before.
Working example of vectorize
In [121]: def processing(astr):
return list(map(int, astr.split(',')))[0] # py3
.....:
In [122]: processing(a[0])
Out[122]: 1
In [123]: fn=np.vectorize(processing, otypes=[int])
In [124]: fn(a)
Out[124]: array([1, 4, 7])
This function takes a string and returns one int
. It's no better than
In [125]: [processing(l) for l in a]
Out[125]: [1, 4, 7]
I removed delimiter
and dtype
from the arguments because we don't want to iterate over those parameters. There is an exclude
parameter to vectorize
; but I didn't want to play with that.
vectorize
also takes multiple values for the otypes
, but I haven't seen an example of that use. Your function didn't work because it returned a sequence (e.g. 3 ints), but vectorize
expected it to return one value (a scalar float or int).
If you specify otypes
as object, your processing
does work - sort of
In [126]: def processing(astr):
return list(map(int, astr.split(','))) # py3
.....:
In [127]: fn=np.vectorize(processing, otypes=[object])
In [128]: fn(a)
Out[128]: array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=object)
But why not just iterate?
In [129]: [processing(l) for l in a]
Out[129]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [130]: np.array([processing(l) for l in a])
Out[130]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I vaguely recall some SO questions about errors when using vectorize
with object
returns.
While I'm on a roll, I might as well illustrate broadcasting:
Here's your function that takes 2 values - a string and a function, and applies the function to each element of the split:
In [131]: def processing(astr, conv):
return list(map(conv, astr.split(','))) # py3
.....:
In [132]: fn=np.vectorize(processing, otypes=[object])
Now the vectorized
function takes 2 inputs, e.g. a list and function:
In [133]: fn(a,int)
Out[133]: array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=object)
or a different function for each string in a
(2 lists)
In [134]: fn(a,[int,float,str])
Out[134]: array([[1, 2, 3], [4.0, 5.0, 6.0], ['7', '8', '9']], dtype=object)
Or make the 2nd list a 'column' list - and get back a (2,3) array of lists. One row is ints, the other floats. Obviously I could replace the lists with arrays (0, 1d, 2d etc).
In [136]: fn(a,[[int],[float]])
Out[136]:
array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]], dtype=object)
If you need this kind of flexibility in inputs then use vectorize
. But if you are just iterating over one array or list - do it directly.
I found an example of multiple otypes
: https://stackoverflow.com/a/30255971/901925
Applied to this case:
In [140]: def processing(astr):
return tuple(map(int, astr.split(','))) # py3
.....:
It's important that it returns a tuple, not a list or array.
In [141]: processing(a[0])
Out[141]: (1, 2, 3)
In [142]: fn=np.vectorize(processing, otypes=[int,int,int])
Note that there has to be an otype
for each item of the returned tuple.
In [144]: fn(a)
Out[144]: (array([1, 4, 7]), array([2, 5, 8]), array([3, 6, 9]))
But [1, 4, 7]
is the first value of each of the 3 inputs. It's returning a tuple of arrays, not one array.
In [146]: x,y,z=fn(a)
In [147]: x
Out[147]: array([1, 4, 7])
This behavior bothered the other questioner, and I doubt if it's what you want either. :)
https://stackoverflow.com/a/30088791/901925 - a vectorizing example with time tests.