Obtaining numpy array of shortened lists?

Question

Consider this code:

#!/usr/bin/env python3

import numpy as np

aa = [
  [3, 8, [37, 7, 5, 0, 5, 0, 8, 0]],
  [3, 8, [36, 7, 5, 0, 4, 0, 8, 0]],
  [3, 8, [37, 7, 5, 0, 4, 0, 8, 0]],
  [3, 8, [37, 7, 5, 0, 5, 0, 9, 0]],
  [3, 8, [36, 7, 6, 0, 6, 0, 12, 0]],
  [3, 8, [36, 7, 5, 0, 5, 0, 9, 0]],
  [3, 8, [36, 7, 5, 0, 5, 0, 8, 0]],
  [3, 8, [37, 7, 6, 0, 6, 0, 10, 0]],
  [3, 8, [37, 7, 6, 0, 6, 0, 10, 0]],
  [3, 8, [37, 7, 6, 0, 6, 0, 12, 0]]
]

nch = np.asarray(aa, dtype=object)

print("nch shape {}".format(nch.shape))
print(nch)
nchB = nch[:,2]
print("nchB shape {}".format(nchB.shape))
print(nchB)

print("Test 1")
print( np.frompyfunc(list, 0, 1)(np.empty((3,2), dtype=object)) )
print("Test 2")
print( np.frompyfunc(list, 0, 1)(nchB) )
print("Test 3")
print( np.frompyfunc(list, 1, 1)( nchB ) )

It outputs:

nch shape (10, 3)
[[3 8 list([37, 7, 5, 0, 5, 0, 8, 0])]
 [3 8 list([36, 7, 5, 0, 4, 0, 8, 0])]
 [3 8 list([37, 7, 5, 0, 4, 0, 8, 0])]
 [3 8 list([37, 7, 5, 0, 5, 0, 9, 0])]
 [3 8 list([36, 7, 6, 0, 6, 0, 12, 0])]
 [3 8 list([36, 7, 5, 0, 5, 0, 9, 0])]
 [3 8 list([36, 7, 5, 0, 5, 0, 8, 0])]
 [3 8 list([37, 7, 6, 0, 6, 0, 10, 0])]
 [3 8 list([37, 7, 6, 0, 6, 0, 10, 0])]
 [3 8 list([37, 7, 6, 0, 6, 0, 12, 0])]]
nchB shape (10,)
[list([37, 7, 5, 0, 5, 0, 8, 0]) list([36, 7, 5, 0, 4, 0, 8, 0])
 list([37, 7, 5, 0, 4, 0, 8, 0]) list([37, 7, 5, 0, 5, 0, 9, 0])
 list([36, 7, 6, 0, 6, 0, 12, 0]) list([36, 7, 5, 0, 5, 0, 9, 0])
 list([36, 7, 5, 0, 5, 0, 8, 0]) list([37, 7, 6, 0, 6, 0, 10, 0])
 list([37, 7, 6, 0, 6, 0, 10, 0]) list([37, 7, 6, 0, 6, 0, 12, 0])]
Test 1
[[list([]) list([])]
 [list([]) list([])]
 [list([]) list([])]]
Test 2
[list([]) list([]) list([]) list([]) list([]) list([]) list([]) list([])
 list([]) list([])]
Test 3
[list([]) list([]) list([]) list([]) list([]) list([]) list([]) list([])
 list([]) list([])]

Basically, I use something like nchB to feed a matplotlib boxplot, which works fine.

nchB here is considered to be a single dimension array of length 10, with its elements being lists; it so happens here, each of these lists has 8 elements.

Now, I want to create another one-dimensional array of length 10 whose elements are lists, except that each list should have only one or two elements. So I would want to obtain, say:

[list([37, 7]) list([36, 7])
 list([37, 7]) list([37, 7])
 list([36, 7]) list([36, 7])
 list([36, 7]) list([37, 7])
 list([37, 7]) list([37, 7])]

or:

[list([37]) list([36])
 list([37]) list([37])
 list([36]) list([36])
 list([36]) list([37])
 list([37]) list([37])]

... somehow from nchB, preferably with a one-liner. I could then use this "reduced" array of lists as initial data for matplotlib's boxplot (so I can start setting up the plot without waiting a long time for my actual data to be rendered).

How can I do this? Obviously, the trivial attempts I made in "Test 2" and "Test 3" above with np.frompyfunc, which I found from:

... don't quite work, as all I get are empty lists.

Looks like my suggestion (in your link) to use `np.frompyfunc(list,0,1)` is misleading. `frompyfunc` is good for creating object dtype arrays. But this form ends up calling `list()` for each element, regardless of the input array because of that `0`. In most cases the `func` needs to be more elaborate, taking one or more inputs - a `lambda` or `def`. — hpaulj, Oct 18 '19 at 14:56
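To make the difference between the two forms concrete, here is a minimal sketch. The array `data` is made-up sample data, not the asker's, and the behaviour of the `0` form is inferred from the outputs shown in this thread: with no inputs, the single positional array appears to be used as the output buffer.

```python
import numpy as np

# Hypothetical sample data: a 1-D object array whose elements are lists.
data = np.empty(3, dtype=object)
for i, row in enumerate([[37, 7, 5], [36, 7, 5], [37, 7, 6]]):
    data[i] = row

# nin=1: every element is passed to list(), and a new array is returned.
copied = np.frompyfunc(list, 1, 1)(data)
print(copied.tolist())   # [[37, 7, 5], [36, 7, 5], [37, 7, 6]]

# nin=0: list() is called with no arguments for each slot, and the
# positional array is written to rather than read from.
scratch = data.copy()    # shallow copy, so `data` keeps its own lists
np.frompyfunc(list, 0, 1)(scratch)
print(scratch.tolist())  # [[], [], []]
print(data.tolist())     # [[37, 7, 5], [36, 7, 5], [37, 7, 6]]
```

This would explain why "Test 3" in the question prints empty lists: "Test 2" had already overwritten nchB by the time it ran.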

score 0 · Answer 1 · answered Oct 18 '19 at 13:17

Ok, I think I got somewhere - this code now:

#!/usr/bin/env python3

import numpy as np

aa = [
  [3, 8, [37, 7, 5, 0, 5, 0, 8, 0]],
  [3, 8, [36, 7, 5, 0, 4, 0, 8, 0]],
  [3, 8, [37, 7, 5, 0, 4, 0, 8, 0]],
  [3, 8, [37, 7, 5, 0, 5, 0, 9, 0]],
  [3, 8, [36, 7, 6, 0, 6, 0, 12, 0]],
  [3, 8, [36, 7, 5, 0, 5, 0, 9, 0]],
  [3, 8, [36, 7, 5, 0, 5, 0, 8, 0]],
  [3, 8, [37, 7, 6, 0, 6, 0, 10, 0]],
  [3, 8, [37, 7, 6, 0, 6, 0, 10, 0]],
  [3, 8, [37, 7, 6, 0, 6, 0, 12, 0]]
]

nch = np.asarray(aa, dtype=object)

print("nch shape {}".format(nch.shape))
print(nch)
nchB = nch[:,2]
print("nchB shape {}".format(nchB.shape))
print(nchB)
#print([i[0] for i in nchB])
#print([ [i[0], i[1]] for i in nchB])
#print(np.asarray([ [i[0], i[1]] for i in nchB], dtype=object))
#print(   np.frompyfunc(list, 1, 1)( np.asarray([ [i[0], i[1]] for i in nchB], dtype=object) )   ) # TypeError: 'int' object is not iterable
#~ print(   np.frompyfunc(list, 1, 1)( [ [i[0], i[1]] for i in nchB] )   )
print(   np.frompyfunc(list, 1, 1)( i for i in nchB )   )

print("Test 1")
print( np.frompyfunc(list, 0, 1)(np.empty((3,2), dtype=object)) )

# print("Test 2")
# print( np.frompyfunc(list, 0, 1)(nchB) )
# print("nchB", nchB) # deleted!? nchB [list([]) list([]) list([]) list([]) list([]) list([]) list([]) list([]) list([]) list([])]

print("Test 3")
print( np.frompyfunc(list, 1, 1)( nchB ) )
#print("nchB", nchB) # OK, but does not create empty lists

print("Test 4")
nchBB = np.copy(nchB) # copy, as nchB will get deleted/changed otherwise
blist = np.frompyfunc(list, 0, 1)( nchBB ) # forces empty list, both blist and nchBB
gen = (item.extend( (nchB[ind][0], nchB[ind][1]) ) for ind, item in enumerate(blist))
for _ in gen: pass # https://stackoverflow.com/q/11539194
print("blist", blist) # blist [list([37, 7]) list([36, 7]) ...
print("nchBB", nchBB) # nchBB [list([37, 7]) list([36, 7]) ...

print("shapes:", blist.shape, nchB.shape)

... will produce:

...
Test 1
[[list([]) list([])]
 [list([]) list([])]
 [list([]) list([])]]
Test 3
[list([37, 7, 5, 0, 5, 0, 8, 0]) list([36, 7, 5, 0, 4, 0, 8, 0])
 list([37, 7, 5, 0, 4, 0, 8, 0]) list([37, 7, 5, 0, 5, 0, 9, 0])
 list([36, 7, 6, 0, 6, 0, 12, 0]) list([36, 7, 5, 0, 5, 0, 9, 0])
 list([36, 7, 5, 0, 5, 0, 8, 0]) list([37, 7, 6, 0, 6, 0, 10, 0])
 list([37, 7, 6, 0, 6, 0, 10, 0]) list([37, 7, 6, 0, 6, 0, 12, 0])]
Test 4
blist [list([37, 7]) list([36, 7]) list([37, 7]) list([37, 7]) list([36, 7])
 list([36, 7]) list([36, 7]) list([37, 7]) list([37, 7]) list([37, 7])]
nchBB [list([37, 7]) list([36, 7]) list([37, 7]) list([37, 7]) list([36, 7])
 list([36, 7]) list([36, 7]) list([37, 7]) list([37, 7]) list([37, 7])]
shapes: (10,) (10,)

So, the trick was:

Copy the source np.array of lists first: calling np.frompyfunc(list, 0, 1) on an array overwrites that array in place, since with 0 inputs the array passed in is used as the output.
Let np.frompyfunc(list, 0, 1) fill the copy with empty lists, one per list in the source np.array.
Loop over those empty lists with a generator expression and extend each one with the first two elements of the matching list in the source np.array (which is still intact, because only the copy was overwritten).

I kinda hoped this would be easier and/or doable with a one-liner, but there you go ... At least, the truncated nchBB, and the original nchB, now still have the same shape, from the point of view of numpy.
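For comparison, a minimal sketch of a simpler route to the same result that skips np.frompyfunc entirely: allocate an empty object array and fill it element by element. The helper name `truncate_lists` and the array `sample` are made up for illustration and are not part of the code above.

```python
import numpy as np

def truncate_lists(arr, n):
    """Return a new 1-D object array whose i-th element is a list
    holding the first n items of arr[i]. The input is not modified."""
    out = np.empty(len(arr), dtype=object)
    for i, item in enumerate(arr):
        out[i] = list(item[:n])  # assigning to one slot stores the list itself
    return out

# Hypothetical stand-in for nchB: a 1-D object array of lists.
sample = np.empty(3, dtype=object)
for i, row in enumerate([[37, 7, 5, 0], [36, 7, 5, 0], [37, 7, 6, 0]]):
    sample[i] = row

short = truncate_lists(sample, 2)
print(short.shape)     # (3,)
print(short.tolist())  # [[37, 7], [36, 7], [37, 7]]
```

Filling slot by slot keeps the result one-dimensional with list elements, so it has the same shape as the source array.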

score 0 · Answer 2 · answered Oct 18 '19 at 14:40

You're almost there:

slice_two = np.frompyfunc(lambda x: x[:2], 1, 1)
slice_two(nchB)

# [list([37, 7]) list([36, 7]) 
#  list([37, 7]) list([37, 7]) 
#  list([36, 7]) list([36, 7]) 
#  list([36, 7]) list([37, 7]) 
#  list([37, 7]) list([37, 7])]

slice_one = np.frompyfunc(lambda x: x[:1], 1, 1)
slice_one(nchB)

# [list([37]) list([36]) 
#  list([37]) list([37]) 
#  list([36]) list([36])
#  list([36]) list([37]) 
#  list([37]) list([37])]

And this doesn't affect the original data either:

print(nchB)
# [list([37, 7, 5, 0, 5, 0, 8, 0])  list([36, 7, 5, 0, 4, 0, 8, 0])
#  list([37, 7, 5, 0, 4, 0, 8, 0])  list([37, 7, 5, 0, 5, 0, 9, 0])
#  list([36, 7, 6, 0, 6, 0, 12, 0]) list([36, 7, 5, 0, 5, 0, 9, 0])
#  list([36, 7, 5, 0, 5, 0, 8, 0])  list([37, 7, 6, 0, 6, 0, 10, 0])
#  list([37, 7, 6, 0, 6, 0, 10, 0]) list([37, 7, 6, 0, 6, 0, 12, 0])]
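
If the number of elements to keep varies, the same idea can be wrapped in a small factory. This is only a sketch: `make_slicer` and `sample` are hypothetical names, and `sample` stands in for nchB.

```python
import numpy as np

def make_slicer(n):
    """Build a frompyfunc wrapper that keeps the first n items of each element."""
    return np.frompyfunc(lambda seq: seq[:n], 1, 1)

# Hypothetical stand-in for nchB: a 1-D object array of lists.
sample = np.empty(4, dtype=object)
rows = [[37, 7, 5, 0], [36, 7, 5, 0], [37, 7, 6, 0], [36, 7, 6, 0]]
for i, row in enumerate(rows):
    sample[i] = row

first_two = make_slicer(2)(sample)
first_one = make_slicer(1)(sample)

print(first_two.shape)     # (4,)
print(first_two.tolist())  # [[37, 7], [36, 7], [37, 7], [36, 7]]
print(first_one.tolist())  # [[37], [36], [37], [36]]
```

Slicing a list returns a new list, so the elements of the source array are left as they were.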
