List of List of List slicing in Python

Question

I have simulated 10000 scenarios for 4 variables over 120 months. Hence, I have a scenarios list of lists of lists; to get an element I would have to use scenarios[1][1][1], for example, and this would give me a float.

I want to slice this in two along the second list, which means I want to keep the 10000 scenarios for 4 variables for the first 60 months only.

How would I go about doing this?

My intuition would tell me to do

scenarios[:][0:60]

but this does not work. Instead of cutting the second list, it cuts the first. What is wrong?

Example:

Q = data.cov().to_numpy()   # monthly covariance matrix Q (as_matrix() was removed in pandas 1.0)
r = [0.00565, 0.00206, 0.00368, 0.00021]  # monthly return

scenarios = [[]]*10000
for i in range(10000):
    scenarios[i] = np.random.multivariate_normal(r, Q, size = 120) # monthly scenarios

In my case, Q=

2.167748064990633258e-03    -8.736421379048196659e-05   1.457397098602368978e-04    2.799384719379381381e-06
-8.736421379048196659e-05   9.035930360181909865e-04    3.196576120840064102e-04    3.197146643002681875e-06
1.457397098602368978e-04    3.196576120840064102e-04    2.390042779951682440e-04    2.312645986876262622e-06
2.799384719379381381e-06    3.197146643002681875e-06    2.312645986876262622e-06    4.365866475269951553e-06

@GiantsLoveDeathMetal not entirely. `[:]` is creating a shallow copy of `scenarios` — Ajax1234, Nov 07 '17 at 23:15
see: https://stackoverflow.com/questions/17277100/python-slicing-a-multi-dimensional-array — Meiko Rachimow, Nov 07 '17 at 23:16
@python_newbie please post a small sample of your data and your desired output from that sample. — Ajax1234, Nov 07 '17 at 23:17
Note, if your data is numeric, you could use `numpy` arrays which *do* support this sort of slicing! it would be `scenarios[:,:60]` — juanpa.arrivillaga, Nov 07 '17 at 23:18
My data is indeed numeric. Can I just use this structure directly? — python_enthusiast, Nov 07 '17 at 23:20
`scenarios[:len(scenarios)//2], scenarios[len(scenarios)//2:]` cuts your data into two equal halves, regardless of whether each element of scenarios is a simple float, or a list, or a list of lists. — Paul Cornelius, Nov 07 '17 at 23:23
@PaulCornelius yes, but this cuts in the first dimension, so it goes from 10000 to 5000, but not from 120 to 60. — python_enthusiast, Nov 07 '17 at 23:25
@python_newbie if you can add a small example of the sort of data you are working with, I can add an answer that demonstrates how to use `numpy`, potentially. — juanpa.arrivillaga, Nov 07 '17 at 23:27
@juanpa.arrivillaga Sorry that it took me a while. Kernel died... — python_enthusiast, Nov 07 '17 at 23:35
@python_newbie wait... you were *already* using `numpy`! OK, check out my answer... — juanpa.arrivillaga, Nov 07 '17 at 23:44
In other words, you don't have a list of list of lists, you have a list of two-dimensional arrays. It's very, *very* important that you keep `list`s and arrays distinct. They are two different data-structures with different ideal use-cases — juanpa.arrivillaga, Nov 07 '17 at 23:46

score 2 · Accepted Answer · answered Nov 07 '17 at 23:25

Use a list comprehension:

early_scenarios = [x[:60] for x in scenarios]
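
As a small illustration of what this comprehension does, here is a sketch with made-up toy dimensions (3 scenarios, 6 months, 4 variables) instead of 10000 x 120 x 4. The data and the cut-off of 3 months are hypothetical and only there to show the resulting shapes:

```python
# Hypothetical toy data: 3 scenarios x 6 months x 4 variables (nested lists).
scenarios = [[[float(s * 100 + m * 10 + v) for v in range(4)]
              for m in range(6)]
             for s in range(3)]

# Keep every scenario, but only the first 3 months of each one.
early_scenarios = [x[:3] for x in scenarios]

print(len(early_scenarios))        # number of scenarios is unchanged
print(len(early_scenarios[0]))     # months per scenario is now 3
print(len(early_scenarios[0][0]))  # variables per month is unchanged
```

The same expression works whether each `x` is a list of lists or a 2-D numpy array, since both support `x[:3]` on their first dimension.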

answered Nov 07 '17 at 23:25

James Hollis


This is both perfect and elegant. You get the Pythonic award. – python_enthusiast Nov 07 '17 at 23:38
Actually if you're using Numpy you should probably use Numpy to do this. You might want to hold off on that green tick and see if juanpa.arrivillaga comes through. – James Hollis Nov 07 '17 at 23:44

juanpa.arrivillaga · Answer 2 · 2017-11-08T00:01:23.763

2

So, you are trying to use multidimensional slicing on Python list objects, but fundamentally, list objects do not have dimensions. They have no inherent knowledge of their contents, other than the total number of them. But you *shouldn't* be working with list objects at all! Instead, replace this:

scenarios = [[]]*10000
for i in range(10000):
    scenarios[i] = np.random.multivariate_normal(r, Q, size = 120) # monthly scenarios

With this:

scenarios = np.random.multivariate_normal(r, Q, size=(10000, 120))

In a REPL:

>>> scenarios = np.random.multivariate_normal(r, Q, size=(10000, 120))
>>> scenarios.shape
(10000, 120, 4)

Then, you can slice to your heart's content in N dimensions using:

scenarios[:, 0:60]

Or, a more wieldy slice:

>>> scenarios[500:520, 0:60]
array([[[-0.05785267,  0.01122828,  0.00786622, -0.00204875],
        [ 0.01682276,  0.00163375,  0.00439909, -0.0022255 ],
        [ 0.02821342, -0.01634708,  0.01175085, -0.00194007],
        ...,
        [ 0.04918003, -0.02146014,  0.00071328, -0.00222226],
        [-0.03782566, -0.00685615, -0.00837397, -0.00095019],
        [-0.06164655,  0.02817698,  0.01001757, -0.00149662]],

       [[ 0.00071181, -0.00487313, -0.01471801, -0.00180559],
        [ 0.05826763,  0.00978292,  0.02442642, -0.00039461],
        [ 0.04382627, -0.00804489,  0.00046985,  0.00086524],
        ...,
        [ 0.01231702,  0.01872649,  0.01534518, -0.0022179 ],
        [ 0.04212831, -0.05289387, -0.03184881, -0.00078165],
        [-0.04361605, -0.01297212,  0.00135886,  0.0057856 ]],

       [[ 0.00232622,  0.01773357,  0.00795682,  0.00016406],
        [-0.04367355, -0.02387383, -0.00448453,  0.0008559 ],
        [ 0.01256918,  0.06565425,  0.05170755,  0.00046948],
        ...,
        [ 0.04457427, -0.01816762,  0.00068176,  0.00186112],
        [ 0.00220281, -0.01119046,  0.0103347 , -0.00089715],
        [ 0.02178122,  0.03183001,  0.00959293, -0.00057862]],

       ...,
       [[ 0.06338153,  0.01641472,  0.01962643, -0.00256244],
        [ 0.07537754, -0.0442643 , -0.00362656,  0.00153777],
        [ 0.0505006 ,  0.0070783 ,  0.01756948,  0.0029576 ],
        ...,
        [ 0.03524508, -0.03547517, -0.00664972, -0.00095385],
        [-0.03699107,  0.02256328,  0.00300107,  0.00253193],
        [-0.0199608 , -0.00536222,  0.01370301, -0.00131981]],

       [[ 0.08601913, -0.00364473,  0.00946769,  0.00045275],
        [ 0.01943327,  0.07420857,  0.00109217, -0.00183334],
        [-0.04481884, -0.02515305, -0.02357894, -0.00198166],
        ...,
        [-0.01221928, -0.01241903,  0.00928084,  0.00066379],
        [ 0.10871802, -0.01264407,  0.00601223,  0.00090526],
        [-0.02603179, -0.00413112, -0.006037  ,  0.00522712]],

       [[-0.02929114,  0.02188803, -0.00427137,  0.00250174],
        [ 0.02479416, -0.01470632, -0.01355196,  0.00338125],
        [-0.01915726, -0.00869161,  0.01451885, -0.00137969],
        ...,
        [ 0.05398784, -0.00834729, -0.00437888,  0.00081602],
        [ 0.00626345, -0.0261016 , -0.01484753,  0.00060499],
        [ 0.05427697,  0.04006612,  0.03371313, -0.00203731]]])
>>>
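
If the scenarios already exist as a list of 2-D arrays (as in the question), one possible route is to stack them into a single 3-D array and then slice along the month axis. This is only a sketch: the sizes are small and made up, and the diagonal covariance matrix `Q` is an assumed stand-in for the asker's data:

```python
import numpy as np

# Assumed toy inputs (not the asker's data): 4 variables, diagonal covariance.
r = [0.00565, 0.00206, 0.00368, 0.00021]
Q = np.eye(4) * 1e-3

rng = np.random.default_rng(0)

# A list of 2-D arrays, mimicking the loop in the question (5 scenarios, 12 months).
scenario_list = [rng.multivariate_normal(r, Q, size=12) for _ in range(5)]

# Stack into one array of shape (scenarios, months, variables).
scenarios = np.stack(scenario_list)

# Keep all scenarios and all variables, but only the first 6 months.
first_half = scenarios[:, :6]

print(scenarios.shape)   # (5, 12, 4)
print(first_half.shape)  # (5, 6, 4)
```

Note that basic slicing like this returns a view onto the original array, so call `.copy()` on the result if an independent array is needed.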

edited Nov 08 '17 at 00:01

answered Nov 07 '17 at 23:44

juanpa.arrivillaga


I get the TypeError: list indices must be integers or slices, not tuple – python_enthusiast Nov 07 '17 at 23:50
@python_newbie you are still using a `list` somehow. Make sure you do `scenarios = np.random.multivariate_normal(r, Q, size=(10000, 120))` – juanpa.arrivillaga Nov 07 '17 at 23:53
@python_newbie as in, that replaces your whole `scenarios = [[]]*10000; for i in range(10000): ...` loop – juanpa.arrivillaga Nov 07 '17 at 23:54
@python_newbie If you have a list of lists of lists, you can turn that straight into a Numpy array by doing `scenarios = np.array(scenarios)`. Then you can use the `scenarios[:, 0:60]` expression for cool numpy slicing. – James Hollis Nov 07 '17 at 23:58
@JamesHollis yes, but it would be *best* to avoid that pointlessly inefficient loop and take full-advantage of a vectorized numpy function. – juanpa.arrivillaga Nov 08 '17 at 00:00
@juanpa.arrivillaga My intention here is to give the asker a way to try it out quickly, so he does not give up and leave the green tick on my answer. – James Hollis Nov 08 '17 at 00:05
@JamesHollis yes, well, while I appreciate that, I'm very concerned with making sure OP understands the distinction between arrays and lists. It's my own little personal crusade on this tag :) – juanpa.arrivillaga Nov 08 '17 at 00:09
@juanpa.arrivillaga This morning I changed my whole code to turn everything into numpy arrays, but they seem to be very memory consuming and my computer keeps freezing. Is this something normal given the size of the data? What kind of structure would be easier to run? – python_enthusiast Nov 08 '17 at 18:01
@python_newbie `numpy` arrays are *significantly* more memory efficient than `list` objects. `float` objects are just that in Python: full-fledged objects that take 24 bytes (depending on your version and system) per instance. A Python list is, underneath the hood, an array of *pointers to Py_Objects*, so each element in the array takes 8 bytes (4 if you are on a 32 bit system). So, for example, 100 `float` objects in a list take about `(24 + 8)*100 == 3200` bytes. A `np.float64` array would be `100*8 == 800` bytes... You can ask another question regarding your specific memory issue. – juanpa.arrivillaga Nov 08 '17 at 18:08
@python_newbie the memory consumption for the Numpy array should be 10000*120*4*8 = 38.4MB. You might have more than one array, but you shouldn't be running out of memory here. Try stepping through in a debugger to see which lines cause your memory consumption to get out of hand. – James Hollis Nov 08 '17 at 23:36
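
The arithmetic in the two comments above can be checked roughly as follows. This is a sketch only: the per-object sizes reported by `sys.getsizeof` depend on the Python build and platform, so only the array figures are exact:

```python
import sys
import numpy as np

n = 100
values = [float(i) for i in range(n)]

# List: one pointer per element inside the list, plus a separate float object each.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Array: n contiguous 8-byte float64 values (ignoring the small array header).
arr = np.array(values, dtype=np.float64)
print(list_bytes, arr.nbytes)

# Data size for the full simulation: scenarios x months x variables x 8 bytes.
full_bytes = 10000 * 120 * 4 * np.dtype(np.float64).itemsize
print(full_bytes / 1e6, "MB")
```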
I spent the day rewriting my code and somehow the problem was fixed. The memory problem is gone, even though I don't know what I did that fixed it... – python_enthusiast Nov 09 '17 at 00:00
@python_newbie welp, all's well that ends well, I suppose. – juanpa.arrivillaga Nov 09 '17 at 00:02

score 1 · Answer 3 · edited Nov 07 '17 at 23:39

Python slicing doesn't consider all dimensions like this. Your expression makes a copy of the entire list, scenarios[:], and then takes the first 60 elements of the copy. You need to write a comprehension to grab the elements you want. Perhaps

[scenarios[x][y][z] 
    for x in range(len(scenarios))
        for y in range(60)
            for z in range(len(scenarios[0][0])) ]
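
Note that the comprehension as written above produces one flat list of floats. To see why `scenarios[:][0:60]` cuts the first dimension, and how a nested comprehension can keep the three-level structure instead, here is a sketch on a small made-up list (3 scenarios, 4 months, 2 variables, keeping the first 2 months):

```python
# Hypothetical toy data: 3 scenarios x 4 months x 2 variables.
scenarios = [[[100 * s + 10 * m + v for v in range(2)]
              for m in range(4)]
             for s in range(3)]

# [:] is a shallow copy of the outer list, so the second slice
# still acts on the outer (scenario) dimension.
assert scenarios[:][0:2] == scenarios[0:2]

# Nested comprehension: all scenarios, first 2 months, all variables.
first_months = [[[scenarios[x][y][z]
                  for z in range(len(scenarios[0][0]))]
                 for y in range(2)]
                for x in range(len(scenarios))]

print(len(first_months), len(first_months[0]), len(first_months[0][0]))  # 3 2 2
```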

edited Nov 07 '17 at 23:39

python_enthusiast


answered Nov 07 '17 at 23:16

Prune


Good answer, but still can be simplified. – Mark Nov 07 '17 at 23:18
I know; I try to keep the answer accessible to the poster's level of learning. If you want to edit an addendum to this, be my guest. – Prune Nov 07 '17 at 23:41

score 1 · Answer 4 · answered Nov 07 '17 at 23:38

You need to explicitly slice each secondary list, either in a loop or in list comprehensions. I built a 10x10 set of lists so you have to change the indexing to fit your problem:

x = []
for a in range(10):
    x.append([10*a+n for n in range(10)])
# x is now a list of 10 lists, each of which has 10 elements
print(x)
x1 = [a[:5] for a in x]
# x1 is a list containing the low elements of the secondary lists
x2 = [a[5:] for a in x]
# x2 is a list containing the high elements of the secondary lists
print(x1, x2)
