2

I have simulated 10000 scenarios for 4 variables during 120 months. Hence, I have a scenarios list of lists of lists; to get an element I would have to use scenarios[1][1][1], for example, and this would give me a float.

I want to slice this in two, dividing along the second list, which means I want to keep the 10000 scenarios for 4 variables for the first 60 months.

How would I go about doing this?

My intuition would tell me to do

scenarios[:][0:60]

but this does not work. Instead of cutting the second list, it cuts the first. What is wrong?

Example:

Q = data.cov().to_numpy()  # monthly covariance matrix Q (as_matrix() was removed in pandas 1.0)
r = [0.00565, 0.00206, 0.00368, 0.00021]  # monthly return

scenarios = [[]]*10000
for i in range(10000):
    scenarios[i] = np.random.multivariate_normal(r, Q, size = 120) # monthly scenarios

In my case, Q=

2.167748064990633258e-03    -8.736421379048196659e-05   1.457397098602368978e-04    2.799384719379381381e-06
-8.736421379048196659e-05   9.035930360181909865e-04    3.196576120840064102e-04    3.197146643002681875e-06
1.457397098602368978e-04    3.196576120840064102e-04    2.390042779951682440e-04    2.312645986876262622e-06
2.799384719379381381e-06    3.197146643002681875e-06    2.312645986876262622e-06    4.365866475269951553e-06
juanpa.arrivillaga
python_enthusiast

4 Answers

2

Use a list comprehension:

early_scenarios = [x[:60] for x in scenarios]
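To see what that comprehension does, here is a minimal sketch on a tiny stand-in for the real data. The sizes (3 scenarios, 6 months, 2 variables) and the values are made up for illustration; only the slicing pattern comes from the line above.

```python
# Tiny stand-in for the real data: 3 scenarios x 6 months x 2 variables.
# The sizes and values are assumptions chosen so the result is easy to inspect.
scenarios = [
    [[s * 100 + m * 10 + v for v in range(2)] for m in range(6)]
    for s in range(3)
]

# Slice the month level inside each scenario, keeping the first 3 months.
early_scenarios = [x[:3] for x in scenarios]

print(len(early_scenarios))        # number of scenarios (unchanged)
print(len(early_scenarios[0]))     # months kept per scenario
print(len(early_scenarios[0][0]))  # variables per month (unchanged)
```

The outer list is left alone; only each inner list of months is cut, which is what the question asks for.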
James Hollis
2

So, you are trying to use multidimensional slicing on Python list objects, but fundamentally, list objects do not have dimensions. They have no inherent knowledge of their contents, other than the total number of them. But you *shouldn't* be working with list objects at all! Instead, replace this:

scenarios = [[]]*10000
for i in range(10000):
    scenarios[i] = np.random.multivariate_normal(r, Q, size = 120) # monthly scenarios

With this:

scenarios = np.random.multivariate_normal(r, Q, size=(10000, 120))

In a REPL:

>>> scenarios = np.random.multivariate_normal(r, Q, size=(10000, 120))
>>> scenarios.shape
(10000, 120, 4)

Then, you can slice to your heart's content in N dimensions using:

scenarios[:, 0:60]

Or, a more manageable slice:

>>> scenarios[500:520, 0:60]
array([[[-0.05785267,  0.01122828,  0.00786622, -0.00204875],
        [ 0.01682276,  0.00163375,  0.00439909, -0.0022255 ],
        [ 0.02821342, -0.01634708,  0.01175085, -0.00194007],
        ...,
        [ 0.04918003, -0.02146014,  0.00071328, -0.00222226],
        [-0.03782566, -0.00685615, -0.00837397, -0.00095019],
        [-0.06164655,  0.02817698,  0.01001757, -0.00149662]],

       [[ 0.00071181, -0.00487313, -0.01471801, -0.00180559],
        [ 0.05826763,  0.00978292,  0.02442642, -0.00039461],
        [ 0.04382627, -0.00804489,  0.00046985,  0.00086524],
        ...,
        [ 0.01231702,  0.01872649,  0.01534518, -0.0022179 ],
        [ 0.04212831, -0.05289387, -0.03184881, -0.00078165],
        [-0.04361605, -0.01297212,  0.00135886,  0.0057856 ]],

       [[ 0.00232622,  0.01773357,  0.00795682,  0.00016406],
        [-0.04367355, -0.02387383, -0.00448453,  0.0008559 ],
        [ 0.01256918,  0.06565425,  0.05170755,  0.00046948],
        ...,
        [ 0.04457427, -0.01816762,  0.00068176,  0.00186112],
        [ 0.00220281, -0.01119046,  0.0103347 , -0.00089715],
        [ 0.02178122,  0.03183001,  0.00959293, -0.00057862]],

       ...,
       [[ 0.06338153,  0.01641472,  0.01962643, -0.00256244],
        [ 0.07537754, -0.0442643 , -0.00362656,  0.00153777],
        [ 0.0505006 ,  0.0070783 ,  0.01756948,  0.0029576 ],
        ...,
        [ 0.03524508, -0.03547517, -0.00664972, -0.00095385],
        [-0.03699107,  0.02256328,  0.00300107,  0.00253193],
        [-0.0199608 , -0.00536222,  0.01370301, -0.00131981]],

       [[ 0.08601913, -0.00364473,  0.00946769,  0.00045275],
        [ 0.01943327,  0.07420857,  0.00109217, -0.00183334],
        [-0.04481884, -0.02515305, -0.02357894, -0.00198166],
        ...,
        [-0.01221928, -0.01241903,  0.00928084,  0.00066379],
        [ 0.10871802, -0.01264407,  0.00601223,  0.00090526],
        [-0.02603179, -0.00413112, -0.006037  ,  0.00522712]],

       [[-0.02929114,  0.02188803, -0.00427137,  0.00250174],
        [ 0.02479416, -0.01470632, -0.01355196,  0.00338125],
        [-0.01915726, -0.00869161,  0.01451885, -0.00137969],
        ...,
        [ 0.05398784, -0.00834729, -0.00437888,  0.00081602],
        [ 0.00626345, -0.0261016 , -0.01484753,  0.00060499],
        [ 0.05427697,  0.04006612,  0.03371313, -0.00203731]]])
>>>
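As a rough check of the shapes involved, the sketch below repeats the idea at a small scale. The mean vector, covariance matrix and sizes are illustrative assumptions (2 variables, 5 scenarios, 12 months), not the values from the question.

```python
import numpy as np

# Illustrative inputs (assumptions): 2 variables with a small, valid covariance.
mean = [0.005, 0.002]
cov = [[2.0e-3, 1.0e-4],
       [1.0e-4, 9.0e-4]]

# One vectorized call: 5 scenarios x 12 months x 2 variables.
scenarios = np.random.multivariate_normal(mean, cov, size=(5, 12))
print(scenarios.shape)

# Slice the second axis (months) for every scenario.
first_half = scenarios[:, 0:6]
print(first_half.shape)

# Chained slicing still cuts the first axis, just as it does with lists.
chained = scenarios[:][0:3]
print(chained.shape)

# A list of per-scenario arrays (as built by the original loop) can be
# stacked into one 3-D array with np.array.
as_list = [scenarios[i] for i in range(len(scenarios))]
converted = np.array(as_list)
print(converted.shape)
```

The comma inside the brackets is what makes the slice multidimensional; `[:][0:6]` is two separate one-dimensional slices applied in sequence.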
juanpa.arrivillaga
  • I get the TypeError: list indices must be integers or slices, not tuple – python_enthusiast Nov 07 '17 at 23:50
  • 1
    @python_newbie you are still using a `list` somehow. Make sure you do `scenarios = np.random.multivariate_normal(r, Q, size=(1000, 120))` – juanpa.arrivillaga Nov 07 '17 at 23:53
  • 1
    @python_newbie as in, that replaces your whole `scenarios = [[]]*1000; for i in range(10000): ...` loop – juanpa.arrivillaga Nov 07 '17 at 23:54
  • 1
    @python_newbie If you have a list of lists of lists, you can turn that straight into a Numpy array by doing `scenarios = np.array(scenarios)`. Then you can use the `scenarios[:, 0:60]` expression for cool numpy slicing. – James Hollis Nov 07 '17 at 23:58
  • @JamesHollis yes, but it would be *best* to avoid that pointlessly inefficient loop and take full advantage of a vectorized numpy function. – juanpa.arrivillaga Nov 08 '17 at 00:00
  • @juanpa.arrivillaga My intention here is to give the asker a way to try it out quickly, so he does not give up and leave the green tick on my answer. – James Hollis Nov 08 '17 at 00:05
  • @JamesHollis yes, well, while I appreciate that, I'm very concerned with making sure OP understands the distinction between arrays and lists. It's my own little personal crusade on this tag :) – juanpa.arrivillaga Nov 08 '17 at 00:09
  • @juanpa.arrivillaga This morning I changed my whole code to turn everything into numpy arrays, but they seem to be very memory consuming and my computer keeps freezing. Is this something normal given the size of the data? What kind of structure would be easier to run? – python_enthusiast Nov 08 '17 at 18:01
  • @python_newbie `numpy` arrays *are significantly* more memory efficient than `list` objects. `float` objects are just that, in Python, full-fledged objects that take 24 bytes (depending on your version and system) per instance. A Python list is, underneath the hood, an array of *pointers to Py_Objects*, so each element in the array takes 8 bytes (4 if you are on a 32 bit system). So, for example, 100 `float` objects in a list take about `(24 + 8)*100 == 3200` bytes. A `np.float64` array would be `100*8 == 800` bytes... You can ask another question regarding your specific memory issue. – juanpa.arrivillaga Nov 08 '17 at 18:08
  • @python_newbie the memory consumption for the Numpy array should be 10000*120*4*8 = 38.4MB. You might have more than one array, but you shouldn't be running out of memory here. Try stepping through in a debugger to see which lines cause your memory consumption to get out of hand. – James Hollis Nov 08 '17 at 23:36
  • 1
    I spent the day rewriting my code and somehow the problem was fixed. The memory problem is gone, even though I don't know what I did that fixed it... – python_enthusiast Nov 09 '17 at 00:00
  • @python_newbie welp, all's well that ends well, I suppose. – juanpa.arrivillaga Nov 09 '17 at 00:02
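The memory arithmetic from the comments above can be reproduced with a short sketch. The 8-byte pointer size is an assumption about a 64-bit build, and `sys.getsizeof(1.0)` is implementation-dependent, so the list figure is only a rough lower bound that ignores per-list overhead.

```python
import sys
import numpy as np

n_scenarios, n_months, n_vars = 10000, 120, 4
n_values = n_scenarios * n_months * n_vars

# float64 array: a fixed number of bytes per value, stored contiguously.
array_bytes = n_values * np.dtype(np.float64).itemsize
print(array_bytes / 1e6, "MB for the array data")

# Nested lists of Python floats: roughly one float object plus one
# pointer per value.
float_size = sys.getsizeof(1.0)  # implementation-dependent
pointer_size = 8                 # assumption: 64-bit build
list_bytes_estimate = n_values * (float_size + pointer_size)
print(list_bytes_estimate / 1e6, "MB (rough estimate) for nested lists")
```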
1

Python list slicing doesn't apply across all dimensions like this. Your expression makes a copy of the entire list, scenarios[:], and then takes the first 60 elements of the copy. You need to write a comprehension to grab the elements you want. Perhaps

[scenarios[x][y][z] 
    for x in range(len(scenarios))
        for y in range(60)
            for z in range(len(scenarios[0][0])) ]
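A small sketch may help separate the two points made here. It uses a made-up 4 x 6 x 2 nested list and keeps 3 months instead of 60. Note that the single comprehension above yields one flat list of floats; the nested variant at the end is a suggested alternative that keeps the scenario/month/variable structure.

```python
# Small stand-in: 4 scenarios x 6 months x 2 variables (made-up values).
scenarios = [
    [[float(s), float(m)] for m in range(6)]
    for s in range(4)
]

# scenarios[:] is a shallow copy of the outer list, so the second slice
# still acts on the outer (scenario) level.
same_as_outer_slice = scenarios[:][0:3] == scenarios[0:3]
print(same_as_outer_slice)

# The single comprehension visits the right elements but flattens them.
flat = [scenarios[x][y][z]
        for x in range(len(scenarios))
        for y in range(3)
        for z in range(len(scenarios[0][0]))]
print(len(flat))

# Nesting the comprehensions keeps the three-level structure.
nested = [[[scenarios[x][y][z] for z in range(len(scenarios[x][y]))]
           for y in range(3)]
          for x in range(len(scenarios))]
print(len(nested), len(nested[0]), len(nested[0][0]))
```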
python_enthusiast
Prune
  • Good answer, but still can be simplified. – Mark Nov 07 '17 at 23:18
  • 1
    I know; I try to keep the answer accessible to the poster's level of learning. If you want to edit an addendum to this, be my guest. – Prune Nov 07 '17 at 23:41
1

You need to explicitly slice each secondary list, either in a loop or in list comprehensions. I built a 10x10 set of lists so you have to change the indexing to fit your problem:

x = []
for a in range(10):
    x.append([10*a+n for n in range(10)])
# x is now a list of 10 lists, each of which has 10 elements
print(x)
x1 = [a[:5] for a in x]
# x1 is a list containing the low elements of the secondary lists
x2 = [a[5:] for a in x]
# x2 is a list containing the high elements of the secondary lists
print(x1, x2)
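Adapting this to the three-level structure from the question could look like the sketch below. `split_inner` is a hypothetical helper name, and the 3 x 4 x 2 data is made up; the split point would be 60 for the real scenarios.

```python
def split_inner(nested, k):
    """Split every second-level list at index k and return (low, high).

    Hypothetical helper for illustration; it does not modify `nested`.
    """
    low = [inner[:k] for inner in nested]
    high = [inner[k:] for inner in nested]
    return low, high


# 3 scenarios x 4 months x 2 variables; each entry is [scenario, month].
scenarios = [[[s, m] for m in range(4)] for s in range(3)]

first_months, last_months = split_inner(scenarios, 2)
print(first_months[0])
print(last_months[2])
```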
Paul Cornelius