I started learning Python three days ago and have already run into a problem. I couldn't find any information about it online. It looks like a bug, but I suspect I did something wrong; I just can't find the mistake.

Here we go: I have one list called "inputData", which holds two arrays. All I do is take the first 10 entries of each array, fit them with polyfit, save the fit parameters in the variable "linFit", and afterwards subtract the fit from my "inputData" and save the result in a new list called "correctData". The print line is only there to show you the "bug".

If you run the code below and compare the "inputData" print before and after the procedure, the two differ. I have no idea why... :( However, if you remove one of the two arrays in "inputData", it works fine. Does anyone have an idea? Thanks!

import matplotlib.pyplot as plt
import pylab as np                        

inputData = [np.array([[  1.06999998e+01,   1.71811953e-01],
       [ 2.94000015e+01,  2.08369687e-01],
       [  3.48000002e+01,   3.70725733e-01],
       [  4.28000021e+01,   4.96874842e-01],
       [  5.16000004e+01,  5.20280702e-01],
       [  6.34000015e+01,  6.79658073e-01],
       [  7.72000008e+01,  7.15826614e-01],
       [  8.08000031e+01,   8.38463318e-01],
       [  9.27000008e+01,   9.07969677e-01],
       [  10.65000000e+01,  10.76921320e-01],
       [  11.65000000e+01,  11.76921320e-01]]), 
np.array([[ 0.25999999e+00,   1.21419430e-01],
       [  1.84000009e-01,  2.26843166e-01],
       [ 2.41999998e+01,  3.69826150e-01],
       [  3.90000000e+01,   4.12130547e-01],
       [  4.20999985e+01,  5.92435598e-01],
       [  5.22999992e+01,   6.44819438e-01],
       [  6.62999992e+01,  7.23920727e-01],
       [  7.65000000e+01,   8.45791912e-01],
       [  8.22000008e+01,   9.97368264e-01],
       [  9.55000000e+01,  10.48223877e-01]])]

linFit = [['', '']]*15                    
linFitData = [['', '']]*15              
correctData = np.copy(inputData)          

print(inputData)  

for i, entry in enumerate(inputData):
    CUT = np.split(entry, [10], axis=0)                           
    linFitData[i] = CUT[0]                                                
    linFit[i] = np.polyfit(linFitData[i][:,0], linFitData[i][:,1], 1)
    for j, subentry in enumerate(entry):      
        correctData[i][j][1] = subentry[1]-subentry[0]*(linFit[i][0]+linFit[i][1])  
        #print (inputData[0][0][1])
    print('----------')    

print(inputData)     

for i, entry in enumerate(inputData):                                     
    plt.plot(entry[:,0], entry[:,1], '.')    
    plt.plot(linFitData[i][:,0], (linFitData[i][:,0])*(linFit[i][0])+(linFit[i][1]))  
    #plt.plot(correctData[i][:,0], correctData[i][:,1], '.')  
  • It's hard for me to make heads or tails of what inputData, correctData, and dataList are. From what you've posted, I don't know what I should expect to look like what. It would be very helpful if you were able to post a complete (runnable) example that shows the problem. http://stackoverflow.com/help/mcve – Mike Graham Feb 27 '14 at 15:41
  • Hello Mike, I just changed my code, so you can execute it. If you compare the two "inputData" prints you will see they are different. :( – user3361064 Feb 27 '14 at 18:54

2 Answers

Your inputData isn't a numpy array, it's a list of arrays. Those two arrays don't have the same length:

>>> [len(sl) for sl in inputData]
[11, 10]

numpy arrays can't handle varying lengths. If you try to make an array out of it, instead of having a 3-D array of float dtype, you get a 1-D array of object dtype, the members of which are the original arrays:

>>> a = np.array(inputData)
>>> a.shape, a.dtype
((2,), dtype('O'))

and so your "copy" is actually only a shallow copy; the arrays inside are the same objects as in inputData:

>>> correctData = np.copy(inputData)
>>> inputData[0] is correctData[0]
True
>>> inputData[1] is correctData[1]
True

BTW, multiplying a list like this, linFit = [['', '']]*15, doesn't make copies either: every element refers to the same sublist, so linFit[0] is linFit[1]. Try changing one of the sublists to see this.
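
A minimal sketch of that list-multiplication pitfall, using a shorter list than in the question; the names `aliased` and `independent` are only illustrative. A list comprehension is one common way to get separate sublists.

```python
# Multiplying the outer list repeats a reference to the same inner list.
aliased = [['', '']] * 3
aliased[0][0] = 'x'            # shows up in every "row"

# A list comprehension builds a fresh inner list on each iteration.
independent = [['', ''] for _ in range(3)]
independent[0][0] = 'x'        # changes only the first row

print(aliased)
print(independent)
```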

DSM

Your code as posted is not runnable, because several definitions are missing or wrong. After fixing that and cleaning up the code, I get the following, which shows that everything works as intended:

import numpy as np
from copy import deepcopy

dataList = [np.array([[  1.06999998e+01,   1.71811953e-01],
       [ -3.94000015e+01,  -7.08369687e-02],
       [  1.48000002e+01,   1.70725733e-02],
       [  6.28000021e+00,   1.96874842e-01],
       [  2.16000004e+01,  -1.20280702e-02],
       [  4.34000015e+01,  -3.79658073e-01],
       [  3.72000008e+01,  -1.15826614e-01],
       [  8.08000031e+01,   6.38463318e-01],
       [  5.27000008e+01,   5.07969677e-01],
       [  6.65000000e+01,  -4.76921320e-01]], dtype=np.float32), 
np.array([[ -3.25999999e+00,   1.21419430e-01],
       [  2.84000009e-01,  -4.26843166e-02],
       [ -1.41999998e+01,  -1.69826150e-01],
       [  1.90000000e+01,   2.12130547e-01],
       [  3.20999985e+01,  -5.92435598e-02],
       [  3.22999992e+01,   1.44819438e-01],
       [  3.62999992e+01,  -3.23920727e-01],
       [  4.65000000e+01,   2.45791912e-01],
       [  6.22000008e+01,   1.97368264e-02],
       [  6.55000000e+01,  -1.48223877e-01]], dtype=np.float32)]

correctData = deepcopy(dataList)           

for i, entry in enumerate(dataList):
    CUT = np.split(entry, 5, axis=0)[0]                         
    linFit = np.polyfit(CUT[:,0], CUT[:,1], 1)   
    for j, subentry in enumerate(entry):      
        # Subtract the fitted line (slope * x + intercept) from y.
        correctData[i][j][1] = subentry[1] - (subentry[0] * linFit[0] + linFit[1])
        print(dataList[1][0][1])
    print('----------')

Outputs:

0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
----------
0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
0.121419
----------

The actual problem in your code above is that inputData is a list. If it were an array, np.copy would give you a proper, independent copy. Because it is a list of arrays with different lengths, the copy instead creates an array of objects, which holds only references to the original arrays. So you are in fact writing directly to inputData, not to copies. You can see this from the dtype:

>>> correctData.dtype
dtype('O')

So to fix the problem, either create a list of copies or switch to a single 3D array (which requires all the arrays to have the same shape). To create a list with copies of all contained items, use this:

from copy import deepcopy
correctData = deepcopy(inputData)          
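
As an alternative sketch, the copy and the subtraction can be combined so that the input is never modified. The helper `subtract_linear_fit` and its parameter `n_fit` are hypothetical names, not from the code above, and the sketch assumes the intended correction is y minus (slope * x + intercept), with the line fitted to the first `n_fit` rows.

```python
import numpy as np

def subtract_linear_fit(data, n_fit=10):
    """Return detrended copies of the (N, 2) arrays in `data`."""
    # Copy each array individually so the originals stay untouched.
    corrected = [arr.copy() for arr in data]
    for arr in corrected:
        # Fit a straight line to the first n_fit rows only.
        slope, intercept = np.polyfit(arr[:n_fit, 0], arr[:n_fit, 1], 1)
        # Subtract the fitted line from every y value in place (on the copy).
        arr[:, 1] -= slope * arr[:, 0] + intercept
    return corrected

# Synthetic data lying exactly on y = 2x + 1, with different lengths.
x_a = np.arange(11, dtype=float)
x_b = np.arange(10, dtype=float)
demo = [np.column_stack([x_a, 2 * x_a + 1]),
        np.column_stack([x_b, 2 * x_b + 1])]
result = subtract_linear_fit(demo)
```

Copying per array avoids building an object array at all, so it works whether or not the arrays have the same length.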
Michael
  • Hello Michael! Thank you very much for your help and for the great hints in your code! I just updated my code according to yours and changed it so that you can execute it. However, the "bug" still occurs. It would be great if you could help me out. – user3361064 Feb 27 '14 at 18:52
  • I added an explanation of what exactly went wrong to the answer. – Michael Feb 27 '14 at 20:43