
I'm having some issues importing data from files in Python. I am quite new to Python, so my error is probably quite simple.

I am reading in three-column, tab-delimited text files with no headers. I am creating three instances of the class, one for each of three different data files.

I can see that each object is referencing a different memory location, so they are separate.

When I look at the data stored in each instance, each instance has the same contents, consisting of the three datafiles appended to each other.

What have I done wrong?

The class to read in the data is:

class Minimal:

    def __init__(self, data=[]):
        self.data = data

    def readFile(self, filename):
        f = open(filename, 'r')

        for line in f:
            line = line.strip()
            columns = line.split()
            #creates a list of angle, intensity and error and appends it to the diffraction pattern
            self.data.append( [float(columns[0]), float(columns[1]), float(columns[2])] )
        f.close()

    def printData(self):
        for dataPoint in self.data:
            print str(dataPoint)

The datafiles look like:

1   4   2
2   5   2.3
3   4   2
4   6   2.5
5   8   5
6   10  3

The program I am using to actually create the instances of Minimal is:

from minimal import Minimal

d1 = Minimal()
d1.readFile("data1.xye")

d2 = Minimal()
d2.readFile("data2.xye")

d3 = Minimal()
d3.readFile("data3.xye")


print "Data1"
print d1
d1.printData()

print "\nData2"
print d2
d2.printData()

print "\nData3"
print d3
d3.printData()

The output is:

Data1
<minimal.Minimal instance at 0x016A35F8>
[1.0, 4.0, 2.0]
[2.0, 5.0, 2.3]
[3.0, 4.0, 2.0]
[4.0, 6.0, 2.5]
[5.0, 8.0, 5.0]
[6.0, 10.0, 3.0]
[2.0, 4.0, 2.0]
[3.0, 5.0, 2.3]
[4.0, 4.0, 2.0]
[5.0, 6.0, 2.5]
[6.0, 8.0, 5.0]
[7.0, 10.0, 3.0]
[3.0, 4.0, 2.0]
[4.0, 5.0, 2.3]
[5.0, 4.0, 2.0]
[6.0, 6.0, 2.5]
[7.0, 8.0, 5.0]
[8.0, 10.0, 3.0]

Data2
<minimal.Minimal instance at 0x016A3620>
[1.0, 4.0, 2.0]
[2.0, 5.0, 2.3]
[3.0, 4.0, 2.0]
[4.0, 6.0, 2.5]
[5.0, 8.0, 5.0]
[6.0, 10.0, 3.0]
[2.0, 4.0, 2.0]
[3.0, 5.0, 2.3]
[4.0, 4.0, 2.0]
[5.0, 6.0, 2.5]
[6.0, 8.0, 5.0]
[7.0, 10.0, 3.0]
[3.0, 4.0, 2.0]
[4.0, 5.0, 2.3]
[5.0, 4.0, 2.0]
[6.0, 6.0, 2.5]
[7.0, 8.0, 5.0]
[8.0, 10.0, 3.0]

Data3
<minimal.Minimal instance at 0x016A3648>
[1.0, 4.0, 2.0]
[2.0, 5.0, 2.3]
[3.0, 4.0, 2.0]
[4.0, 6.0, 2.5]
[5.0, 8.0, 5.0]
[6.0, 10.0, 3.0]
[2.0, 4.0, 2.0]
[3.0, 5.0, 2.3]
[4.0, 4.0, 2.0]
[5.0, 6.0, 2.5]
[6.0, 8.0, 5.0]
[7.0, 10.0, 3.0]
[3.0, 4.0, 2.0]
[4.0, 5.0, 2.3]
[5.0, 4.0, 2.0]
[6.0, 6.0, 2.5]
[7.0, 8.0, 5.0]
[8.0, 10.0, 3.0]

Tool completed successfully
masher
    `def __init__(self, data=[]):` <- The curse of the [mutable default argument](http://stackoverflow.com/questions/1132941/least-astonishment-in-python-the-mutable-default-argument) strikes again! – DSM Aug 12 '13 at 02:38
  • Can you post the contents of the 3 files? It would be very helpful. – Mario Rossi Aug 12 '13 at 02:42
  • @MarioRossi : The files have the same 2nd and 3rd columns. The 1st columns start with 1, 2 or 3 and go up in steps of 1 for 6 rows – masher Aug 12 '13 at 02:46
  • Wouldn't the [`csv`](http://docs.python.org/2/library/csv.html) module be appropriate for parsing the file? Doesn't solve the problem, but it would be cleaner code, I think. – jpmc26 Aug 12 '13 at 03:34
  • @jpmc26 quite possibly, but I am trying to teach myself Python, so I did it this way, and the files aren't that complex... – masher Aug 12 '13 at 04:03
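For comparison, here is a minimal sketch of the csv-based parsing suggested in the comments. It assumes exactly one tab between columns (the "excel-tab" dialect treats consecutive tabs as empty fields). The function name read_columns is hypothetical and not part of the question's code.

```python
import csv


def read_columns(lines):
    """Parse rows of three tab-separated floats.

    `lines` can be any iterable of strings, such as an open file object.
    Assumes a single tab between columns; blank lines are skipped.
    """
    data = []
    reader = csv.reader(lines, dialect='excel-tab')
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for a blank line
        # angle, intensity, error
        data.append([float(value) for value in row[:3]])
    return data
```

In Python 3 the csv documentation recommends opening the file with newline='', for example: with open("data1.xye", newline="") as f: rows = read_columns(f).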

2 Answers


The default value of data is evaluated only once, when the function is defined, so the data attributes of all Minimal instances created without an argument reference the same list.

>>> class Minimal:
...     def __init__(self, data=[]):
...         self.data = data
... 
>>> a1 = Minimal()
>>> a2 = Minimal()
>>> a1.data is a2.data
True
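To see how that produces the output in the question, here is a small sketch (written for Python 3) that appends through one instance and reads through another. It also shows that the single shared list is stored on the function object, in its __defaults__ tuple. The appended row is just example data.

```python
class Minimal:
    def __init__(self, data=[]):  # this [] is created once, when the def runs
        self.data = data


a1 = Minimal()
a2 = Minimal()
a1.data.append([1.0, 4.0, 2.0])  # mutate the list through a1 ...
print(a2.data)                   # ... and a2 sees it: [[1.0, 4.0, 2.0]]

# The one shared default list lives on the __init__ function object itself.
print(Minimal.__init__.__defaults__)  # ([[1.0, 4.0, 2.0]],)
```

Each call to readFile in the question appends to this same list, which is why all three instances print all eighteen rows.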

Replace it as follows:

>>> class Minimal:
...     def __init__(self, data=None):
...         self.data = data or []
... 
>>> a1 = Minimal()
>>> a2 = Minimal()
>>> a1.data is a2.data
False
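One subtlety with data or []: any falsy argument, including an empty list the caller deliberately passed in, is replaced by a new list. The common alternative is an explicit "is None" check. This sketch compares the two; the class names OrDefault and NoneDefault are hypothetical.

```python
class OrDefault:
    def __init__(self, data=None):
        self.data = data or []  # any falsy argument (e.g. []) is swapped out


class NoneDefault:
    def __init__(self, data=None):
        if data is None:  # only a missing argument gets a fresh list
            data = []
        self.data = data


shared = []
print(OrDefault(shared).data is shared)    # False: the empty list was replaced
print(NoneDefault(shared).data is shared)  # True: the caller's list is kept
```

Both versions fix the bug in the question; the difference only matters when a caller passes an empty list and expects the instance to keep using it.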

See “Least Astonishment” in Python: The Mutable Default Argument.
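Applied to the question's full class, the fix might look like the following sketch (written for Python 3, so printData uses the print() function). It also switches to a with block so the file is closed even if a line fails to parse, and it skips blank lines, which the original would reject with an IndexError. These extra changes are suggestions, not part of the original answer.

```python
class Minimal(object):
    """Holds [angle, intensity, error] rows read from whitespace-delimited files."""

    def __init__(self, data=None):
        # A fresh list per instance unless the caller supplies one.
        self.data = [] if data is None else data

    def readFile(self, filename):
        # 'with' closes the file even if float() raises on a malformed line.
        with open(filename, 'r') as f:
            for line in f:
                columns = line.split()  # split() with no args also strips
                if not columns:
                    continue  # skip blank lines
                self.data.append([float(columns[0]),
                                  float(columns[1]),
                                  float(columns[2])])

    def printData(self):
        for dataPoint in self.data:
            print(dataPoint)
```

The calling program from the question works unchanged: d1 = Minimal() followed by d1.readFile("data1.xye") now fills only d1's list.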

falsetru
  • Well stuff me. I thought that data would be shared only if it was defined outside of the __init__ – masher Aug 12 '13 at 02:46
  • It *is* defined outside of the init! Default values are evaluated on function *definition*, not on function invocation. The explanation is too long for a comment so I will add an answer. – Mario Rossi Aug 12 '13 at 03:07
  • I just realized that the root of the problem is **not** that the "`data` attribute is shared by all instances". This would make very little sense, in fact. The root cause is that its **default value in __init__** is shared. – Mario Rossi Aug 12 '13 at 03:23

Consider the following:

def d():
    print("d() invoked")
    return 1

def f(p=d()):
    pass

print("Start")
f()
f()

It prints

d() invoked
Start

Not

Start
d() invoked
d() invoked

Why? Because default arguments are evaluated once, when the def statement is executed, and the resulting objects are stored on the function object for reuse every subsequent time they are needed. They are not re-evaluated on each function invocation.
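A quick way to confirm this is to count how often the default expression runs and to look at the function's __defaults__ attribute, where Python keeps the evaluated defaults. This sketch records calls in a list instead of printing; the name calls is hypothetical.

```python
calls = []


def d():
    calls.append("d")  # record each evaluation of the default expression
    return 1


def f(p=d()):  # d() runs here, exactly once, when f is defined
    return p


f()
f()
f()
print(len(calls))      # 1
print(f.__defaults__)  # (1,)
```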

In other words, they behave more or less like:

_f_p_default = d()
def f(p=None):
    if p is None:
        p = _f_p_default
    pass

Make the above substitution in your code, and you will understand the problem immediately.

The correct form for your code was already provided by @falsetru. I'm just trying to explain the rationale.

Mario Rossi
  • Thanks for that. My background is all in Java, so the different paradigms are still conflicting with me. (I still feel dirty not giving my variables a type on definition!). – masher Aug 12 '13 at 04:05
  • @masher You are obviously no beginner. That's why I wanted to explain things a bit deeper. I can't wait for optional static type checking to be added to Python, either. Not only because I miss it, but because it's very useful, especially in larger-scale applications. – Mario Rossi Aug 12 '13 at 06:02