I wrote the following code in which I create a dictionary of pandas dataframes:

import pandas as pd
import numpy as np

classification = pd.read_csv('classification.csv')

thresholdRange = np.arange(0, 70, 0.5).tolist()

classificationDict = {}

for t in thresholdRange:
    classificationDict[t] = classification

for k, v in classificationDict.iteritems():
    v['Threshold'] = k

I want every dataframe in the dictionary to have a column called 'Threshold' whose value is that dataframe's dictionary key. However, with the code above I get the same value in all dataframes. What am I missing here? Perhaps I am complicating things for myself with this approach, but I'd greatly appreciate your help.

ropolo
  • are you sure this code runs? You import `numpy` and then use arange without `np.`? And `tolist()` probably needs the parentheses? – Ilja Mar 19 '17 at 10:25
  • Thanks @Ilja. I just edited the question. I typed this on my smartphone and I missed those important details. – ropolo Mar 19 '17 at 10:31
  • Well, you should have waited until you are in front of your computer - or is it _so_ urgent ;) The code example should be a minimal working example. When you try to create one, you'll often find the issue yourself. – Ilja Mar 19 '17 at 10:37
  • Thanks for your answer. I will try to figure out the issue. – ropolo Mar 19 '17 at 19:47

2 Answers

Sorry, I got your question wrong. Now this is the issue:

classification (a pandas dataframe, I suppose) is a mutable object, and putting a mutable object into a list or a dict behaves in a way that surprises Python beginners: the container stores a reference to the same object, not a copy. If you modify the object through one entry, every entry shows the change. Try this:

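a = [1]
b = [a, a]      # both entries refer to the same list object
b[0][0] = 2     # modify that object through the first entry
print(b[1])     # the second entry shows the change too: [2]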

This is what happens to your dict: every key points to the same dataframe. You have to add a different object for each key. Pandas dataframes have a .copy() method for this. Alternatively, I found this post for you, which has (in essence) the same problem and further solutions:
https://stackoverflow.com/a/2612815/6053327
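
Applied to the code in the question, the fix might look like the sketch below. It is only an illustration: the small in-memory dataframe and its 'score' column are made-up stand-ins for the contents of classification.csv, which is not shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('classification.csv')
classification = pd.DataFrame({"score": [0.1, 0.4, 0.9]})

thresholdRange = np.arange(0, 70, 0.5).tolist()

classificationDict = {}
for t in thresholdRange:
    # .copy() gives each key its own dataframe instead of a shared reference
    df = classification.copy()
    df["Threshold"] = t
    classificationDict[t] = df

# Each dataframe now carries its own key in the 'Threshold' column
print(classificationDict[0.0]["Threshold"].unique())
print(classificationDict[69.5]["Threshold"].unique())

# The original dataframe is left without the new column
print("Threshold" in classification.columns)
```

Setting the column inside the first loop also makes the second loop over the dict unnecessary.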

Ilja

Of course you get the same value. You are doing the same assignment over and over again in

for k, v in classificationDict.iteritems():

because your vs are all identical: you assigned the same object to every key in the first for loop.
Did you try debugging it yourself and printing classification? I assume that it is only the first line?
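
One way to check this without pandas is to compare the values with `is`. The sketch below is a simplified illustration in Python 3 (so `.items()` instead of `.iteritems()`), with a plain dict and three made-up keys standing in for the dataframe and the threshold range.

```python
# Hypothetical stand-in for the dataframe: any mutable object behaves the same way
classification = {"score": [0.1, 0.4, 0.9]}

classificationDict = {}
for t in [0.0, 0.5, 1.0]:
    classificationDict[t] = classification  # stores a reference, not a copy

# Every value in the dict is the very same object
values = list(classificationDict.values())
all_same = all(v is values[0] for v in values)
print(all_same)  # True

for k, v in classificationDict.items():
    v["Threshold"] = k  # each pass overwrites the same entry on the same object

print(classification["Threshold"])  # 1.0, the last key written
```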

Ilja
  • Thanks for your answer @Ilja. I did find an answer on how to achieve a similar result with `awk`: http://stackoverflow.com/questions/42891531/adding-column-to-csv-file-with-awk-using-number-sequence. Do you have any advice on how to approach this in Python? – ropolo Mar 20 '17 at 00:44
  • Wait, you want to have 140 files, each with a column with identical entries? Then I misunderstood your question, I thought you wanted to have the sequence inside a column... sorry :( I will adjust a little – Ilja Mar 20 '17 at 08:02
  • I added a new answer, I think, I will delete this, since it does not address your question... – Ilja Mar 20 '17 at 08:36