Using dictionary keys in pandas dataframe columns

Question

I wrote the following code in which I create a dictionary of pandas dataframes:

import pandas as pd
import numpy as np

classification = pd.read_csv('classification.csv')

thresholdRange = np.arange(0, 70, 0.5).tolist()

classificationDict = {}

for t in thresholdRange:
    classificationDict[t] = classification

for k, v in classificationDict.iteritems():
    v['Threshold'] = k

In this case, I want to create a column called 'Threshold' in all the pandas dataframes in which the keys of the dictionary are the values. However, what I get with the code above is the same value in all dataframes. What am I missing here? Perhaps I am complicating things for myself with this approach, but I'd greatly appreciate your help.

are you sure this code runs? You import `numpy` and then use arange without `np.`? And `tolist()` probably needs the parentheses? — Ilja, Mar 19 '17 at 10:25
Thanks @Ilja. I just edited the question. I typed this on my smartphone and I missed those important details. — ropolo, Mar 19 '17 at 10:31
Well, you should have waited until you are in front of your computer - or is it _so_ urgent ;) The code example should be a minimal working example. When you try to create one, you'll often find the issue yourself. — Ilja, Mar 19 '17 at 10:37

score 1 · Answer 1 · edited May 23 '17 at 12:00

Sorry, I got your question wrong. Now this is the issue:

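classification is a pandas dataframe, which is a mutable object. Adding a mutable object to a list or a dict does not copy it: every entry refers to the same object, which can be surprising for Python beginners. If you modify the object through one entry, the change shows up through all the others. Try this:

a = [1]
b = [a, a]      # both entries refer to the same list
b[0].append(2)  # mutate that list through the first entry
print(b[1])     # [1, 2]: the second entry sees the change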

This is what happens to your dict: every key maps to the same dataframe, so the last value written to 'Threshold' is the one you see everywhere. You have to add different objects to the dict, for example by calling the dataframe's .copy() method on each assignment. Alternatively, this post deals with essentially the same problem and has further solutions:
https://stackoverflow.com/a/2612815/6053327
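As a minimal sketch of the .copy() fix, assuming Python 3 and using a small made-up dataframe in place of classification.csv (the 'score' column and its values are hypothetical, as are the snake_case names), the loop could look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('classification.csv').
classification = pd.DataFrame({'score': [10.0, 20.0, 30.0]})

threshold_range = np.arange(0, 70, 0.5).tolist()

classification_dict = {}
for t in threshold_range:
    frame = classification.copy()  # independent copy for each key
    frame['Threshold'] = t         # only this copy gets the value t
    classification_dict[t] = frame

# Each dataframe now carries its own threshold, and the original is untouched.
print(classification_dict[0.5]['Threshold'].unique())
print(classification_dict[69.5]['Threshold'].unique())
print('Threshold' in classification.columns)
```

Setting the column inside the first loop also removes the need for the second loop over the dict.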

I confirm, if you add .copy() at the end of classificationDict[t] = classification the code works. — Matteo Felici, Mar 20 '17 at 09:25

score 0 · Answer 2 · answered Mar 19 '17 at 10:34

Of course you get the same value. You are doing the same assignment over and over again in

for k, v in classificationDict.iteritems():

because all your `v`s are the same object: in the first for loop you assigned the same dataframe to every key.
Did you try debugging this yourself and printing classification? I assume that it is only the first line?
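One way to check this yourself is to compare object identities. The sketch below assumes Python 3 (hence .items() rather than .iteritems()) and uses a small hypothetical dataframe and only three thresholds instead of the real CSV:

```python
import pandas as pd

# Hypothetical stand-in for the dataframe read from classification.csv.
classification = pd.DataFrame({'score': [10.0, 20.0]})

# Same pattern as in the question: no copy is made.
classification_dict = {t: classification for t in [0.0, 0.5, 1.0]}

# Every value is the very same object as the original dataframe.
same_object = all(v is classification for v in classification_dict.values())
print(same_object)

for k, v in classification_dict.items():
    v['Threshold'] = k  # overwrites the one shared column each time

# Only the last key written survives, in every "entry" of the dict.
last_written = classification['Threshold'].tolist()
print(last_written)
```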

Ilja


Thanks for your answer @Ilja. I did find an answer on how to achieve a similar result with `awk`: http://stackoverflow.com/questions/42891531/adding-column-to-csv-file-with-awk-using-number-sequence. Do you have any advice on how to approach this in Python? – ropolo Mar 20 '17 at 00:44
Wait, you want to have 140 files, each with a column of identical entries? Then I misunderstood your question; I thought you wanted to have the sequence inside a column... sorry :( I will adjust a little – Ilja Mar 20 '17 at 08:02
I added a new answer, I think, I will delete this, since it does not address your question... – Ilja Mar 20 '17 at 08:36
