
I am relatively new to Python (I worked in R for a while), and I feel that I am fundamentally misunderstanding something about Python here. Below is a minimal reproducible example. I would like a data frame with each integer key as a variable. The function below outputs only the "2" integer as a variable. If I indent the "return df" so that it sits inside the loop, I get the "0" integer as a variable with its contents as observations. If I use print instead and indent it so that it sits under the "df" line, I get what I want, but it's not in a data frame. Can someone explain what is going on here?

Expected output would be:

[image: expected output]

import pandas as pd

d = {0: [1.5, 2.3, 4.5], 1: [5.6, 2.4, 4.4], 2: [3.5, 3.4, 5]}

def classify(z):
    for i in z:
        df = pd.DataFrame({i: z[i]})
    return df
    
classify(d)
James
  • First, you're reassigning the value of `df` by doing `df = ...`, so the value of `df` will only ever be the last item assigned. Second, can you please provide an example of your expected output? Generally you can pass a dictionary directly to the DataFrame constructor. – ddejohn Mar 01 '22 at 19:04
  • Just added to post. – James Mar 01 '22 at 19:13
  • I’m voting to close this question because it is directly and explicitly answered by the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). – ddejohn Mar 01 '22 at 19:52
  • It's really more about the second part of Rabinzel's answer, which gives a deeper understanding of what is going on, than about the fact that you can easily convert dictionaries to a data frame, but vote as you wish. – James Mar 01 '22 at 20:32

1 Answer


If you pass a dictionary to the DataFrame constructor, each key becomes a column name and the corresponding value becomes that column's data. In your example:

df = pd.DataFrame(d)

Output:
     0    1    2
0  1.5  5.6  3.5
1  2.3  2.4  3.4
2  4.5  4.4  5.0

To get the desired output you could do the following things:

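# 1 (best): use the dictionary keys as the row index
df = pd.DataFrame.from_dict(d, orient='index')

# 2: pass only the dictionary values, so each list becomes a row
df = pd.DataFrame(d.values())

# 3: build the frame column-wise, then transpose it
df = pd.DataFrame(d).T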
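
As a quick check, here is a minimal sketch that runs all three options on the dictionary from the question and compares the values. The variable names `by_index`, `by_values` and `transposed` are just mine for illustration, and I wrap `d.values()` in `list()` only to be explicit about what is passed in.

```python
import pandas as pd

d = {0: [1.5, 2.3, 4.5], 1: [5.6, 2.4, 4.4], 2: [3.5, 3.4, 5]}

# Option 1: dictionary keys become the row index
by_index = pd.DataFrame.from_dict(d, orient="index")

# Option 2: each list becomes a row, with a default 0..n-1 row index
by_values = pd.DataFrame(list(d.values()))

# Option 3: keys become columns first, then rows and columns are swapped
transposed = pd.DataFrame(d).T

# All three should hold the same values, one row per dictionary key
print(by_index.values.tolist() == by_values.values.tolist())   # True
print(by_index.values.tolist() == transposed.values.tolist())  # True
```

One difference to keep in mind: option 2 throws the keys away, so it only matches the other two here because the keys happen to be 0, 1, 2.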

It is not quite clear, but I think you created the dict just for this question. Either way, you can pass dicts or lists/tuples directly to the DataFrame constructor, as ddejohn already mentioned. In your code you don't update the df; you define a new df on every iteration, so in the end the df contains only the data of the last item assigned.
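
To make the two behaviours from the question concrete, here is a small sketch. The function names `return_after_loop` and `return_inside_loop` are made up for this illustration; the bodies are the question's function with the `return` at the two indentation levels described there.

```python
import pandas as pd

d = {0: [1.5, 2.3, 4.5], 1: [5.6, 2.4, 4.4], 2: [3.5, 3.4, 5]}

def return_after_loop(z):
    # df is rebound on every pass, so only the frame built
    # from the last key survives until the return.
    for i in z:
        df = pd.DataFrame({i: z[i]})
    return df

def return_inside_loop(z):
    # With the return indented into the loop, the function exits
    # during the first pass, so only the first key is ever used.
    for i in z:
        df = pd.DataFrame({i: z[i]})
        return df

print(list(return_after_loop(d).columns))   # [2]
print(list(return_inside_loop(d).columns))  # [0]
```

A `print` inside the loop shows every key because it runs once per pass, but nothing from the earlier passes is stored anywhere.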

Edit to your question:

Look here. It is highly recommended not to do it the way you want to.

Have a look at the official pandas DataFrame documentation. I think things will be much clearer after that. But since you asked: if you want to fill your df in a loop, I think this is the easiest way to go:

df = pd.DataFrame()
for k, v in d.items():
    df[k] = v
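
Put back into the shape of the function from the question, the loop could look roughly like this. It is only a sketch: it reuses the name `classify` from the question, and `result` is a name I picked for the example.

```python
import pandas as pd

d = {0: [1.5, 2.3, 4.5], 1: [5.6, 2.4, 4.4], 2: [3.5, 3.4, 5]}

def classify(z):
    df = pd.DataFrame()        # created once, before the loop
    for k, v in z.items():
        df[k] = v              # each key adds a column to the same frame
    return df                  # returned only after every key is processed

result = classify(d)
print(list(result.columns))  # [0, 1, 2]
print(result.shape)          # (3, 3)
```

This gives the same column layout as `pd.DataFrame(d)`, so the loop is mainly useful if you need to do extra work for each key.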
Rabinzel
  • Ah, I didn't know you could pass a dictionary directly into a dataframe and get that. How would I update the dataframe with each new loop? Would I have to use iterrows or something like that to update the df in the for loop? – James Mar 01 '22 at 19:41
  • James, it seems like you may be lacking some Pandas fundamentals. I recommend looking for some tutorials on DataFrame basics, like creating DFs, modifying DFs, etc. Generally speaking, you should stop and think twice if you ever find yourself wanting/needing to iterate over a DataFrame. – ddejohn Mar 01 '22 at 19:47