
I want to store a DataFrame object as the value of a single cell, that is, one column of one row. Here is a simplified example of what I want to achieve.

>>> df = pd.DataFrame([[1,2,3],[2,4,6]], columns=list('DEF'))
>>> df
   D  E  F
0  1  2  3
1  2  4  6

I created a second DataFrame and tried to add a new column on the fly by assigning that DataFrame object as the value of the new column for one row. The code below shows what happened.

>>> df_in_df = pd.DataFrame([[11,13,17],[19, 23, 31]], columns=list('XYZ'))
>>> df.loc[df['F'] == 6, 'G'] = df_in_df
>>> df
   D  E  F   G
0  1  2  3 NaN
1  2  4  6 NaN
>>> df.loc[df['F'] == 6, 'G'].item()
    nan
>>> # But the below works fine, i.e. when I insert an integer
>>> df.loc[df['F'] == 6, 'G'] = 4
>>> df
   D  E  F    G
0  1  2  3  NaN
1  2  4  6  4.0
>>> # and to verify
>>> df.loc[df['F'] == 6, 'G'].item()
    4.0

By the way, I have found a workaround by pickling the DataFrame into a string, but I am not happy with it:

>>> import pickle
>>> df.loc[df['F'] == 6, 'G'] = pickle.dumps(df_in_df)
>>> df
   D  E  F                                                  G
0  1  2  3                                                NaN
1  2  4  6  ccopy_reg\n_reconstructor\np0\n(cpandas.core.f...

>>> revive_df_from_df = pickle.loads(df.loc[df['F'] == 6, 'G'].item())
>>> revive_df_from_df
    X   Y   Z
0  11  13  17
1  19  23  31

I started using pandas only today, after going through "10 minutes to pandas", so I don't know the conventions. Are there any better ideas? Thanks!

Merlin
  • 24,552
  • 41
  • 131
  • 206
Devi Prasad Khatua
  • 1,185
  • 3
  • 11
  • 23

3 Answers


Create a dict first, with the DataFrame as one of its values, and build the outer DataFrame from it:

x = pd.DataFrame()

y =  {'a':[5,4,5],'b':[6,9,7], 'c':[7,3,x]}

# {'a': [5, 4, 5], 'b': [6, 9, 7], 'c': [7, 3, Empty DataFrame
#   Columns: []
#   Index: []]}

z = pd.DataFrame(y)

#   a  b                                      c
# 0  5  6                                      7
# 1  4  9                                      3
# 2  5  7  Empty DataFrame
# Columns: []
# Index: []

(Or convert the DataFrame to a dict and insert that instead. A lot happens when pandas creates objects, and you are torturing pandas here. Your use case implies nested dicts, so I would use those.)
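As a rough sketch of the two alternatives mentioned above, the snippet below keeps the inner frame either in a plain dict keyed by the parent row label or as a nested dict stored in an object column. The names `children` and `restored` are made up for illustration, and the data is borrowed from the question; treat this as one possible reading of the suggestion rather than a recommended pattern.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list("DEF"))
df_in_df = pd.DataFrame([[11, 13, 17], [19, 23, 31]], columns=list("XYZ"))

# Option 1: keep the inner frames outside pandas, in a dict keyed by the
# label of the parent row they belong to (hypothetical name: children).
children = {label: df_in_df for label in df.index[df["F"] == 6]}

# Option 2: store a plain nested dict in the cell instead of a DataFrame.
# Building the whole column from a list keeps it as an object column.
df["G"] = [df_in_df.to_dict() if f == 6 else None for f in df["F"]]

# Rebuild the inner frame from the nested dict when it is needed again.
restored = pd.DataFrame(df.loc[df["F"] == 6, "G"].item())
```

Option 1 leaves the outer DataFrame untouched, which sidesteps pandas' type inference entirely; option 2 keeps everything in one object at the cost of an object-dtype column.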

Merlin
  • 24,552
  • 41
  • 131
  • 206

You are on shaky ground relying on this behavior. pandas does a lot of work trying to infer what you mean or want when you pass array-like things to its constructors and assignment functions. This is pressing on those boundaries, seemingly intentionally.

It seems that direct assignment via loc doesn't work. This is a workaround I've found. Again, I would not expect this behavior to be robust across pandas versions.

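import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,3],[2,4,6]], columns=list('DEF'))

df_in_df = pd.DataFrame([[11,13,17],[19, 23, 31]], columns=list('XYZ'))

# Create column G first, then replace each selected cell with the DataFrame object.
# (applymap is deprecated since pandas 2.1 in favor of DataFrame.map.)
df.loc[df['F'] == 6, 'G'] = np.nan
df.loc[df['F'] == 6, 'G'] = df.loc[df['F'] == 6, ['G']].applymap(lambda x: df_in_df)

df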


piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • Why, is it kind of wrong by convention to insert a DF into another DF ? – Devi Prasad Khatua Jun 21 '16 at 17:47
  • Because on init pandas calls NumPy to create an array, and now it is getting a sequence. @wolframalpha, your use case is not what pandas was designed for. – Merlin Jun 21 '16 at 17:52
  • I'm not an authority on the issue. But I'd say yes. Not wrong. But wrong by convention (I'm guessing what this means). The advantages pandas provides come in many forms, including its inference. Placing a general object inside a dataframe shouldn't be an issue. Expecting this code to continue functioning in future versions is. I'd guess the devs might very well change how this works in an attempt to better infer what people might mean when they attempt such a thing. Placing a high-dimensional structure in a high-dimensional structure is better handled with MultiIndex. – piRSquared Jun 21 '16 at 17:52
  • @piRSquared okay that's fine, but isn't it cool to map a single row into multiple rows of another DF (as in DBs we use junction tables), I am asking you this since then next time I won't be using this thing with pandas! – Devi Prasad Khatua Jun 21 '16 at 18:03
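The MultiIndex idea from the comments, and the junction-table analogy, could look roughly like the sketch below. It assumes the value of column F identifies the parent row; the level name `child_row` and the variable names are invented for this example.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list("DEF"))
df_in_df = pd.DataFrame([[11, 13, 17], [19, 23, 31]], columns=list("XYZ"))

# Stack the child frames under an outer index level that holds the parent's
# F value. Passing a dict to pd.concat uses its keys as that outer level.
children = pd.concat({6: df_in_df}).rename_axis(["F", "child_row"])

# All child rows that belong to the parent row with F == 6.
rows_for_6 = children.loc[6]

# Junction-table style lookup: one parent row maps to several child rows.
joined = df.merge(children.reset_index(), on="F", how="left")
```

Here every cell stays a scalar, so pandas has nothing ambiguous to infer, and the one-to-many relationship is expressed through the shared key instead of a nested object.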

First create the column where you want to insert the dictionary. Then convert your dictionary to a string using the repr function and insert that string into the column. To query it later, first select the string and then use eval on it to convert it back to a dictionary.
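A minimal sketch of that round trip, using the data from the question. Note one deliberate substitution: it uses `ast.literal_eval` from the standard library instead of `eval`, since `literal_eval` only parses Python literals and will not execute arbitrary code found in the string. The `payload` dict is a made-up stand-in for whatever dictionary you want to store.

```python
import ast

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 4, 6]], columns=list("DEF"))

# Hypothetical dictionary to store in a single cell.
payload = {"X": [11, 19], "Y": [13, 23], "Z": [17, 31]}

# Create the column first, then insert the dictionary as its repr() string.
df["G"] = None
df.loc[df["F"] == 6, "G"] = repr(payload)

# Select the string and parse it back into a dictionary.
stored = df.loc[df["F"] == 6, "G"].item()
restored = ast.literal_eval(stored)
```

This only works for dictionaries made of plain literals (numbers, strings, lists, tuples, dicts); a DataFrame would first have to be converted, for example with `to_dict()`.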

Vincent Appiah
  • 101
  • 1
  • 1