This is a rather specific follow-up to this question on creating pandas DataFrames when entries have different lengths.

I have a dataset where I have:

  1. general environmental variables that apply to the whole problem (e.g. avg precipitation)
  2. values at specific depths (e.g. average amount of water at each depth after rainfall)

So my data looks like:

import pandas as pd
d = {'depth': [1, 2, 3], 'var1': [.01, .009, .002], 'globalvar': [2.5]}
df = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in d.items()]))

>>
   depth  globalvar   var1
0      1        2.5  0.010
1      2        NaN  0.009
2      3        NaN  0.002

Is there a way to get the value of globalvar directly, e.g. via df.globalvar, without having to index it as df.globalvar[0]? Is there a more Pythonic way to do this?

E. Case
  • df.globalvar will always return a series, so no matter how long it is you will always need to index it to retrieve the float 2.5. I'm not sure what your desired output actually is so I'm not sure how to help further. DataFrames are good for vector operations, so storing the global at each row may or may not help depending on what you are trying to achieve. I can provide instructions on how to make all the arrays the same length, but once again I'm not sure what you are actually trying to achieve. – James Steele May 31 '19 at 15:41
  • Hi James, thanks. I tend to have to call this value regularly for multiple different dfs, using arrays (e.g. var1s) and globalvars; maybe I'm being lazy, but it would be nice not to always index a globalvar. I'm now thinking that keeping it as a dictionary might actually be the best solution for this. – E. Case May 31 '19 at 16:41
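
The dictionary idea from the comment above could look roughly like the sketch below: per-depth columns stay in a DataFrame, and scalars live in a plain dict next to it. The names `profile` and `site_globals` are hypothetical and not from the original post.

```python
import pandas as pd

# Hypothetical layout: per-depth values in a DataFrame, scalars in a dict.
profile = pd.DataFrame({"depth": [1, 2, 3], "var1": [0.01, 0.009, 0.002]})
site_globals = {"globalvar": 2.5}

# The scalar is a plain float, so no Series indexing is needed to use it.
scaled = profile["var1"] * site_globals["globalvar"]
print(scaled)
```

This avoids the NaN padding entirely, at the cost of having to pass two objects around instead of one.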

1 Answer

You can do this with stack:

df.stack().loc[pd.IndexSlice[:,'globalvar']]
Out[445]: 
0    2.5
dtype: float64

Or with dropna:

df.globalvar.dropna()
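
Note that dropna still returns a Series. As a minimal sketch, assuming the frame is built as in the question, the float itself can be pulled out with `.iloc[0]`, or the value can be forward-filled so that every row carries it. The names `globalvar` and `filled` are only for illustration.

```python
import pandas as pd

d = {"depth": [1, 2, 3], "var1": [0.01, 0.009, 0.002], "globalvar": [2.5]}
df = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})

# dropna() leaves a one-element Series; .iloc[0] extracts the value itself.
globalvar = df["globalvar"].dropna().iloc[0]

# Alternative: forward-fill so the global value is repeated on every row.
filled = df.assign(globalvar=df["globalvar"].ffill())
```

Forward-filling is convenient for vectorized arithmetic across rows, while extracting the scalar once is simpler if the value is only needed on its own.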
BENY