I have a dataframe fal_mean
where the multi-indices are 'lat'
, 'lon'
, and, 'yr'
. Below is just a snippet of the data, but what I need to do is to select data by the number of entries for the index 'yr' (years) for a given lat/lon combindation. For example, I would like to select only the data for each lat and lon combination where there are N or more entries (years of yearly data) such that len(fal_mean['yr'] >= N)
lat lon yr 101 102 103 104 105 109
35 -75 1978 410.63 410.63 1231.890 4516.950 0.00 0.00
1979 0.00 406.05 4060.460 6090.690 406.05 0.00
1980 1441.89 720.94 11535.100 12256.050 0.00 0.00
1982 121.54 0.00 850.790 0.000 0.00 0.00
1983 638.82 2874.68 37272.955 1916.455 0.00 638.82
' I can select data based on the lat and lon combinations individually for whatever column I need:
fal_mean.loc[(35, -75), '101']
Out[74]:
yr
1978 410.630000
1979 0.000000
1980 1441.890000
1982 121.540000
1983 638.820000
1984 5419.645000
1985 99.836667
1986 7113.757500
1987 0.000000
1988 0.000000
1997 408.410000
1998 251.690000
2001 208.040000
2002 0.000000
2004 0.000000
2005 0.000000
2006 0.000000
2007 0.000000
2009 0.000000
2010 0.000000
2011 0.000000
Name: 101, dtype: float64
and get the length of that selection:
len(fal_mean.loc[(35, -75), '101'])
Out[75]: 21
But I cannot figure out how to return a slice of the data frame using
len(fal_mean.loc[(35, -75), '101']) > N
something like:
fal_mean.loc[(len(fal_mean.loc[(35, -75), '101']) > 32)]
Except I'd need to pass all values for the lat and lon indices.
EDIT 1
Here is a dataframe containing the lengths of each combination (index=lat, columns=lon), so I just need to be able to call those combinations >N with some slice notation on the origanl dataframe 'fal_mean'
I created the dataframe using:
cv=np.arange(35, 45, 1)
cv2=np.arange(-75, -66, 1)
fal=pd.DataFrame(data=None, index=cv, columns=cv2)
for lat in cv:
for lon in cv2:
fal.loc[lat, lon]=len(fal_mean.xs((lat, lon), level=('lat', 'lon')))
-75 -74 -73 -72 -71 -70 -69 -68 -67
35 21 8 0 0 0 0 0 0 0
36 33 32 0 0 0 0 0 0 0
37 33 35 0 0 0 0 0 0 0
38 13 35 32 8 0 0 0 0 0
39 0 31 35 35 25 9 16 0 0
40 0 0 35 32 36 37 37 37 37
41 0 0 0 0 34 33 36 37 37
42 0 0 0 0 0 37 37 35 37
43 0 0 0 0 0 32 35 31 34
44 0 0 0 0 0 0 0 12 31
I'm hoping there is some clever slice notation I could use to do this. I have tried xs
and df.index.get_level_values
but my syntax is wrong.
e.g. fal_mean.loc[:, len(fal_mean.index.get_level_values('yr') >32)]
does not work.
If there isn't a way to do this in one line, I could call a boolean array on this matrix and hopefully use .loc
to call the 0 and 1 indices (lat, lon) for those combinations in the original dataframe, but I'm not sure yet how to go about that.
fal >=32
-75 -74 -73 -72 -71 -70 -69 -68 -67
35 False False False False False False False False False
36 True True False False False False False False False
37 True True False False False False False False False
38 False True True False False False False False False
39 False False True True False False False False False
40 False False True True True True True True True
41 False False False False True True True True True
42 False False False False False True True True True
43 False False False False False True True False True
44 False False False False False False False False False
EDIT 2: here are the first 50 rows and 4 columns, there are 28+ columns, but that is not important at this point. I would like to keep all columns at only the latitude and longitude combinations with at least (N) entries for year index - I'm thinking 32 may be sufficient. That would exclude data from lat/lon combinations (35, (all)), (36, -74), (38, -75), etc. using the dataframe above as a guide.
These data were generated by a groupby function grouped=nm.groupby(['lat', 'lon', 'yr'])
so if I could modify that call with some limiter for the resulting lengths of grouped.year that would work also. I tried df.groupby(['key1', 'key2', 'key3']).size()
and this gives the number of data points that were binned to each lat lon combination for each year.
101 102 103 104
lat lon yr
35 -75 1978 410.630000 410.630000 1231.890000 4516.950000
1979 0.000000 406.050000 4060.460000 6090.690000
1980 1441.890000 720.940000 11535.100000 12256.050000
1982 121.540000 0.000000 850.790000 0.000000
1983 638.820000 2874.680000 37272.955000 1916.455000
1984 5419.645000 516.155000 12778.280000 96.990000
1985 99.836667 0.000000 2819.083333 13977.286667
1986 7113.757500 3909.402500 10313.850000 2823.675000
1987 0.000000 708.610000 6841.430000 0.000000
1988 0.000000 0.000000 4711.750000 0.000000
1997 408.410000 204.210000 0.000000 0.000000
1998 251.690000 0.000000 13507.510000 0.000000
2001 208.040000 485.420000 17474.970000 0.000000
2002 0.000000 0.000000 2173.193333 0.000000
2004 0.000000 0.000000 0.000000 465.480000
2005 0.000000 0.000000 231.600000 0.000000
2006 0.000000 0.000000 1209.420000 0.000000
2007 0.000000 0.000000 0.000000 0.000000
2009 0.000000 0.000000 3561.485000 0.000000
2010 0.000000 0.000000 347.500000 0.000000
2011 0.000000 0.000000 4742.660000 0.000000
-74 1994 908.000000 113.500000 1589.000000 0.000000
1996 231.960000 0.000000 55207.410000 0.000000
1999 18414.830000 575.460000 142139.500000 575.460000
2000 0.000000 0.000000 19388.510000 842.980000
2003 4654.030000 0.000000 2694.440000 0.000000
2005 6229.700000 1245.940000 8098.600000 0.000000
2006 3294.950000 6243.060000 14740.550000 346.840000
2013 19512.960000 9420.050000 10261.130000 0.000000
36 -75 1977 426.498667 612.972667 5114.548667 6870.683333
1978 213.488571 1339.422857 7686.621429 12274.601429
1979 140.881333 536.751333 5403.046000 7349.630667
1980 3669.234667 552.227333 68924.586000 6117.865333
1981 24.875455 322.680909 6340.275455 22119.756364
1982 9261.615714 3890.987143 16226.872857 122.911429
1983 199.143333 478.403333 70658.200476 524.357143
1984 8193.242857 996.810714 43920.052143 3900.735714
1985 514.931364 23.260455 8051.927727 924.295909
1986 2884.836667 4156.231667 23677.898000 1393.353667
1987 2555.915417 2792.415000 25936.742917 833.030833
1988 1410.285000 1159.560000 7141.306667 0.000000
1993 6952.696667 9823.026667 64672.390000 5508.930000
1994 214.590000 0.000000 127691.851429 4720.532857
1995 0.000000 0.000000 3534.124000 0.000000
1996 0.000000 0.000000 63118.047500 0.000000
1997 175.766000 421.272000 23034.918000 527.300000
1998 105.652000 0.000000 2508.806000 0.000000
1999 0.000000 0.000000 1245.455000 0.000000
2000 0.000000 0.000000 10889.758000 0.000000
2001 948.245000 441.873750 27449.827500 138.930000