I have a 3D dataframe with x and y as the first two dimensions and time as the third. The data are 5 indices of satellite images that were taken at different times. The x and y coordinates identify each pixel.
x         y           time        SIPI      classif
7.620001  -77.849990  2018-04-07  1.011107   2.0
                      2018-10-14  1.023407   2.0
                      2018-12-28  0.045107   3.0
                      2020-01-10  0.351107   2.0
                      2018-06-29  0.351107   2.0
          -77.849899  2018-04-07  1.010777   8.0
                      2018-10-14  0.510562   2.0
                      2018-12-28  1.410766   4.0
                      2020-01-10  1.010666   8.0
                      2018-06-29  2.057068   8.0
          -77.849809  2018-04-07  0.986991   1.0
                      2018-10-14  0.986991   8.0
                      2018-12-28  0.986991   5.0
                      2020-01-10  0.984791   5.0
                      2018-06-29  0.986991   3.0
          -77.849718  2018-04-07  0.975965  10.0
                      2018-10-14  0.964765   7.0
                      2018-12-28  0.975965  10.0
                      2020-01-10  0.975965  10.0
                      2018-06-29  0.975965   3.0
          -77.849627  2018-04-07  1.957747   2.0
                      2018-10-14  0.132445   6.0
                      2018-12-28  0.589677   2.0
                      2020-01-10  1.982445   2.0
                      2018-06-29  3.334456   7.0
I need to group the data by pixel and, as a new column, get the value from column 'classif_rf' that is most frequent across the 5 datasets. The values are integers between 1 and 10. I also want to add a condition that keeps the value only if it occurs at least 3 times.
x         y           classif
7.620001  -77.849990   2.0
          -77.849899   8.0
          -77.849809   NA
          -77.849718  10.0
          -77.849627   2.0
So as a result I need a dataframe where each pixel has the value with the highest frequency, and where the frequency is lower than 3 there should be an NA value.
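To make the rule concrete, here is a rough plain-Python sketch of what should happen for a single pixel. The function name `most_frequent_or_none` and the `min_count` parameter are just illustrative names, and the threshold of 3 is taken from the expected output above (a value that occurs 3 times is kept).

```python
from collections import Counter


def most_frequent_or_none(values, min_count=3):
    """Return the most common value, or None if it occurs fewer than min_count times."""
    if not values:
        return None
    # most_common(1) gives a list with one (value, count) pair
    value, count = Counter(values).most_common(1)[0]
    return value if count >= min_count else None


# Classes of two pixels from the sample data above
pixel_a = [2, 2, 3, 2, 2]  # y = -77.849990
pixel_b = [1, 8, 5, 5, 3]  # y = -77.849809

print(most_frequent_or_none(pixel_a))  # 2
print(most_frequent_or_none(pixel_b))  # None
```

With 5 observations per pixel and a threshold of 3, at most one value can qualify, so ties are not an issue here.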
Can pandas' groupby do that? I thought about value_counts(), but I'm not sure how to apply it to my dataset.
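One way groupby and value_counts() might be combined is sketched below. This is untested against the real data and rests on assumptions: the index levels are named x, y and time, the class column is called classif as in the sample, and a value is kept when it occurs at least 3 times. The helper name `mode_if_frequent` is hypothetical, and the small dataframe only stands in for two pixels of the sample.

```python
import numpy as np
import pandas as pd


def mode_if_frequent(values, min_count=3):
    """Most frequent value of a Series, or NaN if it occurs fewer than min_count times."""
    counts = values.value_counts()  # sorted by frequency, highest first
    if counts.empty or counts.iloc[0] < min_count:
        return np.nan
    return counts.index[0]


# Stand-in for the real data: two pixels, five dates each (assumed layout)
dates = ["2018-04-07", "2018-10-14", "2018-12-28", "2020-01-10", "2018-06-29"]
df = pd.DataFrame(
    {
        "x": [7.620001] * 10,
        "y": [-77.849990] * 5 + [-77.849809] * 5,
        "time": pd.to_datetime(dates * 2),
        "classif": [2.0, 2.0, 3.0, 2.0, 2.0, 1.0, 8.0, 5.0, 5.0, 3.0],
    }
).set_index(["x", "y", "time"])

# Group by pixel (the x and y index levels) and reduce each group to one value
result = (
    df.groupby(level=["x", "y"])["classif"]
    .agg(mode_if_frequent)
    .reset_index()
)
print(result)
```

If x, y and time are ordinary columns rather than index levels, `df.groupby(["x", "y"])` would be the equivalent grouping.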
Thank you in advance!