I have a dataset like this:
ID var value
9442000 a 2.01
9442000 v 2.2
9442000 h 5.3
9442000 f 0.2
9442000 s 0.55
9442000 t 0.6
952001 d 0.22
952001 g 0.44
952001 g 0.44
952001 h 0.77
652115 a 4.66
652115 d 1.55
652115 s 2.55
652115 s 2.55
I want to split this into two dataframes, one for calibration (75%) and one for validation (25%). Doing this for the dataset as a whole is easy, but I want to do it per ID, so that 75% of the rows for EACH ID go to calibration. For example, ID 9442000 has six rows, so I want to put any four of them (chosen at random) into the calibration dataframe and the other two into the validation dataframe.
Expected output:
*Calibration*
ID var value
9442000 a 2.01
9442000 v 2.2
9442000 h 5.3
9442000 f 0.2
952001 d 0.22
952001 g 0.44
952001 g 0.44
652115 a 4.66
652115 d 1.55
652115 s 2.55
And
*Validation*
ID var value
9442000 s 0.55
9442000 t 0.6
952001 h 0.77
652115 s 2.55
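
The question does not say which language is in use, so here is a minimal sketch of the per-ID split, assuming Python with pandas. The helper name `split_by_id`, the `frac` and `seed` parameters, and the choice to round the calibration count down (so that six rows give 4 and 2, as in the example above) are all assumptions made for illustration. It also assumes the dataframe has a unique index, which is what lets the duplicated rows be told apart.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [9442000] * 6 + [952001] * 4 + [652115] * 4,
    "var": ["a", "v", "h", "f", "s", "t",
            "d", "g", "g", "h",
            "a", "d", "s", "s"],
    "value": [2.01, 2.2, 5.3, 0.2, 0.55, 0.6,
              0.22, 0.44, 0.44, 0.77,
              4.66, 1.55, 2.55, 2.55],
})


def split_by_id(data, frac=0.75, seed=0):
    """Hypothetical helper: split `data` per ID into calibration/validation.

    Assumes `data` has a unique index, so rows that look identical
    (same ID, var and value) are still treated as separate rows.
    """
    parts = []
    for _, group in data.groupby("ID", sort=False):
        # Round down: 6 rows -> 4 calibration, 4 rows -> 3 calibration.
        n_calib = int(len(group) * frac)
        parts.append(group.sample(n=n_calib, random_state=seed))
    # Restore the original row order within the calibration set.
    calibration = pd.concat(parts).sort_index()
    # Everything not drawn for calibration goes to validation.
    validation = data.drop(calibration.index)
    return calibration, validation


calibration, validation = split_by_id(df)
print(calibration)
print(validation)
```

With the sample data this puts 4, 3 and 3 rows of the three IDs into `calibration` and the remaining 2, 1 and 1 rows into `validation`. Which rows are drawn depends on `seed`, so the result will generally not match the illustrative tables above row for row.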