I want to use R to sample my data frame. My data are timestamped epidemiological records, and I want to randomly sample at least 1 and at most 10 records for each year, preferably with the sample size scaled to the number of records in that year. I would like to export the results as a CSV file.
Here are a few lines of my dataset; I've left off the long genetic sequence field for each record.
year matrix USD clade
1958 W mG018U UP
1958 W mG018U UP
1958 W mG018U UP
1966 UN mG140L LL
1969 UN mG207L LL
1969 UN mG013L LL
1971 UN mG208L LL
1972 HA mG129M MN
1973 C1 mG018U UP
1973 NA mG001U UC
1973 NA mG001U UC
All I've learned to do so far is
sample(mydata, size = 600, replace = FALSE)
which of course doesn't take the year into account.
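One way the per-year sampling described above might be approached is sketched below in base R. It is only a sketch under stated assumptions: the toy data frame just re-types the sample rows shown above, `frac` is a hypothetical sampling fraction (not something from the original data), the output file name is made up, and the seed is fixed purely so the draw is repeatable.

```r
# Toy data re-typed from the sample rows above (sequence field omitted)
mydata <- data.frame(
  year   = c(1958, 1958, 1958, 1966, 1969, 1969, 1971, 1972, 1973, 1973, 1973),
  matrix = c("W", "W", "W", "UN", "UN", "UN", "UN", "HA", "C1", "NA", "NA"),
  USD    = c("mG018U", "mG018U", "mG018U", "mG140L", "mG207L", "mG013L",
             "mG208L", "mG129M", "mG018U", "mG001U", "mG001U"),
  clade  = c("UP", "UP", "UP", "LL", "LL", "LL", "LL", "MN", "UP", "UC", "UC"),
  stringsAsFactors = FALSE
)

set.seed(42)  # assumption: fixed seed only so the result is reproducible
frac <- 0.5   # hypothetical fraction of each year's records to keep

# Group row indices by year
rows_by_year <- split(seq_len(nrow(mydata)), mydata$year)

# For each year, draw a number of rows proportional to the year's size,
# clamped to the range 1..10 and never more than the rows available
picked <- unlist(lapply(rows_by_year, function(idx) {
  n <- min(length(idx), max(1, min(10, ceiling(frac * length(idx)))))
  idx[sample.int(length(idx), n)]  # sample positions, without replacement
}), use.names = FALSE)

sampled <- mydata[picked, ]

# Export the sampled rows; the file name is a placeholder
write.csv(sampled, "sampled_by_year.csv", row.names = FALSE)
```

The sketch samples row indices rather than calling `sample()` on the data frame itself, because `sample()` applied directly to a data frame draws columns, not rows. Using `sample.int()` on positions also avoids the surprise where `sample(x, ...)` with a single number `x` samples from `1:x`, which matters for years that have only one record.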