How do I get a second sample from a dataset in Python without duplicating rows from the first sample?

Question

I have a dataset in Python from which I have taken a sample and stored it in a second dataset. Later I will need to draw another sample from the original dataset, but I do not want any rows from the first sample to come up again. Ideally this would need any flag would only be there for a year so it can then be sampled again after that time has elapsed.

Does this answer your question? [How to incrementally sample without replacement?](https://stackoverflow.com/questions/18921302/how-to-incrementally-sample-without-replacement) — mkrieger1, Nov 24 '21 at 10:50
I didn't understand the sentence "Ideally this would need any flag would only be there for a year". — mkrieger1, Nov 24 '21 at 10:50
Sorry I was not clear. I am going to be taking the sample from the dataset every few months and do not want duplication if the sample row has been taken in the last 12 months. After that time period the row could be selected again. — ChrisPitkin, Nov 24 '21 at 11:08
Please show how you are currently taking samples from the dataset. I don't quite understand how the dataset is represented and stored in your scenario. — mkrieger1, Nov 24 '21 at 11:09
Please provide enough code so others can better understand or reproduce the problem. — Community, Nov 28 '21 at 18:58

score 0 · Answer 1 · answered Nov 26 '21 at 22:03

Denote your original dataset by A. You generate a subset of A; call it B1. You can then create B2 from A_leftover = A \ B1, where \ denotes the set difference. You can then generate B3, B4, ..., B12 the same way, where Bi is drawn from A_leftover after updating A_leftover = A_leftover \ B(i-1).
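As a minimal sketch of the set difference A \ B1, assuming the rows can be identified by hashable ids (the question does not say how the dataset is stored, so the list of integers and the helper name `sample_excluding` are hypothetical):

```python
import random

def sample_excluding(population, excluded, k):
    """Draw k distinct items from population, skipping anything in excluded."""
    excluded = set(excluded)
    # A_leftover = A \ excluded, keeping the original order of A
    leftover = [item for item in population if item not in excluded]
    # random.sample raises ValueError if k is larger than len(leftover)
    return random.sample(leftover, k)

A = list(range(100))               # hypothetical dataset of row ids
B1 = random.sample(A, 10)          # first sample
B2 = sample_excluding(A, B1, 10)   # second sample, drawn from A \ B1
```

For a third sample, pass `B1 + B2` as `excluded`, and so on.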

If you want to put B1 back the next year, set A_leftover = (A_leftover \ B12) ∪ B1, and from this you can generate B13 (or you can call it B1 again, since 13 % 12 = 1). So after the 12th sample, you generate Bi from A_leftover = (A_leftover \ B(i-1)) ∪ B(i-12). Or you can use this formula from the very beginning by defining B(-i) = empty set for every i in [0, 1, 2, ..., 11].
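The answer above counts samples and assumes twelve per year. Since the asker samples "every few months", one alternative is to record the date each row was last sampled and exclude rows sampled within the last 12 months. The following is only a sketch under that assumption; `sample_with_cooldown`, the `last_sampled` dict, and the integer row ids are hypothetical names, not part of the original question or answer.

```python
import random
from datetime import date, timedelta

def sample_with_cooldown(row_ids, last_sampled, k, today,
                         cooldown=timedelta(days=365)):
    """Draw k row ids that have not been sampled within the cooldown period.

    last_sampled maps row id -> date of its most recent selection and is
    updated in place, so it must be kept between runs.
    """
    eligible = [
        r for r in row_ids
        if r not in last_sampled or today - last_sampled[r] >= cooldown
    ]
    # random.sample raises ValueError if fewer than k rows are eligible
    chosen = random.sample(eligible, k)
    for r in chosen:
        last_sampled[r] = today
    return chosen

row_ids = list(range(50))   # hypothetical dataset of row ids
last_sampled = {}           # would be loaded from disk in a real workflow

first = sample_with_cooldown(row_ids, last_sampled, 10, date(2021, 1, 1))
second = sample_with_cooldown(row_ids, last_sampled, 10, date(2021, 5, 1))
# A year after the first draw, its rows are eligible again; the second
# draw's rows are still excluded.
later = sample_with_cooldown(row_ids, last_sampled, 10, date(2022, 1, 1))
```

Storing a date per row means the schedule does not need to be regular, at the cost of having to persist `last_sampled` between sampling runs.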
