My dataset is split across different files, and each file groups samples that are related to each other, i.e., they were created under similar conditions at about the same time. The train/test balance matters, and the samples from one file must go entirely to train or entirely to test; they cannot be split across both. Because of that, KFold is not straightforward to use in my scikit-learn code.
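A minimal sketch of how the per-file structure could be turned into a `groups` array, which scikit-learn's group-aware splitters accept alongside `X` and `y`. The function name `load_grouped` and the file format (whitespace-separated numbers, label in the last column) are hypothetical assumptions, not the real layout of the dataset.

```python
import glob

import numpy as np


def load_grouped(pattern):
    """Stack every file matching `pattern`, tagging each row with its file index.

    Hypothetical format: whitespace-separated numbers, label in the last column.
    """
    X_parts, y_parts, g_parts = [], [], []
    # sorted() is lexicographic ("10.txt" before "2.txt"); the group ids are only
    # labels, so the order does not affect the cross-validation.
    for group_id, path in enumerate(sorted(glob.glob(pattern))):
        data = np.loadtxt(path, ndmin=2)  # ndmin=2 keeps single-row files 2-D
        X_parts.append(data[:, :-1])
        y_parts.append(data[:, -1])
        g_parts.append(np.full(len(data), group_id))  # one group id per file
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(g_parts)
```

Usage would be something like `X, y, groups = load_grouped("./dataset/*.txt")`.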
Right now, I am using something similar to leave-one-out (LOO), doing something like:
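    shopt -s extglob                     # bash: enables the !(pattern) glob
    cat ./dataset/!(1.txt) > train.txt   # train: every file except 1.txt
    cat ./dataset/1.txt > test.txt       # test: only 1.txt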
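The manual "hold out one file, train on the rest" procedure corresponds to scikit-learn's `LeaveOneGroupOut` once each row is tagged with the file it came from. A sketch on made-up toy data (three hypothetical files, two rows each):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Hypothetical toy data: 6 samples coming from 3 files (groups 0, 1, 2).
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])

# One split per file: that file is the test set, all other files are train.
splits = list(LeaveOneGroupOut().split(X, y, groups=groups))
for train_idx, test_idx in splits:
    print("test file:", np.unique(groups[test_idx]),
          "train files:", np.unique(groups[train_idx]))
```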
This is not comfortable, and not very useful if I want test folds made of several files to get a "real" CV. How can I set up a proper CV to check for real overfitting?
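For test folds made of several whole files, `GroupKFold` is one option: it keeps each group entirely in either train or test. A sketch on synthetic data, where the model, the sizes and `n_splits=5` are arbitrary assumptions. `cross_validate(..., return_train_score=True)` reports train and test scores side by side, so a large gap between them suggests overfitting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_validate

rng = np.random.default_rng(0)
# Hypothetical data: 10 files with 20 samples each.
n_files, per_file = 10, 20
X = rng.normal(size=(n_files * per_file, 5))
y = rng.integers(0, 2, size=n_files * per_file)
groups = np.repeat(np.arange(n_files), per_file)  # file id for every row

cv = GroupKFold(n_splits=5)  # with 10 equal-sized files: 2 whole files per test fold
results = cross_validate(LogisticRegression(), X, y, groups=groups, cv=cv,
                         return_train_score=True)
print("train:", results["train_score"].mean(), "test:", results["test_score"].mean())

folds = list(cv.split(X, y, groups))  # the same splits, for inspection
```

If class balance across folds also matters, `StratifiedGroupKFold` (scikit-learn 1.0 and later) tries to keep class proportions similar in each fold while still respecting the groups.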