Applying the KFold Cross Validation on nested dictionary

Question

Input dictionary

new_dict1 = {'ABW':{'ABR':1,'BPR':1,'CBR':1,'DBR':0},'BCW':{'ABR':0,'BPR':0,'CBR':1,'DBR':0},
        'CBW':{'ABR':1,'BPR':1,'CBR':0,'DBR':0},'MCW':{'ABR':1,'BPR':1,'CBR':0,'DBR':1},
        'DBW':{'ABR':0,'BPR':0,'CBR':1,'DBR':0}}

Is there a way to apply 2-fold cross-validation to this nested dictionary? The answer at https://stackoverflow.com/questions/45115964/separate-pandas-dataframe-using-sklearns-kfold only splits the data into train and test sets. I want to split the data into train, validation, and test sets.

You added the `pandas` tag, so I assume you know that your nested dictionary can be converted directly to a pandas DataFrame, to which KFold can be applied as in https://stackoverflow.com/questions/45115964/separate-pandas-dataframe-using-sklearns-kfold — cao-nv, May 08 '22 at 02:50
@cao-nv, I have seen this answer, but it splits the data into train and test sets using KFold. However, I want to split the data into train, validation, and test sets. — Noorulain Islam, May 08 '22 at 03:02
I see. In that case, you must split twice. The first split is for train and test. Your test set must be kept fixed during the development process. The second split is for the final train and validation sets. — cao-nv, May 08 '22 at 03:05
@cao-nv, can you please explain how I can split it twice and keep the test set fixed? If I split the first time, it randomly splits the data into two parts. — Noorulain Islam, May 08 '22 at 03:11
Let's say you have collected 100 data samples. First, split the data into two parts of 90 and 10 samples for training and testing, respectively. You MUST save the two sets separately. After that, read only the training set of 90 samples and split it again to obtain the final training set (80) and the validation set (10). — cao-nv, May 08 '22 at 04:23
@keramat yes. Labels are 1, 1, etc. The keys of the outer dictionary are my row index, while the keys of the inner dictionary are my column index. — Noorulain Islam, May 08 '22 at 05:23

keramat · Accepted Answer · 2022-05-08T05:32:51.197

You can use something like this:

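import pandas as pd
from sklearn.model_selection import KFold

df = pd.DataFrame(new_dict1)
# Outer split: train+validation vs. test
kf = KFold(n_splits = 2, shuffle = True, random_state = 0)
for train_val_index, test_index in kf.split(df):
    # Inner split: train vs. validation, using only the train+validation rows
    kf2 = KFold(n_splits = 2, shuffle = True, random_state = 0)
    for train_pos, val_pos in kf2.split(train_val_index):
        # The inner split yields positions within train_val_index,
        # so map them back to row positions in df
        train_index = train_val_index[train_pos]
        val_index = train_val_index[val_pos]
        print(train_index, val_index, test_index)

Output:

[0] [1] [2 3]
[1] [0] [2 3]
[2] [3] [0 1]
[3] [2] [0 1]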
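
The two-step split described in the comments (set aside a fixed test set first, then cross-validate on the rest) could look roughly like the sketch below. It is only one possible reading of that advice. The use of `train_test_split`, the choice of `test_size=1`, and the variable names are assumptions made for illustration, and `orient="index"` follows the asker's remark that the outer keys are the row index.

```python
import pandas as pd
from sklearn.model_selection import KFold, train_test_split

new_dict1 = {'ABW': {'ABR': 1, 'BPR': 1, 'CBR': 1, 'DBR': 0},
             'BCW': {'ABR': 0, 'BPR': 0, 'CBR': 1, 'DBR': 0},
             'CBW': {'ABR': 1, 'BPR': 1, 'CBR': 0, 'DBR': 0},
             'MCW': {'ABR': 1, 'BPR': 1, 'CBR': 0, 'DBR': 1},
             'DBW': {'ABR': 0, 'BPR': 0, 'CBR': 1, 'DBR': 0}}

# Outer keys become the row index, inner keys become the columns.
df = pd.DataFrame.from_dict(new_dict1, orient="index")

# Step 1: hold out a fixed test set once.
# test_size=1 (one row) is an assumption chosen for this tiny dataset.
train_val_df, test_df = train_test_split(df, test_size=1, random_state=0)

# Step 2: 2-fold cross-validation on the remaining rows only.
kf = KFold(n_splits=2, shuffle=True, random_state=0)
folds = []
for train_pos, val_pos in kf.split(train_val_df):
    # kf.split returns positions, so use .iloc to select rows.
    train_df = train_val_df.iloc[train_pos]
    val_df = train_val_df.iloc[val_pos]
    folds.append((list(train_df.index), list(val_df.index)))
    print("train:", list(train_df.index),
          "val:", list(val_df.index),
          "test:", list(test_df.index))
```

Here the test rows are the same in every fold, and only the train/validation assignment changes between folds. To keep the test set fixed across runs, as suggested in the comments, `train_val_df` and `test_df` can be saved separately (for example with `to_csv`) and reloaded later.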
