I am training an ANN on some biological experimental data. Briefly, my input dataset (features) consists of gene expression levels (RNA levels) for different samples (cell lines). The dataset contains replicates of the same biological sample: I have measured the RNA expression levels of the same cell line (or of cell lines that are meant to be the same) two or more times. I have included every measurement (different cell lines, repeated measurements of the same cell line, etc.) as a separate sample in the training set in order to increase the flexibility of the ANN, instead of averaging the repeated measurements of each cell line and using only that average.
I was wondering whether I can use the per-cell-line averages of these repeated measurements as my validation set. What do you think? It's a regression ANN and the labels are protein structures.
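
To make the setup concrete, here is a minimal sketch of how the averaged set I am asking about would be built. The names (`average_replicates`, `X_train`, `cell_lines`) and the toy numbers are hypothetical and only for illustration; my real data has many more genes and samples.

```python
import numpy as np

def average_replicates(X, cell_line_ids):
    """Average the rows of X that share the same cell-line ID.

    X             : (n_samples, n_genes) expression matrix, one row per measurement
    cell_line_ids : length n_samples, the cell line each row was measured from
    Returns the sorted unique IDs and one averaged row per cell line.
    """
    ids = np.asarray(cell_line_ids)
    unique_ids, inverse = np.unique(ids, return_inverse=True)
    sums = np.zeros((len(unique_ids), X.shape[1]))
    np.add.at(sums, inverse, X)  # accumulate replicate rows per cell line
    counts = np.bincount(inverse, minlength=len(unique_ids))
    return unique_ids, sums / counts[:, None]

# Toy example (made-up values): cell line "A" measured twice, "B" once.
X_train = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [10.0, 20.0]])
cell_lines = ["A", "A", "B"]

# All three rows are used for training; the averaged rows would be the
# proposed validation inputs.
val_ids, X_val = average_replicates(X_train, cell_lines)
```

So the training set keeps every individual measurement, and the candidate validation set has one averaged row per cell line, computed from those same measurements.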