I am working on a project to develop an accessible way to monitor dryland ecosystems using relatively high resolution satellite imagery (3 meters) with revisit times of a few days. The features we wish to map are often ~1 meter in size, so we have chosen to downscale the 3 meter imagery to 1 meter and also exploit temporal variation in plant/soil community phenology to distinguish classes, using a 3D CNN on a spatiotemporal data cube.
Our current 3D-CNN-only architecture works fairly well, achieving ~67% accuracy after 50 epochs, but there is high confusion between two of the classes, likely due to the coarse resolution of the 3 meter satellite imagery and the somewhat similar phenology of those classes. However, a 2D CNN trained on National Agriculture Imagery Program (NAIP) data distinguishes these classes well and reaches a higher accuracy of ~80%.
Despite its higher accuracy, relying on NAIP imagery alone is not very practical, because NAIP is collected only every few years and would therefore limit the monitoring capabilities of dryland managers under this approach.
Therefore, is it possible to develop a combined 2D-3D CNN, in which a 3D CNN operating on a downscaled spatiotemporal cube of recent imagery is also informed by a 2D CNN whose input is older but higher-resolution imagery (e.g., NAIP)?
(We are using keras, and I am happy to share any files or code you may find useful)
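To clarify what I have in mind, here is a rough sketch of a two-branch model using the Keras functional API. All input shapes, filter counts, and the simple concatenation-based fusion are placeholders I made up for illustration, not a tested configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 8  # placeholder number of land cover classes

# 3D branch: recent spatiotemporal cube, shape (time, height, width, bands)
cube_in = layers.Input(shape=(10, 32, 32, 4), name="satellite_cube")  # e.g. 10 dates
x3d = layers.Conv3D(32, (3, 3, 3), activation="relu", padding="same")(cube_in)
x3d = layers.MaxPooling3D((2, 2, 2))(x3d)
x3d = layers.Conv3D(64, (3, 3, 3), activation="relu", padding="same")(x3d)
x3d = layers.GlobalAveragePooling3D()(x3d)  # collapse to a feature vector

# 2D branch: older, higher-resolution NAIP chip, shape (height, width, bands)
naip_in = layers.Input(shape=(96, 96, 4), name="naip_chip")  # e.g. 1 m NAIP
x2d = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(naip_in)
x2d = layers.MaxPooling2D((2, 2))(x2d)
x2d = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x2d)
x2d = layers.GlobalAveragePooling2D()(x2d)  # collapse to a feature vector

# Late fusion: concatenate the two feature vectors, then classify
fused = layers.concatenate([x3d, x2d])
fused = layers.Dense(128, activation="relu")(fused)
out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = Model(inputs=[cube_in, naip_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

Is this kind of late fusion (concatenating features from the two branches) a reasonable way to do it, or would another fusion strategy be better suited here?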