I am working on a project to develop an accessible way to monitor dryland ecosystems using relatively high resolution satellite imagery (3 meters) with revisit times of a few days. The features we wish to map are often ~1 meter in size, so we have chosen to downscale the 3 meter imagery to 1 meter and also exploit temporal variation in plant/soil community phenology to distinguish classes, using a 3D CNN on a spatiotemporal data cube.
Our current 3D-CNN-only architecture works fairly well, achieving ~67% accuracy after 50 epochs, but there is high confusion between two of the classes, likely due to the coarse resolution of the 3 meter satellite imagery and the somewhat similar phenology of those classes. However, a 2D CNN trained on National Agriculture Imagery Program (NAIP) data distinguishes these classes well and reaches a higher accuracy of ~80%.
Despite its higher accuracy, relying on NAIP imagery alone is not very practical, because NAIP is collected only every few years and would therefore limit the monitoring capabilities of dryland managers under this approach.
Therefore, is it possible to develop a combined 2D-3D CNN, in which a 3D CNN operating on a downscaled spatiotemporal cube of recent imagery is also informed by a 2D CNN whose input is older but higher-resolution imagery (e.g., NAIP)?
(We are using keras, and I am happy to share any files or code you may find useful)
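To clarify what I have in mind, here is a rough sketch of a two-branch model using the Keras functional API. All input shapes, filter counts, and the simple concatenation-based fusion are placeholders I made up for illustration, not a tested configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_CLASSES = 8  # placeholder number of land cover classes

# 3D branch: recent spatiotemporal cube, shape (time, height, width, bands)
cube_in = layers.Input(shape=(10, 32, 32, 4), name="satellite_cube")  # e.g. 10 dates
x3d = layers.Conv3D(32, (3, 3, 3), activation="relu", padding="same")(cube_in)
x3d = layers.MaxPooling3D((2, 2, 2))(x3d)
x3d = layers.Conv3D(64, (3, 3, 3), activation="relu", padding="same")(x3d)
x3d = layers.GlobalAveragePooling3D()(x3d)  # collapse to a feature vector

# 2D branch: older, higher-resolution NAIP chip, shape (height, width, bands)
naip_in = layers.Input(shape=(96, 96, 4), name="naip_chip")  # e.g. 1 m NAIP
x2d = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(naip_in)
x2d = layers.MaxPooling2D((2, 2))(x2d)
x2d = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x2d)
x2d = layers.GlobalAveragePooling2D()(x2d)  # collapse to a feature vector

# Late fusion: concatenate the two feature vectors, then classify
fused = layers.concatenate([x3d, x2d])
fused = layers.Dense(128, activation="relu")(fused)
out = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = Model(inputs=[cube_in, naip_in], outputs=out)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

Is this kind of late fusion (concatenating features from the two branches) a reasonable way to do it, or would another fusion strategy be better suited here?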