No, this isn't a concern if it's used with a proper compositing system. When used correctly, immediate-mode 2D rendering can give far better performance than a retained-mode 3D system. Remember that the original Windows drawing system, GDI, used immediate mode and worked on 1990s PCs without GPUs.
The key is to re-render only the portions of the target that are "dirty". Your compositing engine needs to track the positions and bounds of the visual elements and, when they change, determine everything that needs to be repainted by crawling the visual tree both up and down, testing for overlap. Then you repaint only the geometries that truly need it. Usually this is a small fraction of the screen. You can use ID2D1RenderTarget::PushAxisAlignedClip to help with this too, so that, for example, you don't have to manually draw partial shapes when the full shape is partially covered by the non-dirty portion of the screen.
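To make the dirty-region idea concrete, here is a minimal sketch in portable C++. It is not taken from any real engine: `Rect`, `Visual`, and `DirtyTracker` are hypothetical names, and I've assumed a flat, paint-ordered list of visuals and a single bounding dirty rectangle instead of a full tree walk, just to keep it short.

```cpp
#include <algorithm>
#include <vector>

// Axis-aligned rectangle in render-target coordinates (same field layout as D2D1_RECT_F).
struct Rect {
    float left, top, right, bottom;
    bool empty() const { return right <= left || bottom <= top; }
};

// True when the two rectangles share any area.
inline bool overlaps(const Rect& a, const Rect& b) {
    return a.left < b.right && b.left < a.right &&
           a.top < b.bottom && b.top < a.bottom;
}

// Smallest rectangle containing both inputs; an empty input is ignored.
inline Rect unite(const Rect& a, const Rect& b) {
    if (a.empty()) return b;
    if (b.empty()) return a;
    return { std::min(a.left, b.left), std::min(a.top, b.top),
             std::max(a.right, b.right), std::max(a.bottom, b.bottom) };
}

// Hypothetical visual element: an id plus its current bounds.
struct Visual {
    int id;
    Rect bounds;
};

// Hypothetical tracker that accumulates one bounding dirty rectangle per frame.
class DirtyTracker {
public:
    // Call when a visual moves or resizes: the area it left and the
    // area it now occupies both need repainting.
    void invalidate(const Rect& oldBounds, const Rect& newBounds) {
        dirty_ = unite(dirty_, unite(oldBounds, newBounds));
    }

    const Rect& dirty() const { return dirty_; }

    // Call after the frame has been painted.
    void clear() { dirty_ = Rect{0.f, 0.f, 0.f, 0.f}; }

    // Ids of the visuals (kept in paint order) that overlap the dirty region.
    std::vector<int> visualsToRepaint(const std::vector<Visual>& paintOrder) const {
        std::vector<int> ids;
        if (dirty_.empty()) return ids;
        for (const Visual& v : paintOrder) {
            if (overlaps(v.bounds, dirty_)) ids.push_back(v.id);
        }
        return ids;
    }

private:
    Rect dirty_{0.f, 0.f, 0.f, 0.f};
};
```

In a real Direct2D paint pass you would presumably wrap the repaint in PushAxisAlignedClip(dirtyRect, D2D1_ANTIALIAS_MODE_ALIASED) and PopAxisAlignedClip, then draw each returned visual in full and let the clip discard whatever falls outside the dirty rectangle. A production engine would likely keep a list of dirty rectangles instead of one bounding box, since two small changes in opposite corners would otherwise dirty most of the screen.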
Bear in mind that this sort of efficiency is difficult, if not impossible, with 3D scenery. Every time the camera moves, the coordinates of every point of every polygon have to be recomputed and the entire frame re-rendered, often at 100+ FPS. Loaded textures are retained, but as I understand it the ultimate render target is usually, if not always, repainted by the hardware for every frame.
Now consider a compositing system like WPF or DirectComposition. These create and retain bitmap surfaces (textures, really) for every "real" visual in the tree, consuming VRAM in the process. Yes, the 2D render process for these surfaces only needs to run when the actual render parameters of the visual change, not during mere layout changes, so it's easier in that regard, but you're also consuming a lot more VRAM and still relying on the GPU to assemble all those layers into a scene. (Something like WPF then jumps through all kinds of hoops to cache and recycle render surfaces, presumably to save RAM and reduce the render burden.) But this only really offers a meaningful performance benefit if you don't have hardware-accelerated 2D rendering, which was the case when WPF was first introduced.
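For a rough sense of the VRAM cost, here is a back-of-the-envelope sketch. It assumes 4 bytes per pixel (32-bit BGRA) with no row padding or driver overhead, which is my assumption and not something the APIs guarantee; `surfaceBytes` and `layersBytes` are hypothetical helper names.

```cpp
#include <cstdint>

// Bytes for one retained surface, assuming 4 bytes per pixel (32-bit BGRA)
// and no padding, alignment, or driver overhead.
constexpr std::uint64_t surfaceBytes(std::uint32_t widthPx, std::uint32_t heightPx) {
    return static_cast<std::uint64_t>(widthPx) * heightPx * 4u;
}

// Total for `count` retained layers of identical size (a simplified scenario;
// real visuals have different sizes).
constexpr std::uint64_t layersBytes(std::uint32_t widthPx, std::uint32_t heightPx,
                                    std::uint32_t count) {
    return surfaceBytes(widthPx, heightPx) * count;
}
```

Under those assumptions a single 1920x1080 surface is 8,294,400 bytes (about 7.9 MiB), and twenty such layers come to 165,888,000 bytes (about 158 MiB). Real visuals are usually much smaller than the full screen, so treat this as an upper-end illustration.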
So yes, if you naively repaint your entire visual tree every time anything changes, performance will be awful, but if you use it properly, performance will be excellent. Sure, it's easier to use DComp, and if you are willing to require your users to have a good GPU then it's a great option. But if you really want the ultimate performance level and minimal hardware requirements, then pure D2D, with its immediate mode, is the way to go.