The only time I've ever used it in code that wasn't there specifically to compare the memory usage of two or more approaches to something was in the following scenario:
In a web application (and hence long-running), there were a few very large collections that would generally be rebuilt at most a few times a day, and often much less frequently.
Several objects within such a collection would be equivalent, so a lot of memory could be saved by replacing references to each of them with references to a single shared instance (once built, the collection was read-only in use, so this aliasing was safe). So the collection would first be built, and then reduced in size by this deduplication, killing many objects at once.
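A minimal sketch of that deduplication step, written in Java purely for illustration (the original could have been any runtime with a tracing GC); `Entry`, `CollectionBuilder` and `buildAndIntern` are hypothetical names, not from the original code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical element type; any class with sensible equals()/hashCode() works.
record Entry(String key, String value) {}

final class CollectionBuilder {
    // After building, deduplicate equivalent entries so every duplicate
    // refers to one canonical instance. Safe only because the collection
    // is read-only once built.
    static List<Entry> buildAndIntern(List<Entry> raw) {
        Map<Entry, Entry> canonical = new HashMap<>();
        List<Entry> result = new ArrayList<>(raw.size());
        for (Entry e : raw) {
            // Reuse the first equivalent instance seen; otherwise keep this one.
            result.add(canonical.computeIfAbsent(e, x -> x));
        }
        // Every entry that was not kept as canonical becomes unreachable
        // (once `raw` is dropped) and turns into garbage all at once.
        return result;
    }
}
```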
This meant there was a sudden spike in the number of objects dying per second, which would then not happen again for several hours at least. The GC could neither correctly judge how much collection was needed nor anticipate that the memory was about to be needed to build the next large collection. Doing a manual collection every thousand or so clean-up operations therefore had a positive effect on performance (enough to sometimes go from the application crashing on the third collection to it being dependable).
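A sketch of that periodic manual collection, again in Java with hypothetical names (`PeriodicCollector`, `noteCleanUp`); note that `System.gc()` is only a hint to the JVM, whereas .NET's `GC.Collect()` forces a collection:

```java
final class PeriodicCollector {
    // Hypothetical threshold; in the scenario above it was roughly every
    // thousand clean-up operations, arrived at by measurement.
    private static final int OPS_PER_COLLECTION = 1000;
    private int cleanUpsSinceLastCollection = 0;

    // Called after each clean-up (deduplication) operation.
    void noteCleanUp() {
        if (++cleanUpsSinceLastCollection >= OPS_PER_COLLECTION) {
            cleanUpsSinceLastCollection = 0;
            // Request a collection to reclaim the freshly killed objects
            // before the next large build needs the memory.
            System.gc();
        }
    }
}
```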
A lot of measurement was done to make sure it was indeed beneficial in this case.
The two factors that made this beneficial were:
- A lot of object deaths happened.
- The event causing this was rare in the lifetime of the application.
Without both of those being true, the manual call would be needless: without a lot of object deaths there's no point, and if the event weren't rare in the application's lifetime, the GC would have self-tuned to cope.