I have just under 2 million XML documents taking up 16 GB of file system space. They are all valid and share a single DTD. They are all roughly the same size (all generated by the same lab information system).
I'm looking for an easy way for a single user to query the whole 2M-document corpus. I'm not looking to expose this to the web or even to multiple LAN users; however, I would like to be able to expose some query interface to my intranet. I'm flexible on the query language, but I would like to be able to run ad hoc queries. I want it to be at least semi-performant, and I'm willing to dedicate additional disk space as needed to accommodate indexes.
A workable solution has to be deployable on a single quad-core Linux box with 8 GB of RAM; new hardware isn't an option.
I found eXist-db, but it doesn't seem to have much activity, and the demo site is down.