Short description of our application: we analyze .NET assemblies and detect dependencies between them (e.g. method calls). We store those dependencies in an MSSQL Server database. Starting from a class or method in code, we can then find all direct and indirect dependencies and determine which code may break if we change its interface or implementation.

Although we make good use of indexes (they hurt our import performance, but the import runs overnight anyway), we still have performance issues. Since we import many, many versions of the same assembly, we have a rather large amount of data, and queries take a few seconds. That is just not fast enough (< 1.5 s is the target).

Since dependencies form a graph-like structure, we're wondering whether switching from MSSQL to a NoSQL graph database might help. The switch would take some time, so we're hoping for some external input first.

If so, feel free to also recommend a graph database that works well with .NET :-)

D.R.
  • What sort of scale are you working at? Is it millions of rows? Billions? Is the hardware adequate? Are you sure there's no more optimisation to be done? Please update your question with some more info like this and it might help somebody to advise you better. – Tom Chantler Aug 29 '12 at 09:14
  • Currently we have about 1.5 million entries in the important dependency table for testing purposes; however, we expect to deal with about 10 million entries in production. Hardware is a 4-core Intel Core2, enough RAM, a normal HDD (unfortunately no SSD). – D.R. Aug 29 '12 at 09:18
  • Dependency table: TYPE, PART_REFERENCED, PART_DEPENDENT. Dependency part table: ASSEMBLY_ID, TYPE_ID, MEMBER_ID, SRC_FILE, SRC_LINE. – D.R. Aug 29 '12 at 09:22
  • Background information: we can run the direct dependency query fast enough; however, we have to query the database recursively for the indirect dependencies, and that recursion is what takes the time (see the sketch after these comments)... – D.R. Aug 29 '12 at 09:26
  • How about reading the data from the database into an in-memory graph at start-up and querying that? 10 million entries doesn't sound like that much. – Patrik Svensson Aug 29 '12 at 10:54
  • The problem with the in-memory graph is our daily import of nightly builds: we'd have to rebuild the graph every day. It's the same reason why caching doesn't help us much. – D.R. Aug 29 '12 at 11:27
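
For reference, the recursive query described in the comments might look roughly like the T-SQL sketch below. The columns follow the schema D.R. posted; the table name DEPENDENCY and the parameter @ChangedPart are assumptions for illustration only:

    -- A minimal sketch, assuming a DEPENDENCY table with the columns
    -- from the comment above. @ChangedPart is a hypothetical parameter:
    -- the ID of the class/method being changed.
    DECLARE @ChangedPart INT = 42;

    WITH TransitiveDependents (PART_DEPENDENT, Depth) AS
    (
        -- Anchor: everything that directly depends on the changed part
        SELECT d.PART_DEPENDENT, 1
        FROM DEPENDENCY AS d
        WHERE d.PART_REFERENCED = @ChangedPart

        UNION ALL

        -- Recursive step: dependents of the dependents found so far.
        -- The depth cap guards against cycles in the call graph.
        SELECT d.PART_DEPENDENT, t.Depth + 1
        FROM DEPENDENCY AS d
        INNER JOIN TransitiveDependents AS t
            ON d.PART_REFERENCED = t.PART_DEPENDENT
        WHERE t.Depth < 100
    )
    SELECT DISTINCT PART_DEPENDENT
    FROM TransitiveDependents;

Each recursion level costs another join against the full dependency table, which is consistent with the slowdown described: deep dependency chains multiply the work.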

1 Answer


Call me an old fogey, but I would be quite careful about making such a technology switch. As this SO question shows, the technology choice is fairly limited, and you run the risk of turning your project into a "Neo4j" project rather than a "dependency management" project. If you've really hit the buffers, the switch is worth considering, but with the data volumes you're discussing it doesn't sound like you're there yet.

The first thing I'd consider is the "nested set" model: it specifically addresses the performance problem of retrieving all descendants of a given node. A sketch follows below.
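
As a minimal illustration of the idea, here is a T-SQL sketch assuming a hypothetical PART_TREE table in which every part carries precomputed left/right bounds (the table and column names are made up here, not from the question):

    -- Hypothetical table: each part stores the left/right bounds of its
    -- position in a preorder traversal of the dependency tree.
    CREATE TABLE PART_TREE
    (
        PART_ID INT PRIMARY KEY,
        LFT     INT NOT NULL,
        RGT     INT NOT NULL
    );

    CREATE INDEX IX_PART_TREE_BOUNDS ON PART_TREE (LFT, RGT);

    -- All direct and indirect children of one node come back in a
    -- single range scan instead of a recursive query:
    DECLARE @StartPart INT = 42;  -- hypothetical ID

    SELECT child.PART_ID
    FROM PART_TREE AS parent
    INNER JOIN PART_TREE AS child
        ON child.LFT > parent.LFT
       AND child.RGT < parent.RGT
    WHERE parent.PART_ID = @StartPart;

Note the trade-offs: the model assumes a tree rather than a general graph (shared or cyclic dependencies need extra handling), and inserting a node means renumbering bounds. That renumbering cost fits a rebuild-once-per-night import better than continuous updates.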

Neville Kuyt