I have a situation which involves several threads simultaneously populating a database with data scraped from web sources. The scrapers are to be run periodically to collect new data from various sources.

I am new to using NHibernate and not entirely sure how best to manage sessions.

An example of what each worker does:

  • Scrape an entity A from a web source
  • Scrape each entity B related to A, and record that A has another B (i.e. an A has many B, a B has one A)

To persist each B, the session needs a reference to A to create the B with and then A needs to add B to its list of children. Both A and B are then persisted.

There is a hierarchy of this kind of A-B child-parent situation such that A has many B, B has many C... At the leaf level, A has thousands of leaf children, so it is impractical(?) to keep the session open all the way down this chain.

An alternative is recording the identifier of each parent down the chain (which can be stored independently of the session) and loading in the parent via this id each time a child needs to be created.

I also understand that an ISession is meant to be single-threaded, so I will need at least one session per thread, but beyond this I'm really not sure what the best approach is.

Any ideas appreciated, bit confused at the moment!

Liam Williams

1 Answer

Create one session per thread and use session.Load<A>() to set the association without having to load the parent object every time.

var data = GetDataForBs();

using (var session = OpenSession())
using (var tx = session.BeginTransaction())
{
    foreach (var item in data)
    {
        var b = ...; // create B from the scraped item
        // Load<A>() returns a proxy without querying for A; the id is all the association needs
        b.A = session.Load<A>(item.A_Id);
        session.Save(b);
    }
    tx.Commit();
}

If each entity lives on its own (i.e. you don't need cascading), you can use an IStatelessSession instead to speed things up.
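
As a rough sketch of the stateless variant, assuming hypothetical entities A and B that are already mapped and an ISessionFactory configured elsewhere: the parent is fetched once with Get (which does issue a SELECT) and then reused for every child. The InsertChildren helper and the Name property are made up for illustration. A stateless session does not cascade, so each B is inserted explicitly.

```csharp
using System;
using System.Collections.Generic;
using NHibernate;

// Hypothetical entities; their mappings are assumed to exist elsewhere.
public class A
{
    public virtual int Id { get; set; }
}

public class B
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual A A { get; set; }
}

public static class StatelessImport
{
    // Inserts one B per name under the parent with the given id.
    public static void InsertChildren(ISessionFactory factory, int parentId, IEnumerable<string> names)
    {
        using (IStatelessSession session = factory.OpenStatelessSession())
        using (ITransaction tx = session.BeginTransaction())
        {
            // One SELECT for the parent, reused for all children.
            A parent = session.Get<A>(parentId);
            if (parent == null)
                throw new InvalidOperationException("No A with id " + parentId);

            foreach (string name in names)
            {
                // No cascading in a stateless session: insert each child directly.
                session.Insert(new B { Name = name, A = parent });
            }

            tx.Commit();
        }
    }
}
```

Each worker thread would call this with its own parent id; the session is opened and disposed inside the call, so nothing is shared between threads.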

Firo
  • A StatelessSession would be appropriate in my situation, thanks for the suggestion. Also, how often should I begin and commit transactions? For example, should I commit after each individual insert or group multiple inserts together? – Liam Williams Dec 09 '11 at 01:32
  • If you need speed, grouping is better. If you don't want to lose other inserts when a commit throws (e.g. because you can't repeat them), commit per insert. – Firo Dec 09 '11 at 08:02
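
A middle ground between the two options in the last comment is to commit in fixed-size groups, so a failed commit loses at most one group. The sketch below is only one way to do that with a regular ISession; BatchedSaver, SaveInBatches and batchSize are hypothetical names, and the session factory is assumed to be configured elsewhere.

```csharp
using System.Collections.Generic;
using NHibernate;

public static class BatchedSaver
{
    // Saves items in groups of batchSize, one transaction per group.
    public static void SaveInBatches<T>(ISessionFactory factory, IEnumerable<T> items, int batchSize)
        where T : class
    {
        using (ISession session = factory.OpenSession())
        {
            var batch = new List<T>(batchSize);
            foreach (T item in items)
            {
                batch.Add(item);
                if (batch.Count == batchSize)
                    CommitBatch(session, batch);
            }

            // Commit whatever is left over after the last full group.
            if (batch.Count > 0)
                CommitBatch(session, batch);
        }
    }

    private static void CommitBatch<T>(ISession session, List<T> batch) where T : class
    {
        using (ITransaction tx = session.BeginTransaction())
        {
            foreach (T item in batch)
                session.Save(item);
            tx.Commit();
        }

        // Evict the saved entities so the session does not grow with every group.
        session.Clear();
        batch.Clear();
    }
}
```

With batchSize set to 1 this degenerates to commit-per-insert. If a commit throws, the exception propagates and the session is disposed; groups committed earlier stay in the database.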