
I have a process which first generates a lot of data that is saved into a MongoDB collection; then the data is analyzed; and last, I want to save the whole collection to a file on disk and erase the collection. I know I could do this easily with mongodump.exe, but I was wondering whether there is any way to do it directly from C#? I mean not launching a console process, but using some functionality that is inside the MongoDB C# driver.

And if it can be done, how would I do the reverse operation in C#, namely loading a .bson file into a collection?

Mike

2 Answers


Here are two methods that you can use to accomplish this:

using System.IO;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static async Task WriteCollectionToFile(IMongoDatabase database, string collectionName, string fileName)
{
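    // RawBsonDocument keeps the raw BSON bytes rather than mapping each document into a full BsonDocument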
    var collection = database.GetCollection<RawBsonDocument>(collectionName);

    // Make sure the file is empty before we start writing to it
    File.WriteAllText(fileName, string.Empty);

    using (var cursor = await collection.FindAsync(new BsonDocument()))
    {
        while (await cursor.MoveNextAsync())
        {
            var batch = cursor.Current;
            foreach (var document in batch)
            {
                File.AppendAllLines(fileName, new[] { document.ToString() });
            }
        }
    }
}

public static async Task LoadCollectionFromFile(IMongoDatabase database, string collectionName, string fileName)
{
    using (FileStream fs = File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (BufferedStream bs = new BufferedStream(fs))
    using (StreamReader sr = new StreamReader(bs))
    {
        var collection = database.GetCollection<BsonDocument>(collectionName);

        string line;
        while ((line = sr.ReadLine()) != null)
        {
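            // Each line of the file holds one document as JSON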
            await collection.InsertOneAsync(BsonDocument.Parse(line));
        }
    }
}

And here's an example of how you would use them:

// Obviously you'll need to change all these values to your environment
var connectionString = "mongodb://localhost:27017";
var database = new MongoClient(connectionString).GetDatabase("database");
var fileName = @"C:\mongo_output.txt";
var collectionName = "collection name";

// This will save all of the documents in the file you specified
WriteCollectionToFile(database, collectionName, fileName).Wait();

// This will drop all of the documents in the collection
database.GetCollection<BsonDocument>(collectionName).DeleteManyAsync(new BsonDocument()).Wait();

// This will restore all the documents from the file you specified
LoadCollectionFromFile(database, collectionName, fileName).Wait();

Note that this code was written using version 2.0 of the MongoDB C# driver, which you can obtain via NuGet. Also, the file-reading code in the LoadCollectionFromFile method was obtained from this answer.

Donut
  • Thanks, I will try that. There's a lot of stuff in your answer that I still need to learn - Tasks, async, etc. - but I'll get there eventually. – Mike Aug 07 '15 at 15:06
  • In the meantime, another question: will `File.WriteAllText` and `File.AppendAllLines` work with **really big** files? Like a couple of GB? Will it have to hold the whole content of the file in memory, or will it be written sequentially? – Mike Aug 07 '15 at 15:10
  • @Mike The `File.WriteAllText` call is just there to make sure the file is empty when you start. `File.AppendAllLines` should work fine even with really large files, since you're only appending one document at a time. However, `File.ReadAllLines` in the other function might give you some trouble. I'm going to update it to be more performant (see the sketch after these comments)... – Donut Aug 07 '15 at 15:23
  • Also, since each document is being written to a new line in the file, and then this file is being read one line at a time, the newline character is effectively being used as a delimiter. This might cause issues if your BSON contains newline characters as well -- YMMV, you might want to experiment with different delimiters. – Donut Aug 07 '15 at 15:26
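
Following up on the performance question in the comments, here is a minimal sketch (not part of the original answer; the method name WriteCollectionToFileStreaming is made up for illustration) that keeps a single StreamWriter open for the whole export instead of calling File.AppendAllLines once per document, avoiding reopening the file millions of times:

public static async Task WriteCollectionToFileStreaming(IMongoDatabase database, string collectionName, string fileName)
{
    var collection = database.GetCollection<RawBsonDocument>(collectionName);

    // Overwrite any existing file and keep one writer open for the whole export
    using (var writer = new StreamWriter(fileName, append: false))
    using (var cursor = await collection.FindAsync(new BsonDocument()))
    {
        while (await cursor.MoveNextAsync())
        {
            foreach (var document in cursor.Current)
            {
                // One JSON document per line; the newline-as-delimiter caveat above still applies
                await writer.WriteLineAsync(document.ToString());
            }
        }
    }
}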

You can use the C# BinaryFormatter to serialize an object graph to disk, and then deserialize it back into an object graph later.

Serialize: https://msdn.microsoft.com/en-us/library/c5sbs8z9%28v=VS.110%29.aspx

Deserialize: https://msdn.microsoft.com/en-us/library/b85344hz%28v=vs.110%29.aspx
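
For illustration, here is a minimal sketch of that round trip (the SampleRecord type is a hypothetical stand-in for your own data class; anything you serialize this way must be marked [Serializable]):

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class SampleRecord // hypothetical example type
{
    public string Name { get; set; }
    public double Value { get; set; }
}

// Serialize the object graph to disk
var records = new List<SampleRecord> { new SampleRecord { Name = "a", Value = 1.0 } };
var formatter = new BinaryFormatter();
using (var stream = File.Create(@"C:\records.dat"))
{
    formatter.Serialize(stream, records);
}

// Deserialize it back into memory
List<SampleRecord> restored;
using (var stream = File.OpenRead(@"C:\records.dat"))
{
    restored = (List<SampleRecord>)formatter.Deserialize(stream);
}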

Note, however, that this is not a MongoDB or C# driver feature.

After serializing, you can use the driver to drop the collection; after deserializing, you can use the driver to insert the objects into a new collection.

Depending on your requirements, you may want to lock that collection while the export process is running, before you drop it.

AHMED EL-HAROUNY
  • Thanks Ahmed. The main reason why I'm trying to store the data in the database is that the amount of data is too big and I'm getting an out-of-memory exception. To give you some idea: I'm processing ~10 million instances of a class, each of which contains a dozen properties (doubles, strings, etc.). The test has to calculate various statistics on that data. My idea was that if I store the data in MongoDB, rather than keep it in RAM, I could still work with the data in a LINQ-like manner (which Mongo allows) - and this would take more time, – Mike Aug 07 '15 at 14:58
  • because Mongo would have to load the needed parts off the hard disk, but at least I will not run out of memory. Does binary serialization and deserialization allow me to work with the data **when only part of the data is in memory at a given moment**? – Mike Aug 07 '15 at 15:02
  • 1
    I'm not quite sure on why you would want to do that specially with this amount of data. However you can load patches of collection documents and serialize them to multiple files on disk. Based on your available memory you decide how much document you load at a time. When deserializing you would loop through your ".dat" files for example process them one by one to move data back to Mongo. – AHMED EL-HAROUNY Aug 10 '15 at 23:41
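
A minimal sketch of that batching idea (the batch size, the part-file naming scheme, and the method name ExportInBatches are all made up for illustration; usings as in the first answer, plus System.Linq):

public static async Task ExportInBatches(IMongoDatabase database, string collectionName, string directory, int batchSize)
{
    var collection = database.GetCollection<RawBsonDocument>(collectionName);
    var options = new FindOptions<RawBsonDocument> { BatchSize = batchSize };

    var fileIndex = 0;
    using (var cursor = await collection.FindAsync(new BsonDocument(), options))
    {
        while (await cursor.MoveNextAsync())
        {
            // Write each batch to its own file (part0.dat, part1.dat, ...),
            // so only one batch of documents is in memory at a time
            var fileName = Path.Combine(directory, string.Format("part{0}.dat", fileIndex++));
            File.WriteAllLines(fileName, cursor.Current.Select(d => d.ToString()));
        }
    }
}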