0

I want to export a CSV table from the items of an IMongoCollection from MongoDB.Driver using C#.

How would I be able to do this efficiently? I was thinking of doing this by retrieving the documents from the collection and either convert them to a JSON-like format or use a StringBuilder to create the CSV file using and array of PropertyInfo to access the fields of the retrieved object.

Can someone come with an example of how I would be able to do this?

Woksin
  • 74
  • 2
  • 9
  • IMongoCollection should really have an export method.. – Woksin Feb 20 '18 at 16:53
  • What kind of elements are actually _in_ the collection? It should be a simple matter of getting column names and using them as CSV header, and column rows and writing them as additional CSV lines, matching their respective header column. – Nyerguds Feb 20 '18 at 17:09
  • @Nyerguds The elements are objects of a class with various properties of simple types. The goal is to have the user be able to choose which fields he wants to be included in the CSV. – Woksin Feb 20 '18 at 17:12
  • Well, if you can somehow get the column names out, I'd say, as you suggested yourself, start by using PropertyInfo to get the values out for each property, and for each object, make an array with those properties. You can end up with an easy `List` to top with the column names and convert to CSV. But, do try that yourself, and post some code. – Nyerguds Feb 20 '18 at 17:29

1 Answers1

3

Seems like the obvious way is to get all header data somehow (see further below), and then iterate through the collection and if you were to write by hand (which people don't encourage), string build, writing to file in batches (if your collection were quite large).

HashSet<string> fields = new HashSet<string>();
BsonDocument query = BsonDocument.Parse(filter);
var result = database.GetCollection<BsonDocument>(collection).Find(new BsonDocument());

// Populate fields with all unique fields, see below for examples how.

var csv = new StringBuilder();
string headerLine = string.Join(",", fields);
csv.AppendLine(headerLine);

foreach (var element in result.ToListAsync().Result)
{
    string line = null;
    foreach (var field in fields)
    {
        BsonValue value;
        if (field.Contains("."))
        {
            value = GetNestedField(element, field);
        }
        else
        {
            value = element.GetElement(field).Value;
        }

        // Example deserialize to string
        switch (value.BsonType)
        {
            case BsonType.ObjectId:
                line = line + value.ToString();
                break;
            case BsonType.String:
                line = line + value.ToString();
                break;
            case BsonType.Int32:
                line = line + value.AsInt32.ToString();
                break;
        }
        line = line + ",";
    }
    csv.AppendLine(line);
}
File.WriteAllText("D:\\temp.csv", csv.ToString());

In the case of your own objects you'd have to use your own deserializer.

HOWEVER I'd recommend using the mongoexport tool if you can. You could simply run the exe from your application, feeding in arguments as required. Keep in mind though, that it requires explicit fields.

ProcessStartInfo startInfo = new ProcessStartInfo();
startInfo.FileName = "C:\mongodb\bin\mongoexport.exe";
startInfo.Arguments = "-d testDB -c testCollection --type csv --fields name,address.street,address.zipCode --out .\output.csv";
startInfo.UseShellExecute = false;

Process exportProcess= new Process();
exportProcess.StartInfo = startInfo;

exportProcess.Start();
exportProcess.WaitForExit();

More on mongoexport such as paging, additional queries and field file: https://docs.mongodb.com/manual/reference/program/mongoexport/

Getting Unique Field Names

In order to find ALL field names you could do this a number of ways. Using BsonDocument as a generic data example.

  1. Recursively traverse through your IMongoCollection results. This is going to have to be through the entire collection, so performance may not be great.

Example:

HashSet<string> fields = new HashSet<string>();
var result = database.GetCollection<BsonDocument>(collection).Find(new BsonDocument());
var result = database.GetCollection<BsonDocument>(collection).Find(new BsonDocument());
foreach (var element in result.ToListAsync().Result)
{
    ProcessTree(fields, element, "");
}

private void ProcessTree(HashSet<string> fields, BsonDocument tree, string parentField)
{
    foreach (var field in tree)
    {
        string fieldName = field.Name;
        if (parentField != "")
        {
                fieldName = parentField + "." + fieldName;
        }

        if (field.Value.IsBsonDocument)
        {
            ProcessTree(fields, field.Value.ToBsonDocument(), fieldName);
        }
        else
        {
            fields.Add(fieldName);
        }
    }

}

  1. Perform a MapReduce operation to return all fields. Scanning nested fields becomes more complex with this method however. See this.

Example:

string map = @"function() { 
    for (var key in this) { emit(key, null); }
}";
string reduce = @"function(key, stuff) { return null; }";
string finalize = @"function(key, value){
    return key;
}";
MapReduceOptions<BsonDocument, BsonValue> options = new MapReduceOptions<BsonDocument, BsonValue>();
options.Finalize = new BsonJavaScript(finalize);

var results = database.GetCollection<BsonDocument>(collection).MapReduceAsync(
    new BsonJavaScript(map),
    new BsonJavaScript(reduce),
    options).Result.ToListAsync().Result;
foreach (BsonValue result in results.Select(item => item["_id"]))
{
    Debug.WriteLine(result.AsString);
}
  1. Perform an Aggregation operation. You'd need to unwind as many times as required to get all nested fields.

Example:

string[] pipeline = new string[3];
pipeline[0] = "{ '$project':{ 'arrayofkeyvalue':{ '$objectToArray':'$$ROOT'}}}";
pipeline[1] = "{ '$unwind':'$arrayofkeyvalue'}";
pipeline[2] = "{ '$group':{'_id':null,'fieldKeys':{'$addToSet':'$arrayofkeyvalue.k'}}}";
var stages = pipeline.Select(s => BsonDocument.Parse(s)).ToList();
var result = await database.GetCollection<BsonDocument>(collection).AggregateAsync<BsonDocument>(stages);
foreach (BsonValue fieldName in result.Single().GetElement("fieldKeys").Value.AsBsonArray)
{
    Debug.WriteLine(fieldName.AsString);
}

Nothing perfect here and I couldn't tell you which is the most efficient but hopefully something to help.

gwarn
  • 136
  • 1
  • 7