0

I'm currently using SSIS to do an improvement on a project. need to insert single documents in a MongoDB collection of type Time Series. At some point I want to retrieve rows of data after going through a C# transformation script. I did this:

foreach (BsonDocument bson in listBson)
{
    OutputBuffer.AddRow();
    OutputBuffer.DatalineX = (string) bson.GetValue("data");
}

But this piece of code that works great with small file does not work with a 6 million line file. That is, there are no lines in the output. The other following tasks validate but react as if they had received nothing as input. Where could the problem come from?

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
Dexter
  • 21
  • 3

1 Answers1

1

Your OuputBuffer has DatalineX defined as a string, either DT_STR or DT_WSTR and a specific length. When you exceed that value, things go bad. In normal strings, you'd have a maximum length of 8k or 4k respectively.

Neither of which are useful for your use case of at least 6M characters. To handle that, you'll need to change your data type to DT_TEXT/DT_NTEXT Those data types do not require a length as they are "max" types. There are lots of things to be aware of when using the LOB types.

  • Performance can suck depending on whether SSIS can keep the data in memory (good) or has to write intermediate values to disk (bad)
  • You can't readily manipulate them in a data flow
  • You'll use a different syntax in a Script Component to work with them

e.g.

// TODO: convert to bytes
Output0Buffer.DatalineX.AddBlobData(bytes);

Longer example of questionable accuracy with regard to encoding the bytes that you get to solve at https://stackoverflow.com/a/74902194/181965

billinkc
  • 59,250
  • 9
  • 102
  • 159
  • In my case I go through lines in my files. In fact a BsonDocument represent only one line (with other metadata), and the field "data" is only 46 characters long. What I do is foreach line of my document that I converted into a BsonDocument (to add them in a MongoDb Collection), I try to return the field "data" in my OutputBuffer and that for all the line one after the other. – Dexter Feb 14 '23 at 08:55