1

I have a complex scenario, in which a customer sends XML files and I am supposed to extract some info out of these files. One important info is images encoded in base64.

The xsd file of the xml, defines the element which contains the base64-encoded image as follows:

   <xs:element name="image" type="xs:base64Binary" nillable="false" minOccurs="0" maxOccurs="unbounded">
      <xs:annotation>
        <xs:documentation xml:lang="en">
          base64 image.
        </xs:documentation>
      </xs:annotation>
    </xs:element> 

In SSIS, I am using XML Source component to extract the element. In the "Input and Output Properties" of the component, I defined the datatype (in both External/Output Columns related to the base64-image element) as image [DT_IMAGE] type.

Q1: Is it correct to choose this datatype to read base64-encoded images? what about byte stream [DT_BYTES] or just string [DT_STR] ?

Next, I passed the output of the component to a Recordset Destination to store the output in an SSIS variable User::xml_image of type Object (I did this because it is possible to have multiple images).

The object variable is then passed to a Script Task, converted into DataTable, and I am trying to save "let's say the first row/image" into the file system. The code I am using is as:

            DataTable table = new DataTable();
            oleAdapter.Fill(table, Dts.Variables["xml_image"].Value);

            Image image;
            BinaryFormatter bf = new BinaryFormatter();
            var ms = new MemoryStream();

            // convert the first image
            bf.Serialize(ms, table.Rows[0]["image"]);
            byte[] bytes = ms.ToArray();

            using (MemoryStream mem = new MemoryStream(bytes))
            {
                mem.Seek(0, SeekOrigin.Begin);
                image = Image.FromStream(mem); // error in this line: Parameter is not valid.
                image.Save("D:\\tepack.jpeg", ImageFormat.Jpeg);
            }

The script executes until call of FromStream method and fails with the this error:

Parameter is not valid.

Q2: How to fix the code error, and save the image successfully, considering above scenario and configurations?

khidir sanosi
  • 161
  • 11

1 Answers1

1

A1. DT_IMAGE vs DT_BYTES VS DT_STR

A base64 encoded binary is, by definition, ASCII string data so the initial extract of data would be DT_STRmaybe.

At that point, you have the characters from XML that you can visually inspect to confirm Yup, I pulled that field correctly out of XML.

The next step is to reverse, or decode, the base64 encoding to the original binary bits.

A2. Image.FromStream error

I'm not sure where the Image class stems from but it likely is trying to convert the data into a different image format (png to jpg to bmp, etc) but the bits you have in memory already are the bits you need. You just need to write that array of bytes to disk, something along the lines of

https://www.oreilly.com/library/view/c-cookbook/0596003390/ch02s12.html

byte[] imageBytes = Base64DecodeString(bmpAsString);
fstrm = new FileStream(@"C:\winnt_copy.bmp", FileMode.CreateNew, FileAccess.Write);
BinaryWriter writer = new BinaryWriter(fstrm);
writer.Write(imageBytes);
writer.Close( );
fstrm.Close( );

Maybe The link to data types on books online notes the maximum length of data types and we see a DT_STR is 8000 characters, while DT_IMAGE/DT_BYTES/DT_TEXT is 2.1M units (char or bytes). Since none of this is unicode data, it's all the same cost storage cost but none of the Expression language can operate on the data (except null and length check IIRC) and the data viewer is going to be lumped in there so a visual inspection might be challenging.

A jpg is a compressed file format - they threw away data to reduce the number of bytes. But Base64 is anti-compression as it inflates the size as part of mapping dangerous binary characters into safe ascii ones. So, unless you know your source images are very small, you're likely going to spill over that 8000 character boundary.

Assuming a 1.3 inflation rate, you're looking at a maximum original size in the ballpark 5970 bytes. A 6kb jpg file is likely small and of low quality. This image for example was 5151 bytes

A scene from the film Airplane! The automatic pilot, Otto, and Elaine the stewardess are in the cockpit (It's the little room in the front of the plane where the pilots sit, but that's not important right now) smoking

Final thoughts As I've typed this out and thought about the problem domain, unless you have a strong reason to use the data flow, I would avoid it in this instance. Instead, use a Script Task to shred the XML. I think you'll have fewer issues if you don't have to worry about fitting data types into SSIS primitive types as well as the fact that all of your operations will already be .NET script operations.

billinkc
  • 59,250
  • 9
  • 102
  • 159
  • Well my problem was not solved, however I took the approach that you suggested in the end, by processing the `XML` via a script task and that works :) – khidir sanosi Feb 04 '21 at 09:32