CONCLUSION:

For some reason the flow wouldn't let me convert the incoming message to a BLOB by changing the Message Domain property of the Input Node, so I added a Reset Content Descriptor node before the Compute Node that holds the code from the accepted answer. The line that parses the XML and creates the XMLNSC child for the message raised a 'CHARACTER:Invalid wire format received' error, so I removed that line and added another Reset Content Descriptor node after the Compute Node instead. The flow now parses the message, replaces the Unicode characters with spaces, and no longer crashes.

Here is the code for the added Compute Node:

CREATE FUNCTION Main() RETURNS BOOLEAN
BEGIN
    DECLARE NonPrintable BLOB X'0001020304050607080B0C0E0F101112131415161718191A1B1C1D1E1F7F808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9FA0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBFC0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDFE0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF1F2F3F4F5F6F7F8F9FAFBFCFDFEFF';
    DECLARE Printable    BLOB X'20202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020';
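    -- TRANSLATE maps each byte listed in NonPrintable to the byte at the same position in Printable (here X'20', a space).
    DECLARE Fixed        BLOB TRANSLATE(InputRoot.BLOB.BLOB, NonPrintable, Printable);
    -- Copy the whole input message, then overwrite the body with the cleaned bytes.
    SET OutputRoot           = InputRoot;
    SET OutputRoot.BLOB.BLOB = Fixed;
    RETURN TRUE;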
END;

UPDATE:

The message is being parsed as XML using XMLNSC. I thought that would cause a problem, but it does not appear to.

Now I'm using PHP. I've created a node to plug into the legacy flow. Here's the relevant code:

class fixIncompetence {
    function evaluate($output_assembly, $input_assembly) {
        $output_assembly->MRM = $input_assembly->MRM;
        $output_assembly->MQMD = $input_assembly->MQMD;
        $tmp = htmlentities($input_assembly->MRM->VALUE_TO_FIX, ENT_HTML5 | ENT_SUBSTITUTE, 'UTF-8');
        if (!empty($tmp)) {
            $output_assembly->MRM->VALUE_TO_FIX = $tmp;
        }
        // Ensure there are no null MRM fields. MessageBroker is strict.
        foreach ($output_assembly->MRM as $key => $val) {
            if (empty($val)) {
                $output_assembly->MRM->$key = '';
            }
        }
    }
}

Right now I'm getting a vague error about read only messages, but before that it wasn't working either.

Original Question:

For some reason I am unable to impress upon the senders of our MQ messages that smart quotes, en dashes, em dashes, and the like crash our XML parser.

I managed to make a working solution with SQL queries, but it wasted too many resources. Here's the last thing I tried, but it didn't work either:

CREATE FUNCTION CLEAN(IN STR CHAR) RETURNS CHAR BEGIN
    SET STR = REPLACE('–',STR,'&ndash;');
    SET STR = REPLACE('—',STR,'&mdash;');
    SET STR = REPLACE('·',STR,'&middot;');
    SET STR = REPLACE('“',STR,'&ldquo;');
    SET STR = REPLACE('”',STR,'&rdquo;');
    SET STR = REPLACE('‘',STR,'&lsqo;');
    SET STR = REPLACE('’',STR,'&rsquo;');
    SET STR = REPLACE('•',STR,'&bull;');
    SET STR = REPLACE('°',STR,'&deg;');
    RETURN STR;
END;

As you can see I'm not very good at this. I have tried reading about various ESQL string functions without much success.
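
For comparison, ESQL's REPLACE takes the source string first, then the search string, then the replacement, so the calls in CLEAN have their first two arguments swapped. The following is a minimal sketch of the intended call order, not a tested fix. The function name `CleanSketch` and the plain ASCII substitutes are assumptions made for illustration, and it only helps once the message is already available as a character string.

```esql
-- Sketch only: CleanSketch is a hypothetical name, and the ASCII substitutes
-- are an assumption; choose whatever the receiving application tolerates.
CREATE FUNCTION CleanSketch(IN STR CHAR) RETURNS CHAR
BEGIN
    -- Work on a local copy rather than assigning to the IN parameter.
    DECLARE Result CHAR STR;
    -- REPLACE(source, search, replacement): the source string comes first.
    SET Result = REPLACE(Result, '–', '-');    -- en dash
    SET Result = REPLACE(Result, '—', '-');    -- em dash
    SET Result = REPLACE(Result, '“', '"');    -- left double quote
    SET Result = REPLACE(Result, '”', '"');    -- right double quote
    SET Result = REPLACE(Result, '‘', '''');   -- left single quote
    SET Result = REPLACE(Result, '’', '''');   -- right single quote
    RETURN Result;
END;
```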

user1958756
  • You shouldn't have to do this, the sender has a responsibility to send sane XML. If they choose to send those characters, they must use the `utf-8` encoding and set the `CCSID` accordingly (1208). If they set the encoding to `iso-1` or `CCSID 819` then the parser will rightly reject those glyphs. – Stavr00 May 17 '16 at 20:07
  • While I agree wholeheartedly, that doesn't appear to be an option. Emails to supervisors and co-workers go largely ignored. – user1958756 May 17 '16 at 20:56
  • How do you parse the message in Broker, in BLOB? – Attila Repasi May 18 '16 at 06:27
  • Unfortunately no amount of coding can fix incompetence. You will have to use a more powerful tool than ESQL in order to sanitize the XML stream before handing it to the parser. – Stavr00 May 18 '16 at 14:28
  • Sorry, I should have asked this in my answer below. What kind of input node are you using? – TJA May 28 '16 at 06:58
  • MQInput node. Setting the message domain doesn't appear to have any effect. – user1958756 May 31 '16 at 16:43

1 Answer

So in ESQL you can use the TRANSLATE function.

The following is a snippet I use to clean up a BLOB containing non-ASCII low hex values so that it can then be cast into a usable character string.

You should be able to modify it to change your undesired characters into something more benign. Basically, each hex value in NonPrintable gets translated into its positional equivalent in Printable, in this case always a full stop, i.e. X'2E' in ASCII. You'll need to make your BLOBs long enough to cover the desired range of hex values.

DECLARE NonPrintable BLOB X'000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F';
DECLARE Printable    BLOB X'2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E2E';
SET WorkBlob = TRANSLATE(WorkBlob, NonPrintable, Printable);
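
To make the positional mapping concrete, here is a minimal sketch on a four-byte BLOB. It assumes the sender used Windows-1252, where the curly double quotes are the single bytes X'93' and X'94'; the variable names are illustrative and not part of the snippet above.

```esql
-- Illustrative only: assumes Windows-1252 input, where X'93' and X'94' are curly double quotes.
DECLARE Sample    BLOB X'93486994';  -- curly-quoted Hi in Windows-1252
DECLARE CurlyBlob BLOB X'9394';      -- bytes to search for
DECLARE PlainBlob BLOB X'2222';      -- replacement at the same position: ASCII double quote
SET Sample = TRANSLATE(Sample, CurlyBlob, PlainBlob);
-- Sample should now be X'22486922', i.e. Hi wrapped in plain ASCII quotes.
```

TRANSLATE works byte for byte, so a multi-byte UTF-8 sequence such as X'E2809C' (a left curly double quote) cannot be collapsed into a single replacement byte this way; each of its bytes is mapped separately.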

BTW, if messages with invalid characters only come in every now and then, I'd probably specify BLOB on the input node and then use something similar to the following to invoke the XMLNSC parser.

CREATE LASTCHILD OF OutputRoot DOMAIN 'XMLNSC'
       PARSE(InputRoot.BLOB.BLOB CCSID InputRoot.Properties.CodedCharSetId ENCODING InputRoot.Properties.Encoding);

With the exception terminal wired up, you can then correct the BLOBs of any messages containing parser-breaking invalid characters before attempting to reparse.
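
As a rough sketch of that correct-then-reparse step, the two snippets above could be combined in a single Compute node on the exception path. The module name and the short byte lists are hypothetical, and the sketch assumes an MQInput node whose Message Domain is set to BLOB; treat it as a starting point rather than a verified flow.

```esql
-- Sketch only: module name and byte lists are hypothetical.
-- Assumes the message arrives from an MQInput node with Message Domain BLOB.
CREATE COMPUTE MODULE CleanAndReparse_Compute
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        -- Control bytes to neutralise; extend both lists (keeping them the same length) as needed.
        DECLARE BadBytes  BLOB X'0001020304050607080B0C0E0F';
        DECLARE GoodBytes BLOB X'20202020202020202020202020';
        DECLARE Fixed     BLOB TRANSLATE(InputRoot.BLOB.BLOB, BadBytes, GoodBytes);

        -- Copy the headers only, so the output tree does not also carry the BLOB body.
        SET OutputRoot.Properties = InputRoot.Properties;
        SET OutputRoot.MQMD       = InputRoot.MQMD;

        -- Reparse the cleaned bytes with the XMLNSC parser.
        CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC')
            PARSE(Fixed ENCODING InputRoot.Properties.Encoding CCSID InputRoot.Properties.CodedCharSetId);
        RETURN TRUE;
    END;
END MODULE;
```

Copying only Properties and MQMD, instead of the whole InputRoot, is a design choice in this sketch: it keeps the original BLOB body out of the output tree so that the XMLNSC folder is the only message body.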

Finally, my best wishes: I've had a number of battles over the years over being forced to correct invalid message content in the "Integration Layer"; after all, that's what it's meant to do.

TJA
  • Unfortunately it didn't like trying to parse the BLOB in ESQL, but that's another issue. You're not supposed to thank people, but this goes above and beyond just being helpful - you gave an answer that went outside the bounds of best practices. That's a bold move. – user1958756 May 26 '16 at 22:23
  • Can you provide a bit more detail on what you did? I've used the above to parse messages quite a few times, and whilst I may have overdone the tidying up after pasting it into SO, I'm surprised it didn't work at all. – TJA May 26 '16 at 23:49