These days I'm struggling to get SAS to read an XFDF file, an export of the comments (annotations) in a PDF made with Adobe Acrobat Professional. If you have never worked with an .xfdf file, don't worry: it is basically an XML-based format from Adobe.
I can't use SAS XML Mapper, for two reasons: first, I can't use it at my workplace (where I also develop personal projects like this one); second, I'd like to write a procedure that can be rerun every time (without building a new map each time).
Usually comments are stored in the XFDF file in this format:
><freetext rect="300.165985,66.879105,380.165985,86.879105" creationdate="D:-001-1-1-1-1-1-00'30'" name="a7311cdb-77b3-4a48-8eff-62364f94213d" color="#FFBF00" flags="print" date="D:20150730153125+01'00'" page="0"
><contents-richtext
><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:8.0.0" xfa:spec="2.0.2" style="font-size:11.0pt;text-align:left;color:#FF0000;font-weight:normal;font-style:normal;font-family:Arial,sans-serif;font-stretch:normal"
><p
>THE_COMMENT_TO_EXPORT_IS_THIS_STRING</p
></body
></contents-richtext
></freetext
I extract that data with this portion of the XML map:
<COLUMN name='var1'>
<PATH syntax='XPath'>/xfdf/annots/freetext/contents-richtext/body/p</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>60</LENGTH>
</COLUMN>
Sometimes comments are stored in another way:
><freetext rect="331.041992,230.949005,553.198975,250.949005" creationdate="D:-001-1-1-1-1-1-00'30'" name="4f112387-dec6-42f1-ad8c-a1fecf9d8e04" color="#66CCFF" flags="print" date="D:20150730153213+01'00'" page="0"
><contents-richtext
><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:8.0.0" xfa:spec="2.0.2" style="font-size:11.0pt;text-align:left;color:#FF0000;font-weight:normal;font-style:normal;font-family:Arial,sans-serif;font-stretch:normal"
><p dir="ltr"
><span style="font-family:Arial"
>THE_COMMENT_TO_EXPORT_IS_THIS_STRING</span
></p
></body
></contents-richtext
></freetext
No problem here either; I can extract this comment with this portion of the XML map:
<COLUMN name='var2'>
<PATH syntax='XPath'>/xfdf/annots/freetext/contents-richtext/body/p/span</PATH>
<TYPE>character</TYPE>
<DATATYPE>string</DATATYPE>
<LENGTH>60</LENGTH>
</COLUMN>
But here comes the problem: sometimes the data is stored in this strange format, with two span tags inside the same paragraph:
><freetext rect="9.623672,760.177979,210.281006,783.448975" creationdate="D:00000000000000Z" name="4f037e18-9143-4ec1-a6ae-249fa2215528" width="2" color="#66CCFF" flags="print" date="D:20150731152640+01'00'" page="53"
><contents-richtext
><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:8.0.0" xfa:spec="2.0.2" style="font-size:14.0pt;text-align:left;color:#000000;font-weight:normal;font-style:normal;font-family:Arial,sans-serif;font-stretch:normal"
><p dir="ltr"
><span style="font-family:Arial"
>THIS_IS_THE_FIRST_PART </span
><span style="font-family:Arial"
>THIS_IS_THE_SECOND_PART</span
></p
></body
></contents-richtext
></freetext
The second map only picks up the second string (here: THIS_IS_THE_SECOND_PART). Can someone please help? How can I write a map that captures both pieces of text with SAS?
PS: I'm pretty sure that SAS XML Mapper can't solve this issue either; I found someone on the web with the same problem who was using a map created by that tool.
PS2: The path type is XPath 1.0. I gave string-join a try and got this error:
ERROR: invalid character in Xpath expression
ERROR: Xpath construct string-join(/xfdf/annots/freetext/contents-richtext/body/p/span, '')
for column var2 is an invalid, unrecognized, or unsupported form
EDIT: Added the HTML tag, since <p> and <span> are tags from that language.
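To make the target output concrete: what I'm after is simply the concatenation of every piece of text under each <p>, however many <span> tags it is split into. Below is a minimal sketch of that in Python. It runs outside SAS, so it only illustrates the expected result and is not a map-based solution. The sample document is a trimmed-down assumption (I wrapped the third annotation in a bare xfdf/annots root), and the names XFDF_SAMPLE, local_name and extract_comments are hypothetical helpers made up for this example. Elements are matched by local name because <body> and its children sit in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: a reduced version of the third annotation, wrapped in a
# minimal xfdf/annots root so that it parses on its own.
XFDF_SAMPLE = """<xfdf><annots>
<freetext name="4f037e18-9143-4ec1-a6ae-249fa2215528" page="53"
><contents-richtext
><body xmlns="http://www.w3.org/1999/xhtml"
><p dir="ltr"
><span style="font-family:Arial"
>THIS_IS_THE_FIRST_PART </span
><span style="font-family:Arial"
>THIS_IS_THE_SECOND_PART</span
></p
></body
></contents-richtext
></freetext
></annots></xfdf>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def extract_comments(xfdf_text):
    """Return one string per <p> found inside any <freetext> annotation.

    All text under the paragraph is joined, so zero, one, or several
    <span> children produce a single value.
    """
    root = ET.fromstring(xfdf_text)
    comments = []
    for annot in root.iter():
        if local_name(annot.tag) != "freetext":
            continue
        for node in annot.iter():
            if local_name(node.tag) == "p":
                comments.append("".join(node.itertext()))
    return comments


if __name__ == "__main__":
    for comment in extract_comments(XFDF_SAMPLE):
        print(comment)
```

The same function should also cover the first two layouts (text directly in <p>, or in a single <span>), since itertext() walks whatever is nested under the paragraph. A comment made of several <p> elements would come back as several strings here; whether those should be merged is a separate choice.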