I am trying to anonymize an XML export of Confluence. I found their export cleaner JAR:
https://confluence.atlassian.com/doc/content-anonymizer-for-data-backups-134795.html
I have modified clean.stx to anonymize all users like this:
<stx:template match="object[@class='ConfluenceUserImpl']/property[@name='name']/text()
                   | object[@class='ConfluenceUserImpl']/property[@name='lowerName']/text()
                   | object[@class='ConfluenceUserImpl']/id[@name='key']/text()
                   | property[@class='ConfluenceUserImpl']/id[@name='key']/text()">
  <stx:value-of select="translate(., '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')"/>
</stx:template>
I also need to modify the CDATA content, using a regex or similar, in order to remove user mentions from the body of a Confluence page.
For example, the CDATA looks like this:
<property name="body">
<![CDATA[
<p>
<ac:link>
<ri:user ri:userkey="8a8300716489cc7d016489ce009a0000" />
</ac:link>
</p>
]]>
</property>
Here I only need to replace the value of ri:userkey with xxx or similar.
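To make the intended result concrete, here is a minimal Python sketch of the replacement, done outside of STX on the body text alone. The function name mask_userkeys is hypothetical, and I am assuming the key always appears as a double-quoted ri:userkey attribute. What I am missing is how to get the equivalent behaviour from clean.stx.

```python
import re

# Assumption: the user key is always written as ri:userkey="..." with double quotes.
# Group 1 = attribute name plus opening quote, group 2 = the key, group 3 = closing quote.
USERKEY_RE = re.compile(r'(ri:userkey=")([^"]*)(")')


def mask_userkeys(body: str) -> str:
    """Replace every ri:userkey value with 'x' characters of the same length."""
    return USERKEY_RE.sub(
        lambda m: m.group(1) + "x" * len(m.group(2)) + m.group(3),
        body,
    )


if __name__ == "__main__":
    sample = '<p><ac:link><ri:user ri:userkey="8a8300716489cc7d016489ce009a0000" /></ac:link></p>'
    print(mask_userkeys(sample))
```

Everything except the attribute value should stay untouched, so the surrounding markup in the page body remains valid.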
How can I do this?