I am using the below code to return a string from all TEXT items within a .dxf

    for i in m_space.query('TEXT'):
        return(str(i.dxf.text))

This is working well, so I would like to do the same for all MTEXT items. From reading the docs, I have put together the below:

    for i in m_space.query('MTEXT'):
        return(str(i.text))

But the output seems to include some additional data. I could use a regex to extract the text I need, but I would like to know if there is a better way built into ezdxf.

>>>   '{\\Fsimplex|c0;TEXT THAT I WANT}'
ajcnzd

1 Answer

The additional information that you are seeing within the MText content is MText formatting codes.

When formatting overrides are applied through the MText editor (as opposed to being applied to the Text Style referenced by the MText object), the formatting is encoded using formatting codes embedded within the text content. Such formatting codes are not visible in AutoCAD, but are used to appropriately render the various sections of the text content enclosed by the code - in your case, the formatting code:

    {\\Fsimplex|c0;TEXT THAT I WANT}

Results in the string TEXT THAT I WANT being displayed using the simplex font.

As far as I'm aware, ezdxf does not include methods which will allow you to obtain the text content with all formatting codes removed, but upon obtaining the content using the text property, you can then use Regular Expressions to remove such codes.
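As a rough illustration of the regex route in Python, here is a minimal sketch using only the standard `re` module. The function name `strip_mtext_codes` and the placeholder characters are hypothetical choices, and the patterns only cover common cases (codes with a `;`-terminated argument, underline/overline/strike-through toggles, `\P` paragraph breaks, and grouping braces); stacked fractions (`\S`) and other less common codes are not handled. Note that the `'{\\Fsimplex|c0;...}'` shown in the question is a Python repr, so the actual string contains a single backslash.

```python
import re

# Placeholders for escaped literals; assumed not to occur in real MTEXT content.
_BACKSLASH, _LBRACE, _RBRACE = "\x1a", "\x1b", "\x1c"


def strip_mtext_codes(raw: str) -> str:
    """Remove common MTEXT inline formatting codes (simplified sketch)."""
    # Protect escaped literals (\\, \{, \}) so they survive the stripping steps.
    text = raw.replace("\\\\", _BACKSLASH)
    text = text.replace("\\{", _LBRACE).replace("\\}", _RBRACE)
    # Paragraph breaks (\P), newlines and tabs become single spaces.
    text = re.sub(r"\\P|\n|\t", " ", text)
    # Codes carrying an argument terminated by ';', e.g. \Fsimplex|c0; or \C1;
    text = re.sub(r"\\[ACcFfHQTWp][^\\;]*;", "", text)
    # Toggle codes without arguments: underline, overline, strike-through.
    text = re.sub(r"\\[KkLlOo]", "", text)
    # Drop the grouping braces that scoped the overrides.
    text = text.replace("{", "").replace("}", "")
    # Restore the protected literals.
    return (
        text.replace(_BACKSLASH, "\\")
        .replace(_LBRACE, "{")
        .replace(_RBRACE, "}")
    )


print(strip_mtext_codes(r"{\Fsimplex|c0;TEXT THAT I WANT}"))
# prints: TEXT THAT I WANT
```

Escaped backslashes and braces are swapped for placeholders first so that a literal `\\` in the drawing text is not mistaken for the start of a formatting code.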

To offer an existing example, I've previously developed the following AutoLISP function which uses Regular Expressions to remove all formatting codes, but there are likely other ways to phrase the RegEx patterns and obtain the same result:

    ;; Quick Unformat  -  Lee Mac
    ;; Returns a string with all MText formatting codes removed.
    ;; rgx - [vla] Regular Expressions (RegExp) Object
    ;; str - [str] String to process

    (defun LM:quickunformat ( rgx str )
        (if
            (null
                (vl-catch-all-error-p
                    (setq str
                        (vl-catch-all-apply
                           '(lambda nil
                                (vlax-put-property rgx 'global     actrue)
                                (vlax-put-property rgx 'multiline  actrue)
                                (vlax-put-property rgx 'ignorecase acfalse)
                                (foreach pair
                                   '(
                                        ("\032"     . "\\\\\\\\")
                                        (" "        . "\\\\P|\\n|\\t")
                                        ("$1"       . "\\\\(\\\\[ACcFfHKkLlOopQTW])|\\\\[ACcFfHKkLlOopQTW][^\\\\;]*;|\\\\[ACcFfKkHLlOopQTW]")
                                        ("$1$2/$3"  . "([^\\\\])\\\\S([^;]*)[/#\\^]([^;]*);")
                                        ("$1$2"     . "\\\\(\\\\S)|[\\\\](})|}")
                                        ("$1"       . "[\\\\]({)|{")
                                        ("\\$1$2$3" . "(\\\\[ACcFfHKkLlOoPpQSTW])|({)|(})")
                                        ("\\\\"     . "\032")
                                    )
                                    (vlax-put-property rgx 'pattern (cdr pair))
                                    (setq str (vlax-invoke rgx 'replace str (car pair)))
                                )
                            )
                        )
                    )
                )
            )
            str
        )
    )

For your sample text string, the above would return:

    _$ (setq rgx (vlax-create-object "vbscript.regexp"))
    #<VLA-OBJECT IRegExp2 00000000315de460>
    _$ (LM:quickunformat rgx "{\\Fsimplex|c0;TEXT THAT I WANT}")
    "TEXT THAT I WANT"
    _$ (vlax-release-object rgx)
    0

Lee Mac

  • Thank you, I will go the regex route. Do you know what the "c0;" is for? I can't see anything in the Autodesk formatting page you linked that explains this. – ajcnzd Feb 27 '20 at 19:07
  • You're welcome. I think the `c0` was initially intended to encode a colour change as part of the font change, as you can also have a code such as `{\\fVerdana|b1|i1|c0|p34;abc}` which encodes font, bold, italic and paragraph formatting, but Autodesk seem to have decided to use the separate `\\C` code for colour instead. – Lee Mac Feb 27 '20 at 23:27
  • Look here: https://adndevblog.typepad.com/autocad/2017/09/dissecting-mtext-format-codes.html – Andrew Truckle Feb 28 '20 at 08:05
  • Ah! It's the codepage - many thanks for plugging this gap in my knowledge @AndrewTruckle. – Lee Mac Feb 28 '20 at 13:22
  • Added `plain_text()` method from dxfgrabber to ezdxf to return the MTEXT content without formatting codes, available in next release 0.11.1. – mozman Feb 28 '20 at 18:00
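
Following mozman's comment, a minimal sketch of how `plain_text()` might be used in place of the regex route, assuming ezdxf 0.11.1 or later. The helper name `collect_mtext_plain` is hypothetical, and `msp` is assumed to be a modelspace layout such as the `m_space` in the question. The helper collects the results in a list, since a `return` inside the `for` loop would stop at the first entity.

```python
def collect_mtext_plain(msp):
    """Return the plain text of every MTEXT entity in a layout.

    Assumes each MTEXT entity provides plain_text(), which the comment
    above says is available from ezdxf 0.11.1.
    """
    return [entity.plain_text() for entity in msp.query("MTEXT")]


# Usage (assumes ezdxf is installed; "drawing.dxf" is a hypothetical file):
# import ezdxf
# doc = ezdxf.readfile("drawing.dxf")
# for text in collect_mtext_plain(doc.modelspace()):
#     print(text)
```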