We're in the process of standardizing on UTF-8 encoding for all source files, to make it easier for developers using a plethora of tools (notably including IntelliJ IDEA on Windows, Mac and Linux) to handle Git merge conflicts without introducing unwanted encoding changes.

Delphi 11 seems able to handle both UTF-8 and ANSI encoded PAS and DFM files well. It has a configuration setting (under Tools > Options > Editor) called "Default file encoding", which can be changed from its default of ANSI to UTF8 so that all newly created PAS files are saved with UTF-8 encoding. However, this setting does not seem to affect DFM files.

DFM files seem to always get saved as ANSI. This seems to apply also to DFM files that originally were in UTF-8 encoding: when I edit them in Delphi and re-save, they get changed to ANSI.

Is this a feature or a bug? If it is a feature, could you point to some authoritative documentation stating that?

Matthias B
  • I'm not sure how it matters. AFAIK Delphi does not use any character value outside the ASCII range when writing textual DFM files, which makes it valid UTF-8. Are you saying your version of Delphi is doing something different? – Dúthomhas Jan 02 '23 at 15:45
  • Ah, that is a key insight - DFM files do not use any character values outside ASCII range, so a file without a BOM is both valid UTF-8 and valid ANSI at the same time. What I noticed (and got worried about - in hindsight unnecessarily) was that a DFM file encoded as UTF-8 with BOM lost its BOM when resaved from Delphi. (It would still be nice if the BOM were kept by Delphi when it exists - not to cause artificial changes in the version control history.) – Matthias B Jan 02 '23 at 15:48
  • Yes, UTF-8 files don't technically need a BOM. Also, remember that users (used to) have the option to save DFM files in _binary_ format, which would be invalid UTF-8. You might need to distinguish between the two. – Dúthomhas Jan 02 '23 at 15:51
  • That is right. We've converted all binary DFMs to text DFMs, but that is indeed something to keep in mind. Thank you! – Matthias B Jan 02 '23 at 15:53
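
For anyone who wants to check a repository for the cases discussed in these comments, here is a minimal sketch in Python (not Delphi, purely as an external tool). The function name `classify_dfm` and the labels it returns are made up for illustration. It also assumes that a binary DFM starts with the bytes `TPF0`; older binary forms wrapped in a resource header would not be recognized by this check.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8   # the three bytes EF BB BF
BINARY_SIGNATURE = b"TPF0"   # assumption: marker at the start of a binary DFM

def classify_dfm(data: bytes) -> str:
    """Return a coarse label for the raw bytes of a DFM file (hypothetical helper)."""
    if data.startswith(BINARY_SIGNATURE):
        return "binary"
    if data.startswith(UTF8_BOM):
        return "utf-8-bom"
    if data.isascii():
        # Pure ASCII is valid UTF-8 and valid ANSI at the same time.
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ansi-or-other"
    return "utf-8-no-bom"
```

Reading each file with `open(path, "rb").read()` and passing the bytes to `classify_dfm` should be enough to list the files whose BOM would change on the next save from the IDE.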

1 Answer

DFM files use their own proprietary escape notation (# followed by the decimal number of the Unicode code point) to store non-ASCII characters in string values.
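
To illustrate the escape notation, a value such as `'Gr'#252#223'e'` stands for "Grüße". The following Python sketch decodes a single-line value of that shape. The helper name `decode_dfm_string` is hypothetical, and the sketch assumes decimal codes, quoted runs with `''` as an escaped quote, and no line continuation with `+`. It makes no attempt to combine surrogate pairs, should Delphi emit them.

```python
import re

# A token is either a quoted run ('...' with '' as an escaped quote)
# or a '#' followed by a decimal character code.
_TOKEN = re.compile(r"'((?:[^']|'')*)'|#(\d+)")

def decode_dfm_string(literal: str) -> str:
    """Decode one single-line DFM string value (illustrative sketch only)."""
    parts = []
    for quoted, code in _TOKEN.findall(literal):
        if code:
            # '#NNN' escape: NNN is taken as a decimal character code.
            parts.append(chr(int(code)))
        else:
            # Quoted run: undo the doubled-quote escaping.
            parts.append(quoted.replace("''", "'"))
    return "".join(parts)
```

Because the escaped form uses only ASCII characters, a DFM whose only non-ASCII content is in string values stays pure ASCII on disk.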

However, in newer versions of Delphi, DFM files in text form may be automatically stored using UTF-8 if identifiers (class, property or component names) contain non-ASCII characters.

From the documentation for Delphi 11 Alexandria:

Component streaming (Text DFM files):

  • Are fully backward-compatible.
  • Stream as UTF-8 only if component type, property, or name contains non-ASCII-7 characters.
  • String property values are still streamed in “#” escaped format.
  • May allow values as UTF-8 as well (open issue).
  • Only change in binary format is potential for UTF-8 data for component name, properties, and type name.
NineBerry
  • Most interesting (and confirmed by testing now) that it streams as UTF-8 (with BOM) if a component name contains non-ASCII-7 characters, but refrains from that even if string property values contain non-ASCII-7 characters (then it uses its own proprietary encoding, as you say). This answer clarified it all for me. Thank you! – Matthias B Jan 02 '23 at 16:21