0

My immediate need is to do an accent-insensitive comparison in MS Access. I am using Office 365 Access.

This is not strictly speaking a Unicode question as the European accented characters are present in all of Windows-1252 (sometimes misleadingly called "ANSI" in Microsoft products and documentation), "modern" Unicode and UCS-2.

The Access "Data Types" page I found mentioned "two bytes per character", which makes it sound like UCS-2, but with no details. Similarly, the "sort order" drop-downs list a number of values that are also undocumented.

Actual example: compare "Dvorak" to "Dvořák". These are not equal in MS Access.

It is NOT my goal today to find a work-around (I can do that myself) - it is to better understand MS Access capabilities in 2023.

Having gone through the incremental support improvements for SQL Server and .NET strings, my first thought was "surely MS Access can handle collations by now (2023)".

My bottom line questions are: "exactly" what encodings ("sort orders") is Office 365 Access supporting in its most recent releases, and is VBA using the same character set, or will working with accented characters in VBA experience translations or issues when being used within MS Access?

  • Since you've already checked the docs, and are not looking for workarounds, what exactly do you expect us to do? – Erik A Feb 13 '23 at 18:16
  • See my final paragraph - with knowledge about how MS Access works I can make better decisions about how to proceed. Perhaps one of the existing sort orders would help, and if I need to write a VBA function, knowing which character set is supported by Access VBA would guide my efforts. Failing that, I'm faced with basically a trial-and-error process. – Wayne Erfling Feb 15 '23 at 05:32

1 Answers1

1

You're not giving me a whole lot to go on, so I'll just go over the basics. It's important to note new features rarely make it to VBA and Access, and breaking changes are extremely rare, in contrast to new versions SQL server or C#.

Regarding charsets and encodings (how strings are stored):

Strings in tables, queries and application objects are stored in UTF-16. They may be compressed (unicode compression option for text fields). This is independent of sort orders.

The VBA code itself is stored in the local charset (which may not support certain characters). It's generally recommended to avoid non-ASCII characters in VBA code, as this may cause issues on different computers and different charsets. See this post for some trickery if you need non-ASCII characters in VBA literals.

VBA strings are always a BSTR which uses UTF-16 characters.

Regarding sort orders/collations (how strings are compared):

Access has no full support for collations, and no specific case sensitive/case insensitive and accent sensitive/accent insensitive collations.

It does support different sort orders, which determines how strings should be sorted and which characters are equal. An outdated list can be found here. Using the object browser in Access, you can navigate to LanguageConstants and check the list. In recent builds of Office 365, there are some new options that appear to use codepage 65001 (= UTF-8) but I haven't seen docs or experimented with it.

In VBA, string comparisons and sorts are determined by an Option Compare statement at the top of the module. Nearly all VBA applications only support two: Option Compare Binary, any inequality is an unequal string and sorts are case sensitive, and Option Compare Text, use the local language settings to compare strings. For Access, there's a third, Option Compare Database, use the database sort order to compare strings.

Note that not all functions support all unicode characters. Functions with limited support include MsgBox and Debug.Print. This can make it especially hard to debug code when working with characters not in the system code page.

Further notes

VBA does allow (relatively) easy access to the Windows API. Instead of rolling your own string comparison function, you could use CompareStringEx which has options to do case-insensitive diacritic-insensitive comparisons.

Note that for external functions, you need to pass string pointers using StrPtr, passing strings as a string will automatically convert them from a BSTR to a pointer to a null-terminated string in the system codepage. See this answer for a basic example how to call a WinAPI function for a unicode string. You will also have to look up and declare all the constants, e.g. Public Const NORM_IGNORECASE As Long = &H1, etc.

Erik A
  • 31,639
  • 12
  • 42
  • 67