I have a list of names, some of which contain characters with diacritics, like Á and Ê.

For example:

Átila
André
Êlisa
Mercês
Sá

But when I run a simple query like this:

MATCH (p:Person)
RETURN p.Name
ORDER BY p.Name

It returns the names out of alphabetical order, because of the diacritics:

André
Mercês
Sá
Átila
Êlisa

I would like the names to be returned in alphabetical order, regardless of diacritics (pt-BR, Brazilian Portuguese).

I can do this in Microsoft SQL Server:

SELECT Name
FROM Person
ORDER BY Name COLLATE SQL_Latin1_General_CP1_CS_AS

How can I do that in Cypher?

Tony
    I think you'll have to work along these lines: https://stackoverflow.com/questions/3322152/is-there-a-way-to-get-rid-of-accents-and-convert-a-whole-string-to-regular-lette – Graphileon Sep 10 '22 at 12:35

2 Answers

Cypher does not currently support specifying a collation or character set, but you can try this for your use case:

MATCH (p:Person)
WITH p, apoc.text.clean(p.Name) AS cleanedName
RETURN p.Name
ORDER BY cleanedName

APOC is an external library with some very useful functions. One of them, apoc.text.clean, keeps only the alphanumeric characters in a string and converts them all to lowercase. This brings two limitations: non-alphanumeric characters play no role in the sort, and the sort is case-insensitive. If either of those matters for your use case, this is not an exact solution. In that case you can write your own custom procedure and call it from within Cypher, as described here.

Please install the APOC library first, if not already installed.

Charchit Kapoor

DISCLAIMER: I'm the co-founder and CTO of Memgraph.

I would say this falls under proper Unicode support. The openCypher grammar can parse a wide range of characters, but how those characters are stored and later interpreted is an implementation detail. I'm not aware of a clause like COLLATE in openCypher.

When it comes to Memgraph, it just stores and interprets raw bytes (for now), which results in the wrong sorting order. Options when using Memgraph are:

  • implementing a user-defined function in C/C++/Python/Rust which will allow you to sort the items correctly
  • sorting on the client/application side, which may be a quick fix for you (I'm aware of the problems that can cause, such as a large volume of transferred data and slow queries).
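
As a rough illustration of the client-side option, here is a minimal Python sketch. It assumes the names have already been fetched from the database into a plain list; the helper name accent_insensitive_key is hypothetical, and stripping combining marks only approximates a real pt-BR collation. It also sorts by raw UTF-8 bytes to show where the order in the question comes from.

```python
import unicodedata


def accent_insensitive_key(name: str) -> tuple[str, str]:
    """Hypothetical sort key: compare names with diacritics removed."""
    # NFD splits "Á" into "A" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold for case-insensitive ordering; the original name breaks ties.
    return (stripped.casefold(), name)


names = ["Átila", "André", "Êlisa", "Mercês", "Sá"]

# Ordering by raw UTF-8 bytes reproduces the order shown in the question.
byte_order = sorted(names, key=lambda s: s.encode("utf-8"))

# Ordering by the accent-stripped key gives the expected alphabetical order.
sorted_names = sorted(names, key=accent_insensitive_key)

print(byte_order)    # ['André', 'Mercês', 'Sá', 'Átila', 'Êlisa']
print(sorted_names)  # ['André', 'Átila', 'Êlisa', 'Mercês', 'Sá']
```

The same key function could serve as the body of a Python user-defined function, but how to register it depends on the database, so that part is left out here.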

There is a related GitHub issue. Please contribute with more details or follow the discussion. At some point, we'll add native capability :) Also, we are looking for contributors!

buda