
We are designing an LDAP schema (specifically for OpenDJ) and we primarily need to be able to search on the mail attribute. We don't need to do a substring search as the user would provide the whole email address when they log in.
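Because users supply the full address, the search only needs an equality filter such as (mail=lisa@example.com), with no wildcards. Below is a minimal Python sketch of building that filter. It assumes the address comes straight from user input and escapes the characters that RFC 4515 treats as special, so an input like a*b@example.com cannot turn the equality match into a substring match. The function names are hypothetical. python-ldap offers an equivalent helper, ldap.filter.escape_filter_chars.

```python
def escape_filter_value(value: str) -> str:
    """Escape an assertion value for an LDAP search filter (RFC 4515).

    '*', '(', ')', '\\' and NUL are replaced by a backslash and two hex digits.
    """
    out = []
    for ch in value:
        if ch in '*()\\\x00':
            out.append('\\%02x' % ord(ch))
        else:
            out.append(ch)
    return ''.join(out)


def mail_equality_filter(email: str) -> str:
    """Build an equality filter on mail (hypothetical helper).

    With no wildcards left, the server can answer it from the mail equality index.
    """
    return '(mail=%s)' % escape_filter_value(email)


print(mail_equality_filter('lisa@example.com'))  # (mail=lisa@example.com)
print(mail_equality_filter('a*b@example.com'))   # (mail=a\2ab@example.com)
```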

We already have an index on the mail attribute. However, we are also considering subdividing the user directory by the first letter of the email address (so all users whose email address starts with the letter A would be in an ou=A subdirectory under ou=users). The only benefit I can see is that when we search for a user by email, we can narrow the baseDN of the search, reducing its scope to roughly 1/26 of the entire directory.
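Under the proposed layout, the application would derive the narrower baseDN from the address before searching. The sketch below makes some assumptions. The suffix dc=example,dc=com is hypothetical. So is the catch-all ou=other container, which would hold addresses that do not start with an ASCII letter, such as 9lives@example.com. Entries would also have to be added under the same container using the same rule. The search filter itself is unchanged; only the base moves.

```python
# Hypothetical suffix for the users container; adjust to the real DIT.
USERS_DN = "ou=users,dc=example,dc=com"
LETTERS = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def letter_base_dn(email: str, users_dn: str = USERS_DN) -> str:
    """Return the per-letter container DN the proposed layout would use
    as the search base for this address."""
    first = email[:1].upper()
    if first not in LETTERS:
        # Digits, symbols, non-ASCII letters or an empty string fall back
        # to a hypothetical catch-all container.
        first = "other"
    return f"ou={first},{users_dn}"


print(letter_base_dn("alice@example.com"))   # ou=A,ou=users,dc=example,dc=com
print(letter_base_dn("9lives@example.com"))  # ou=other,ou=users,dc=example,dc=com
```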

My primary question is: does limiting the baseDN of an LDAP search like this improve performance if the attribute is already indexed? Do indexes take the baseDN into account, or do they cover the whole directory?

A secondary question, if I may: is there any other use for splitting the users directory by first letter (or any other arrangement) besides providing a more specific baseDN when searching?

Caleb


What you are thinking about seems like premature optimization, given that you don't yet know whether you have a performance issue. Also, indexing and query processing are not part of the LDAP standard; they are implementation details of the server you are using.

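In OpenDJ, an index is configured and maintained for a whole database backend, not per container. Looking up a value in the mail equality index and returning the single matching entry costs essentially the same whether the backend holds 1 entry or 1 billion entries. Narrowing the baseDN to a sub-container therefore does not make the index lookup any cheaper.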
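To make the point concrete, here is a deliberately simplified Python model of one backend; it is not OpenDJ's actual implementation. A single mail equality index covers every entry in the backend, so the lookup is one key access regardless of the base DN. The base DN only filters the (typically single) candidate afterwards. The class and method names are hypothetical. DN handling is plain string matching, and real servers may consult other internal indexes as well.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    """Toy model of a single backend with one mail equality index.

    The index spans every entry in the backend, whichever container
    (ou=A, ou=L, ...) the entry lives in.
    """
    entries: dict = field(default_factory=dict)     # entry id -> DN
    mail_index: dict = field(default_factory=dict)  # normalized mail -> set of entry ids
    next_id: int = 1

    def add(self, dn: str, mail: str) -> None:
        entry_id = self.next_id
        self.next_id += 1
        self.entries[entry_id] = dn
        # Lowercasing stands in for case-insensitive matching of mail values.
        self.mail_index.setdefault(mail.lower(), set()).add(entry_id)

    def search(self, base_dn: str, mail: str) -> list:
        # Step 1: a single key lookup; the base DN plays no part in it.
        candidates = self.mail_index.get(mail.lower(), set())
        # Step 2: keep candidates at or below the base DN (subtree scope),
        # using naive string matching instead of real DN normalization.
        return [
            self.entries[i]
            for i in sorted(candidates)
            if self.entries[i] == base_dn or self.entries[i].endswith("," + base_dn)
        ]


backend = ToyBackend()
backend.add("cn=lisa,ou=L,ou=users,dc=example,dc=com", "lisa@example.com")
backend.add("cn=adam,ou=A,ou=users,dc=example,dc=com", "adam@example.com")
# Same index lookup in both cases; only the post-filter on the base differs.
print(backend.search("ou=users,dc=example,dc=com", "lisa@example.com"))
print(backend.search("ou=L,ou=users,dc=example,dc=com", "Lisa@Example.com"))
```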

I have more than 20 years of experience with LDAP and directory services, and I have never seen a directory structured by splitting entries on the first letter of an attribute.

Ludovic Poitou
Thanks, Ludovic Poitou. I would think the directory structure needs to be settled at the beginning, because it would be difficult to change later; hence the "premature optimization". However, if it has no effect on the index, then there's no need to do it. Thanks. – Caleb Dec 07 '20 at 20:16

I once (and only once) encountered a problem similar to the one you're anticipating: so many records that searching for one creates an unacceptable user experience. In my case, there were over a million customers in the directory. What is now a rather old version of IBM's Tivoli Directory Server had several bugs that made searching the directory take minutes, with or without indexes. No one wants to wait minutes to log in and pay their bill! And we were constrained to using IBM's LDAP server.

In that case, I used the e-mail address as the naming attribute when the account was created, and never searched the directory. That is, my entry is cn=lisa@example.com,ou=customers,o=example. When I log in with lisa@example.com, the site programmatically builds the bind DN as "cn=" + userInput + ",ou=customers,o=example" and validates the supplied password with a bind, instead of searching for my account.
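Building the DN by concatenation has one caveat. The characters , + " \ < > ; are special in DN strings. The + in particular is common in real addresses (lisa+bills@example.com), and left unescaped it would be parsed as a multi-valued RDN. Below is a minimal Python sketch that escapes the value per RFC 4514 section 2.4 before building the bind DN. The function names are hypothetical; python-ldap provides ldap.dn.escape_dn_chars for the same purpose.

```python
def escape_dn_value(value: str) -> str:
    """Escape an attribute value for a DN string (RFC 4514, section 2.4)."""
    out = []
    for i, ch in enumerate(value):
        if ch in ',+"\\<>;':
            out.append('\\' + ch)   # special characters anywhere in the value
        elif ch == '\x00':
            out.append('\\00')      # NUL must be hex-escaped
        elif (i == 0 and ch in ' #') or (i == len(value) - 1 and ch == ' '):
            out.append('\\' + ch)   # leading space/'#', trailing space
        else:
            out.append(ch)
    return ''.join(out)


def customer_bind_dn(email: str) -> str:
    """Build the bind DN for the layout above (hypothetical helper)."""
    return "cn=" + escape_dn_value(email) + ",ou=customers,o=example"


print(customer_bind_dn("lisa@example.com"))        # cn=lisa@example.com,ou=customers,o=example
print(customer_bind_dn("lisa+bills@example.com"))  # cn=lisa\+bills@example.com,ou=customers,o=example
```

It is also worth rejecting an empty password before binding. Under RFC 4513, a simple bind with a DN and an empty password is an unauthenticated bind, which a server may treat as a successful anonymous bind rather than a failed login.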

LisaJ