1

If I do a query like below where I'm searching for the same ID but on two different columns. Should I have an index like this? Or should I create 2 separate indexes, one for each column?

modelBuilder.Entity<Transfer>()
  .HasIndex(p => new { p.SenderId, p.ReceiverId });

Query:

var transfersCount = await _dbContext.Transfers
    .Where(p => p.ReceiverId == user.Id || p.SenderId == user.Id)
.CountAsync();

What if I have a query like this below, would I need a multicolumn index on all 4 columns?

var transfersCount = await _dbContext.Transfers
.Where(p => (p.SenderId == user.Id || p.ReceiverId == user.Id) &&
      (!transferParams.Status.HasValue || p.TransferStatus == (TransferStatus)transferParams.Status) &&
      (!transferParams.Type.HasValue || p.TransferType == (TransferType)transferParams.Type))
.CountAsync();
chuckd
  • 13,460
  • 29
  • 152
  • 331
  • Can you provide any examples to help me out? – chuckd Jan 26 '22 at 05:01
  • I messed that all up.. anyway, it depends on the queries. Can there ever be a query for Sender ID alone as well? If so, that should be in a primary axis location (as well); so one index with Sender ID primary, one with Receiver ID primary. – user2864740 Jan 26 '22 at 05:04
  • 1
    Will the Receiver ID ever be queried alone? I don't think so, so far it will query on both ids. – chuckd Jan 26 '22 at 05:04
  • If I take a look at the columns in Sql Exporer, I can see there's already an index on SenderId. Is that good enough? – chuckd Jan 26 '22 at 05:05
  • Usually yes. Look at the generated Query Plan. If the index is not *covering* then there might be a required Key Lookup (which can alter plan selection; however, index usage will be preferred as the estimated number of row decreases in proportion to all row). – user2864740 Jan 26 '22 at 05:06
  • Do you knw what tools I can use to look at the QP, if I'm on a Mac? Currently using Azure Data Studio to look into my DB. – chuckd Jan 26 '22 at 05:08
  • That I do not know. I use SSMS :-/ – user2864740 Jan 26 '22 at 05:08

2 Answers2

2

I recommend two single-column indices.

The two single-column indices will perform better in this query because both columns would be in a fully ordered index. By contrast, in a multi-column index, only the first column is fully ordered in the index.

If you were using an AND condition for the sender and receiver, then you would benefit from a multi-column index. The multi-column index is ideal for situations where multiple columns have conditional statements that must all be evaluated to build the result set (e.g., WHERE receiver = 1 AND sender = 2). In an OR condition, a multi-column index would be leveraged as though it were a single-column index only for the first column; the second column would be unindexed.

John Glenn
  • 1,469
  • 8
  • 13
  • Hi John. So if I had two more columns (2 enums) that I was also querying on where I have the where clause separating them by &&, then I should create a multi column index for all 4 values (2 ids and 2 enums)? I'll post an example in my original post. – chuckd Jan 26 '22 at 05:59
  • 1
    @user1186050 - For a few enums getting stored as integers, it may not be worth creating an index at all. Indices tend to be really useful when the data in a column is fairly unique between rows and spans a wide range of possible values (like the sender and receiver). Remember that each index that you create is basically a penalty for performing an insert / update operation, so you should be judicious in creating them. The only time I would recommend a multi-column index is when you have two columns with a wide range of data that will be *frequently* queried with an AND condition. – John Glenn Jan 26 '22 at 06:04
  • so you don't recommend any index for the second query I posted? Also can I easily add an index after I go live without messing up the data in the DB? – chuckd Jan 26 '22 at 06:19
  • @user1186050 Yeah! You can add or remove indices without affecting the data being stored. I regularly make tweaks to the indexing strategy as the data changes or queries change. – John Glenn Jan 26 '22 at 06:25
  • Good to know thanks! – chuckd Jan 26 '22 at 06:28
  • 1
    @user1186050 For what you have posted, my opinion is that no multi-column indices are required. If you notice that query runs very slowly, then maybe revisit it. However, I don't think you'd see any real benefit to adding one. – John Glenn Jan 26 '22 at 06:28
1

The full intricacies of index design would take well more than an SO answer to explain; there are probably books about it, and it will feature as a reasonable proportion of a database administrator's job

Indexes have a cost to maintain so you generally strive to have the fewest possible that offer you the most flexibility with what you want to do. Generally an index will have some columns that define its key and a reference to rows in the table that have those keys. When using an index the database engine can quickly look up the key, and discover which rows it needs to read from. It then looks up those rows as a secondary operation. Indexes can also store table data that aren't part of the lookup key, so you might find yourself creating indexes that also track other columns from the row so that by the time the database has found the key it's looking for in the index it also has access to the row data the query wants and doesn't then need to launch a second lookup operation to find the row. If a query wants too many rows from a table, the database might decide to skip using the index at all; there's some threshold beyond which it's faster to just read all the rows direct from the table and search them rather than suffer the indirection of using the index to find which rows need to be read

The columns that an index indexes can serve more than one query; order is important. If you always query a person by name and also sometimes query by age, but you never query by age alone, it would be better to index (name,age) than (age,name). An index on (name,age) can serve a query for just WHERE name = ..., and also WHERR name = ... and age = .... If you use an OR keyword in a where clause you can consider that as a separate query entirely that would need its own index. Indeed the database might decide to run "name or age" as two parallel queries and combine the results to remove duplicates. If your app needs later change so that instead of just querying a mix of (name), (name and age) it is now frequently querying (name), (name and age), (name or age), (age), (age and height) then it might make sense to have two indexes: (name, age) plus (age, height). The database can use part or all of both of these to server the common queries. Remember that using part of an index only works from left to right. An index on (name, age) wouldn't typically serve a query for age alone.

If you're using SQLServer and SSMS you might find that showing the query plan also reveals a missing index recommendation and it's worth considering carefully whether an index needs to be added. Apps deployed to Microsoft azure also automatically look at common queries where performance suffers because of a lack of an index and it can be the impetus to take a look at the query being run and seeing how existing or new indexes might be extended or rearranged to cover it; as first noted it's not really something a single SO answer of a few lines can prep you for with a "always do this and it will be fine" - companies operating at large scale hire people whose sole mission is to make sure the database runs well they usually grumble a lot about the devs and more so about things like entity framework because an EF LINQ query is a layer disconnected from the actual SQL being run and may not be the most optimal approach to getting the data. All these things you have to contend with.

In this particular case it seems like indexes on SenderId+TransferStatus+TransferType and another on ReceiverId+TransferStatus+TransferType could help the two queries shown, but I wouldn't go as far as to say "definitely do that" without taking a holistic view of everything this table contains, how many different values there are in those columns and what it's used for in the context of the app. If Sender/Receiver are unique, there may be no point in adding more columns to the index as keys. If TransferStatus and Type change such that some combination of them helps uniquely identify some particular row out of hundreds then it may make sense, but then if this query only runs once a day compared to another that is used 10 times a second... There's too much variable and unknown to provide a concrete answer to the question as presented; don't optmize prematurely - indexing columns just because they're used in some where clause somewhere would be premature

Caius Jard
  • 72,509
  • 5
  • 49
  • 80