
I am making an app where users can follow each other. To decide how to model this in Firestore, I would like to know how collection size affects query performance. My first idea was to structure it like this:

relationships(coll.)
----{userId_1}(document)
--------following(coll)
------------{someId1}(document)
------------{someId2}(document)
.....
--------followers(coll)
------------{someId5}(document)
------------{someId7}(document)
.....
----{userId_2}(document)
--------following(coll)
------------{someId11}(document)
------------{someId24}(document)
.....
--------followers(coll)
------------{someId56}(document)
------------{someId72}(document)
.....

So I would have a main collection, relationships, in which each document represents one user. Each user document would have two subcollections, following and followers, holding documents with data like id, name, email, .. When user1 wants to see his followers, I would get all documents under relationships/userId_1/followers, and to see who he follows I would get the documents under relationships/userId_1/following.
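
A minimal sketch of the document paths this layout implies. It assumes each document ID inside following/followers is the other user's ID, which the layout above does not state, and the helper names are hypothetical. It only builds path strings and does not call Firestore.

```typescript
// Hypothetical helpers; only the collection names come from the layout above.
type Direction = "followers" | "following";

// Path of the subcollection listing one side of a user's relationships.
function relationshipPath(userId: string, direction: Direction): string {
  return `relationships/${userId}/${direction}`;
}

// Assumption: a single follow is recorded twice, once under the follower's
// "following" subcollection and once under the followee's "followers".
function followWrites(followerId: string, followeeId: string): string[] {
  return [
    `${relationshipPath(followerId, "following")}/${followeeId}`,
    `${relationshipPath(followeeId, "followers")}/${followerId}`,
  ];
}
```

Under that assumption, one follow action means two document writes that have to be kept in sync.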

I also thought about doing it like this:

relationships(coll)
----{user5id_user4id}(document)
--------user1:"user5id" (field)
--------user2:"user4id" (field)
.........(other fields)
----{user4id_user5id}(document)
--------user1:"user4id" (field)
--------user2:"user5id" (field)
.........(other fields)

I would have one main collection, relationships, where each document represents one follow relationship. The document name would be firstUserId_secondUserId (meaning firstUserId follows secondUserId), and it would have two fields, user1 and user2, storing the IDs of the two users, where user1 follows user2. So if I am {myUserId} and want to get all the people I follow, I would query the relationships collection where user1 = myUserId. To get all the people who follow me, I would query the relationships collection where user2 = myUserId, since each document represents the relation user1 follows user2.
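
A minimal sketch of this second model, using a plain array as a stand-in for the relationships collection. The function names are hypothetical; only the user1/user2 fields and the ID format come from the description above. The in-memory filter scans every element, which is not how an indexed Firestore query would work; it only shows which documents each query selects.

```typescript
// One document per follow: user1 follows user2.
interface Relationship {
  user1: string;
  user2: string;
}

// Document name in the form firstUserId_secondUserId.
function relationshipId(followerId: string, followeeId: string): string {
  return `${followerId}_${followeeId}`;
}

// Equivalent of "where user1 = myUserId": everyone I follow.
function followingOf(relationships: Relationship[], myUserId: string): string[] {
  return relationships.filter((r) => r.user1 === myUserId).map((r) => r.user2);
}

// Equivalent of "where user2 = myUserId": everyone who follows me.
function followersOf(relationships: Relationship[], myUserId: string): string[] {
  return relationships.filter((r) => r.user2 === myUserId).map((r) => r.user1);
}
```

Here a single document records the follow in both directions, so each follow is one write.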

So my question is which approach would be more efficient for querying the data. In the first case, each user would have his own followers and following collections and I would just get the documents. In the second case, the relationships collection would hold many documents, each representing a user1->follows->user2 relation. I know that I would be billed by the number of documents the query returns, but how fast would the query be if it had to search through a large collection?

Alen

1 Answer


Collection size has no bearing on the performance or cost of a query. Both are determined entirely by the size of the result set (the number of documents returned). So, a query for 10 documents out of 100 performs and costs the same as a query for 10 documents out of 100,000. The result size of 10 is the only thing that matters here.

See also: Queries scale with the size of your result set, not the size of your data set
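
To illustrate why result size can dominate, here is a toy model of an index-backed equality query. This is a sketch of the general idea only, not Firestore's actual implementation: it assumes the index is a list of entries sorted by field value, and all names are hypothetical. The query seeks to the first match with a binary search, then walks only the matching entries.

```typescript
// Toy index on one field: entries are kept sorted by value.
interface IndexEntry {
  value: string;
  docId: string;
}

// Binary search for the position of the first entry with value >= target.
function lowerBound(index: IndexEntry[], target: string): number {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (index[mid].value < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Returns the matching document IDs and how many index entries were walked
// after the seek. Non-matching entries are never visited one by one.
function queryEquals(
  index: IndexEntry[],
  target: string
): { docIds: string[]; entriesRead: number } {
  const docIds: string[] = [];
  let i = lowerBound(index, target);
  while (i < index.length && index[i].value === target) {
    docIds.push(index[i].docId);
    i++;
  }
  return { docIds, entriesRead: docIds.length };
}
```

In this model, a query matching 10 entries walks 10 entries whether the index holds 100 or 100,000; only the seek step depends on the total size, and it grows logarithmically.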

Doug Stevenson
  • I don't think this is true anymore. We've been dealing with a collection with nearly 1 million documents and have been trying to iterate over the entire collection with much difficulty. Our first approach was to use offset and limit which has pretty terrible paging performance. Using a page size of 1000 documents you go from ~500ms per page to 4s per page in the first 10 pages. Switching to startAfter using a document reference the first 10 pages all come in at 500ms, but over time that slowly increases. By page 148 the average response time is near 2s. – Rojuinex Jun 09 '22 at 13:40
  • @Rojuinex offset offers no performance benefits. In fact, all it does behind the scenes is silently read that many documents from the index before actually getting to the ones you want. So you are reading offset number of documents from the index plus the actual number of documents that you want returned. Read this: https://firebeast.dev/tips/do-not-use-offset – Doug Stevenson Jun 09 '22 at 13:57
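
As a rough worked example of the point about offset, the sketch below counts documents read when paging with offset versus paging with a cursor such as startAfter. It is arithmetic only and does not call Firestore; the function names are hypothetical, and it assumes every page is full and that an offset page reads its skipped entries plus the returned ones, as described in the comment above.

```typescript
// Offset paging: page p skips p * pageSize entries, which are still read,
// then returns pageSize documents.
function readsWithOffset(pages: number, pageSize: number): number {
  let total = 0;
  for (let page = 0; page < pages; page++) {
    const offset = page * pageSize;
    total += offset + pageSize;
  }
  return total;
}

// Cursor paging: each page starts right after the previous page's last
// document, so only the returned documents are read.
function readsWithCursor(pages: number, pageSize: number): number {
  return pages * pageSize;
}
```

For the first 10 pages at a page size of 1000, this gives 55,000 reads with offset versus 10,000 with a cursor, and the gap widens with every further page.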