
If a node has 100 million children, will there be a performance impact if I:

a) Query, but limit to 10 results

b) Watch one of the children only

I could split the data up into multiple parents, but in my case I will have a reference to the child so can directly look it up (which reduces the complexity). If there is an impact, what is the maximum number for each scenario before performance is degraded?

CalM
  • 2
    You should read this https://firebase.google.com/docs/database/web/structure-data – Fowotade Babajide Sep 26 '16 at 21:45
  • 1
    And also https://firebase.google.com/docs/database/web/retrieve-data – Kato Sep 26 '16 at 23:57
  • 1
    I had a similar problem (presenting a large list in a RecyclerView), see here: http://stackoverflow.com/questions/36401332/firebase-android-offline-performance. You can find my solution here: http://stackoverflow.com/a/37772597/6155664 – Niels Sep 27 '16 at 13:29

1 Answer


If a node has that many children, accessing the node in any way is a recipe for problems. Accessing an individual child is never a problem.

Querying the node for a subset of its children still requires that the database consider every one of those children. If you request the last 10 out of 100 million items, you're asking the database to consider 99,999,990 items that you're apparently not interested in.
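To make the two access patterns concrete, here is a minimal sketch using the JavaScript SDK's `child()`, `once()`, `orderByKey()` and `limitToLast()` calls. It assumes `itemsRef` is a database reference to the large node (for example `firebase.database().ref('items')`); the function names are made up for illustration.

```javascript
// Reads exactly one child by its key. This is the direct lookup that
// the answer describes as never being a problem.
function readOneChild(itemsRef, childId) {
  return itemsRef.child(childId).once('value');
}

// Asks for the last `count` children, ordered by key. This is a query
// on the large node itself, which is the case the answer warns about.
function readLastChildren(itemsRef, count) {
  return itemsRef.orderByKey().limitToLast(count).once('value');
}
```

Both functions return the promise from `once('value')`, so the caller can use `.then(snapshot => ...)` in either case.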

It is impossible to say what the maximum is without a much more concrete description of the data size, ordering criteria, etc. But to be honest, even then the best you're likely to get is a value with a huge variance that is likely to change over time.

Your best approach in Firebase (and most NoSQL solutions) is to model the data in a way that fits how your app uses that data. So for example: if you need to show the latest 10 items to your users, store the (keys of) those latest 10 items in a separate list.

items
    -K........0
        title: "Firebase Performance: How many children per node?"
        body: "If a node has 100 million children, will there be a performance impact if I:..."
    -K........1
        title: "Firebase 3x method won't working in real device but worked in simulator swift 3.0"
        body: "Hi we are working with google firebase 3x version and we faced..."
    .
    .
    .
    -K999999998
    -K999999999
recent
    -K999999990: true
    -K999999991: true
    -K999999992: true
    -K999999993: true
    -K999999994: true
    -K999999995: true
    -K999999996: true
    -K999999997: true
    -K999999998: true
    -K999999999: true

I'm not sure if I got the right number of nines in there, but I hope you get the idea.
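One way to keep `recent` in step with `items` is a multi-location update, where a single `update()` call on the root reference writes several paths at once. The sketch below only builds the update object; the helper name `buildAddItemUpdate`, the `limit` parameter, and the idea that the caller already holds the current `recent` keys are assumptions, not something described in the answer.

```javascript
// Builds a multi-location update that adds a new item and keeps the
// `recent` list capped at `limit` keys.
// `recentKeys` is the current content of `recent`, oldest first
// (assumed to have been read by the caller beforehand).
function buildAddItemUpdate(newKey, item, recentKeys, limit = 10) {
  const updates = {};
  updates[`items/${newKey}`] = item;
  updates[`recent/${newKey}`] = true;

  // Drop the oldest keys so that at most `limit` remain after the insert.
  const overflow = recentKeys.length + 1 - limit;
  for (const oldKey of recentKeys.slice(0, Math.max(0, overflow))) {
    updates[`recent/${oldKey}`] = null; // writing null deletes the path
  }
  return updates;
}
```

The result would be passed to something like `firebase.database().ref().update(updates)`. Reading `recent` and then writing is not atomic, so with many concurrent writers the list could briefly hold more than `limit` keys; trimming it elsewhere may be preferable in that case.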

Frank van Puffelen
  • 2
    And "watching one of the children only" is as simple as `ref.child( childId ).on('value', ...)` – Kato Sep 26 '16 at 23:58
  • 1
    Thanks! If you are querying by a specific indexed field, I'd expect to be able to achieve O(log(N)). Is that the kind of performance you get? (That would take ~26 attempts with a dataset of 100 mil) – CalM Sep 27 '16 at 05:33
  • In the example above, is the growing list of `items` a concern even if I never read the `items` node directly? Would you have implemented some sort of archiving mechanism, like `items/june2017`, `items/aug2017`, etc.? Or maybe move the items altogether to a new node once they are no longer actively used? – Atu Jul 21 '17 at 03:07
  • 1
    Indeed: separate your active data from the historical data. If you're mostly keeping the historical data for reporting, consider storing it in a system better suited for ad-hoc queries on large data sets, e.g. BigQuery. – Frank van Puffelen Jul 21 '17 at 13:34
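
Moving an item out of the active list can be sketched as another multi-location update. The `archive/<bucket>` layout and the helper name `buildArchiveUpdate` are hypothetical; the bucket name `june2017` is borrowed from the comment above.

```javascript
// Builds an update that removes an item from the active `items` node
// and writes the same data under an archive bucket in one call.
function buildArchiveUpdate(key, item, bucket) {
  return {
    [`items/${key}`]: null, // writing null deletes the active copy
    [`archive/${bucket}/${key}`]: item,
  };
}

// Example: the update object for archiving one item into `june2017`.
const archiveExample = buildArchiveUpdate('-Kabc', { title: 'Old post' }, 'june2017');
```

Passing the result to `update()` on the root reference would apply both writes together.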