
My Realtime Database has grown past 1 GB of stored data, so once a month I run a routine that deletes old, irrelevant data in order to trim the database, save storage, and keep daily use fast.

It goes like this:

App.getDatabaseInstance().getReference("store/orders/historic/").orderByChild("creationTs").limitToLast(500).endAt(System.currentTimeMillis() - (90L * ONE_DAY_MILLIS)).addListenerForSingleValueEvent(new ValueEventListener() {
            @Override
            public void onDataChange(@NonNull DataSnapshot dataSnapshot) {
                if (dataSnapshot.hasChildren()) {
                    for (DataSnapshot historicDs : dataSnapshot.getChildren()) {
                          historicDs.getRef().removeValue();
                    }      
                    cleanHistoricBranch();
                } else
                    System.out.println("FINISHED!!!");
            }

            @Override
            public void onCancelled(@NonNull DatabaseError databaseError) {

            }
        });

The query runs over a few thousand nodes (NOT MILLIONS) in the database, but it takes HOURS to complete. My guess is that every matching node has to be downloaded and then deleted one by one.
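One way to cut the number of write round trips (a rough sketch only, reusing the same query plus ONE_DAY_MILLIS and the recursive cleanHistoricBranch() call from above) would be to collect the keys of each batch and delete them with a single multi-path update: in updateChildren(), a null value removes the corresponding child, so each batch of 500 nodes becomes one write instead of 500.

DatabaseReference historicRef =
        App.getDatabaseInstance().getReference("store/orders/historic/");

historicRef.orderByChild("creationTs")
        .endAt(System.currentTimeMillis() - (90L * ONE_DAY_MILLIS))
        .limitToLast(500)
        .addListenerForSingleValueEvent(new ValueEventListener() {
            @Override
            public void onDataChange(@NonNull DataSnapshot dataSnapshot) {
                if (!dataSnapshot.hasChildren()) {
                    System.out.println("FINISHED!!!");
                    return;
                }
                // Mapping each key to null deletes that child; the whole batch is one request.
                Map<String, Object> deletes = new HashMap<>();
                for (DataSnapshot child : dataSnapshot.getChildren()) {
                    deletes.put(child.getKey(), null);
                }
                // Recurse only after the batch has been committed.
                historicRef.updateChildren(deletes, (error, ref) -> cleanHistoricBranch());
            }

            @Override
            public void onCancelled(@NonNull DatabaseError databaseError) {
            }
        });

This does not reduce the amount of data that has to be read, only the number of delete operations sent back to the server.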

I tried different approaches, but they didn't work well. For example:

App.getDatabaseInstance().getReference("store/orders/historic/").orderByChild("creationTs").limitToLast(500).endAt(System.currentTimeMillis() - (90L * ONE_DAY_MILLIS)).addListenerForSingleValueEvent(new ValueEventListener() {
                @Override
                public void onDataChange(@NonNull DataSnapshot dataSnapshot) {
                    if (dataSnapshot.hasChildren()) {
                        dataSnapshot.getRef().removeValue(); //deletes the whole branch, even the nodes that doesnt match the query.     
                        cleanHistoricBranch();
                    } else
                        System.out.println("FINISHED!!!");
                }
    
                @Override
                public void onCancelled(@NonNull DatabaseError databaseError) {
    
                }
            });

So, does anyone have a better approach for trimming a large number of nodes in the database hierarchy? Each node holds very little data, but I have around 20 to 50 thousand nodes that are candidates for removal.

Rafael Lima

1 Answer


If the time is mostly spent reading the data, the common approaches are:

  1. Run the process more frequently, so that there is less data to process each time.
  2. Set up integrated backups of your database, and use that backup to determine the keys to delete offline. Then send only the write operations to the online database, as sketched below.
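A rough sketch of what option 2 could look like (assuming Gson is on the classpath, that "historic-export.json" is a hypothetical local export of the store/orders/historic branch, and that the enclosing method handles IOException):

// Scan the exported JSON offline and collect the keys older than the cutoff.
long cutoff = System.currentTimeMillis() - (90L * ONE_DAY_MILLIS);
List<String> staleKeys = new ArrayList<>();
try (Reader reader = new FileReader("historic-export.json")) {
    JsonObject historic = JsonParser.parseReader(reader).getAsJsonObject();
    for (Map.Entry<String, JsonElement> entry : historic.entrySet()) {
        JsonElement ts = entry.getValue().getAsJsonObject().get("creationTs");
        if (ts != null && ts.getAsLong() < cutoff) {
            staleKeys.add(entry.getKey());
        }
    }
}

// Send only deletes to the online database, in chunks; a null value removes that path.
DatabaseReference historicRef =
        App.getDatabaseInstance().getReference("store/orders/historic/");
int chunkSize = 500;
for (int i = 0; i < staleKeys.size(); i += chunkSize) {
    Map<String, Object> deletes = new HashMap<>();
    for (String key : staleKeys.subList(i, Math.min(i + chunkSize, staleKeys.size()))) {
        deletes.put(key, null);
    }
    historicRef.updateChildren(deletes);
}

In a real run you would likely wait for each updateChildren() call to complete before sending the next chunk, but the key point is that the online database never has to serve the reads.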
Frank van Puffelen