I'm using the Google Cloud Java API to get objects out of Google Cloud Storage (GCS). The code looks something like this:
Storage storage = ...
List<StorageObject> storageObjects = storage.objects().list(bucket).execute().getItems();
But this will not return all items (storage objects) in the GCS bucket; it only returns the first 1000 items, i.e. the first "page". To get the next 1000 items, one should do:
Objects objects = storage.objects().list(bucket).execute();
String nextPageToken = objects.getNextPageToken();
List<StorageObject> itemsInFirstPage = objects.getItems();
if (nextPageToken != null) {
    // recurse
}
What I want to do is find an item that matches a Predicate, traversing the items in the GCS bucket only until the predicate is matched. To make this efficient, I'd like to load the items in the next page only when the item wasn't found in the current page. For a single page this works:
Predicate<StorageObject> matchesItem = ...
takeWhile(storage.objects().list(bucket).execute().getItems().stream(), not(matchesItem));
where takeWhile is copied from here.
And this will load the storage objects from all pages recursively:
private Stream<StorageObject> listGcsPageItems(String bucket, String pageToken) {
    if (pageToken == null) {
        return Stream.empty();
    }
    Storage.Objects.List list = storage.objects().list(bucket);
    if (!pageToken.equals(FIRST_PAGE)) {
        list.setPageToken(pageToken);
    }
    Objects objects = list.execute();
    String nextPageToken = objects.getNextPageToken();
    List<StorageObject> items = objects.getItems();
    return Stream.concat(items.stream(), listGcsPageItems(bucket, nextPageToken));
}
where FIRST_PAGE is just a "magic" String that instructs the method not to set a specific page token (which results in the items of the first page).
The problem with this approach is that it's eager: the recursive call is evaluated as an argument to Stream.concat before the stream is returned, so all items from all pages are loaded before the "matching predicate" is applied. I'd like this to be lazy (one page at a time). How can I achieve this?
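A minimal sketch of one possible direction, not a definitive implementation: wrap the paging in an Iterator that fetches a page only when asked for it, turn that into a Stream of pages, and flatMap it into items. To keep the sketch self-contained it does not use the GCS client at all. The names LazyPages, Page, lazyItems and fetchPage are hypothetical, and fetchPage is assumed to receive null for the first page and a page token afterwards.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class LazyPages {

    /** Hypothetical stand-in for one page of results: its items plus the next-page token. */
    public static final class Page<T> {
        final List<T> items;
        final String nextPageToken; // null when there are no more pages

        public Page(List<T> items, String nextPageToken) {
            // Treat a missing item list as an empty page.
            this.items = items == null ? Collections.<T>emptyList() : items;
            this.nextPageToken = nextPageToken;
        }
    }

    /**
     * Returns a sequential stream of items that fetches a page only when the
     * pipeline asks for more elements. fetchPage is called with null for the
     * first page and with the previous page's token for every later page.
     */
    public static <T> Stream<T> lazyItems(Function<String, Page<T>> fetchPage) {
        Iterator<Page<T>> pages = new Iterator<Page<T>>() {
            private boolean first = true;
            private String token; // token of the page to fetch next

            @Override
            public boolean hasNext() {
                return first || token != null;
            }

            @Override
            public Page<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Page<T> page = fetchPage.apply(token); // the only place a page is loaded
                first = false;
                token = page.nextPageToken;
                return page;
            }
        };
        Spliterator<Page<T>> spliterator =
                Spliterators.spliteratorUnknownSize(pages, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false)
                .flatMap(page -> page.items.stream());
    }
}
```

With something like this, `lazyItems(fetchPage).filter(matchesItem).findFirst()` should stop requesting pages once a match is found. For GCS, fetchPage would presumably wrap storage.objects().list(bucket), call setPageToken only when the token is non-null, and build a Page from getItems() and getNextPageToken(); the checked exception thrown by execute() would have to be handled inside that function.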