
I've been looking at TensorFlow samples for recommendations, such as Recommending movies: retrieval.

This is the process to generate recommendations:

import tensorflow as tf
import tensorflow_recommenders as tfrs

# Build an approximate index over the embeddings of all known movies
scann_index = tfrs.layers.factorized_top_k.ScaNN(model.user_model)
scann_index.index_from_dataset(
  tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
)

# Get recommendations for a given user id
_, titles = scann_index(tf.constant(["42"]))
print(f"Recommendations for user 42: {titles[0, :3]}")

This will recommend movies from the list of known movies in the index. If new movies (or videos) are released every second, how do I update the index that quickly? Transfer-learn on new data every 5 minutes? Can I supply the list of new movies/videos along with the query (the user the recommendations are for) as input? What are my options for including new data before a new model is built in an overnight batch process?

Pradeep

1 Answer


It might be worthwhile checking out previous threads on incremental learning with TF.

Those threads will give some background, but the short version is that you can maintain online training by keeping the model in memory, appending new interactions to your dataset as they come in, and calling .fit on the new data to update the weights.
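Here is a minimal sketch of that loop, assuming model is the retrieval model from the tutorial; new_ratings (the freshly collected interactions) and all_movies (the candidate set including newly released titles) are illustrative names:

import tensorflow as tf
import tensorflow_recommenders as tfrs

# Fine-tune the in-memory model on just the new interactions;
# a single epoch over a small batch of fresh data keeps the update cheap.
model.fit(new_ratings.batch(4096), epochs=1)

# Rebuild the index so newly released movies become retrievable.
index = tfrs.layers.factorized_top_k.ScaNN(model.user_model)
index.index_from_dataset(
  tf.data.Dataset.zip((all_movies.batch(100), all_movies.batch(100).map(model.movie_model)))
)

Note that the indexing step alone does not require any training: a brand-new movie only needs a forward pass through movie_model to enter the index, so as long as movie_model consumes features available at release time (e.g. the title), you can rebuild the index every few minutes without touching the weights at all.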

To use this model outside of the host it is training on, you will have to save it with model.save (https://www.tensorflow.org/tutorials/keras/save_and_load#save_the_entire_model) and load it again on any hosts dedicated purely to inference. Of course, you can also run inference directly off the continuously trained model, but whether that is practical depends on how long each retraining pass takes.
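For the ScaNN index specifically, the hand-off might look like the sketch below (the path is illustrative, and the scann package must be installed on both hosts; the SaveOptions whitelist is needed because the ScaNN ops live outside the default op namespaces):

import tensorflow as tf

# On the training host: export the fitted index.
path = "/tmp/retrieval_model/1"
tf.saved_model.save(
  scann_index, path,
  options=tf.saved_model.SaveOptions(namespace_whitelist=["Scann"])
)

# On an inference-only host: load and query.
loaded = tf.saved_model.load(path)
_, titles = loaded(tf.constant(["42"]))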

Zack