
I am currently using the Azure Machine Learning Python SDK, following the incremental embedding tutorial that uses Ada 002: https://github.com/Azure/azureml-examples/blob/main/sdk/python/generative-ai/rag/notebooks/faiss/url_to_faiss_incremental_embeddings_with_tabular_data.ipynb

I have an XLSX file with 533,000 rows. I am unable to crack and chunk it as an XLSX. I can process it as a TXT, but then I get a pipeline error when it tries to embed it.

Is there a cap on cracking and chunking? Is there a cap on the amount of data that can be embedded?

The pipeline works fine with smaller files.

Thanks

What I have tried: changing the file type and making smaller files (a splitting sketch is below).
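For reference, a minimal sketch of the "smaller files" approach with pandas, assuming the sheet still fits in memory for one read and that openpyxl is installed; the file name and shard size are placeholders, not values from the tutorial:

```python
# Sketch: split one large XLSX into smaller CSV shards before ingestion.
# Assumes openpyxl is installed (pandas' XLSX engine); names are placeholders.
import pandas as pd

df = pd.read_excel("data.xlsx")   # loads the whole sheet into memory
shard_size = 50_000               # rows per shard (arbitrary choice)

for i, start in enumerate(range(0, len(df), shard_size)):
    df.iloc[start:start + shard_size].to_csv(f"data_part_{i:03d}.csv", index=False)
```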

  • It seems there is no cap, but there might be a memory issue with large datasets. For that, you can try MLTable to access the data in parts. Can you provide more details, such as the error you are getting, your setup, and anything else that can help reproduce the issue? – RishabhM Aug 17 '23 at 09:45
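Following up on the MLTable suggestion, here is a rough sketch of reading the data in fixed-size row windows. It assumes the XLSX has first been exported to CSV (the mltable package reads delimited and Parquet files, not XLSX); the path and window size are placeholders:

```python
# Sketch: iterate over a large CSV in row windows with mltable.
# Assumes `pip install mltable` and that the XLSX was exported to CSV first.
import mltable

paths = [{"file": "./data_full.csv"}]  # placeholder path
window = 50_000                        # rows per window (arbitrary)

offset = 0
while True:
    # Re-load each pass so skip/take steps don't accumulate on one MLTable.
    tbl = mltable.from_delimited_files(paths=paths)
    part = tbl.skip(offset).take(window).to_pandas_dataframe()
    if part.empty:
        break
    # ...chunk and embed `part` here...
    offset += window
```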

0 Answers