
Is there any way to fetch all the documents loaded into vespa?

I tried querying with regular expressions, but it didn't work as expected.

select * from entity where ID matches "[.]+";

ID is not an attribute, but I also tried with an attribute field, and neither query returned any results.

Raghu Venmarathoor

2 Answers


For dumping documents, visiting is usually preferable to search. You can visit either with the vespa-visit tool or through the visiting support in the document/v1 REST API.
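A minimal Python sketch of dumping documents through document/v1 visiting follows. It assumes a container listening on localhost:8080, a hypothetical namespace `mynamespace`, and the document type `entity` from the question. Each response carries a batch of `documents` and, while more remain, a `continuation` token that is sent back on the next request. The fetch function is passed in so the paging loop can be exercised without a running cluster.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Assumption: container on localhost:8080; "mynamespace" is a hypothetical
# namespace, "entity" is the document type from the question.
BASE = "http://localhost:8080/document/v1/mynamespace/entity/docid"

def fetch_page(continuation: Optional[str]) -> dict:
    """GET one page of visit results, resuming from a continuation token if given."""
    params = {"continuation": continuation} if continuation else {}
    url = BASE + ("?" + urllib.parse.urlencode(params) if params else "")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def visit_all(fetch: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every visited document, following continuation tokens until none is returned."""
    continuation = None
    while True:
        page = fetch(continuation)
        # A page may be empty but still carry a continuation, so keep going.
        yield from page.get("documents", [])
        continuation = page.get("continuation")
        if not continuation:
            break
```

Usage against a live cluster would be `for doc in visit_all(fetch_page): print(doc["id"])`.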

If you want to use search, use this query to match all documents of a type:

select * from yourdocumenttype where sddocname contains 'yourdocumenttype';
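As a sketch, this is how the match-all query above could be sent to the search API as a GET request with a `yql` parameter. The host and port (localhost:8080) and the default page size are assumptions. `hits` and `offset` page through results, but Vespa limits how deep they can go by default, so paging alone is not a practical way to reach every document.

```python
import urllib.parse

def match_all_query_url(doctype: str, hits: int = 100, offset: int = 0,
                        host: str = "http://localhost:8080") -> str:
    """Build a /search/ URL whose YQL matches every document of one type."""
    yql = f"select * from {doctype} where sddocname contains '{doctype}'"
    params = {"yql": yql, "hits": hits, "offset": offset}
    return f"{host}/search/?" + urllib.parse.urlencode(params)
```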

To iterate over all documents with this query, it is more efficient to use some field in your document to partition the document set into smaller chunks and query for one chunk at a time. For example, if you have a timestamp field, add a range condition to the query so that each query retrieves the documents for one slice of time.
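A minimal sketch of the time-slicing idea, assuming a hypothetical numeric field named `timestamp`. Each slice is a half-open interval [lo, hi), so adjacent slices never overlap and no document is returned twice; the step size would be chosen so that each slice holds a manageable number of hits.

```python
from typing import Iterator, Tuple

def time_slices(start: int, end: int, step: int) -> Iterator[Tuple[int, int]]:
    """Split [start, end) into consecutive half-open intervals of at most `step`."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi

def slice_yql(doctype: str, field: str, lo: int, hi: int) -> str:
    """Match-all query restricted to lo <= field < hi (field name is hypothetical)."""
    return (f"select * from {doctype} where sddocname contains '{doctype}' "
            f"and {field} >= {lo} and {field} < {hi}")
```

Each string from `slice_yql` can then be sent as the `yql` parameter of a separate search request.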

(Regular expressions are only supported in streaming mode.)

Jon

To dump all documents from Vespa, use vespa-visit:

Visit is a different interface from search; it is built for large data transfers with high throughput, but not necessarily low latency.

Teams use visit to extract either a full dump or a subset of the documents, where the subset is chosen with a document selection expression.
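The same kind of selection expression can also be passed to document/v1 visiting. The sketch below only builds the request URL. The namespace `mynamespace`, the field `year` in the expression, and the host localhost:8080 are hypothetical; `wantedDocumentCount` is used as a batch-size hint. If the application has more than one content cluster, a `cluster` parameter would also be needed.

```python
import urllib.parse
from typing import Optional

def visit_url(namespace: str, doctype: str, selection: Optional[str] = None,
              wanted: Optional[int] = None,
              host: str = "http://localhost:8080") -> str:
    """Build a document/v1 visit URL, optionally restricted by a selection expression."""
    path = f"/document/v1/{namespace}/{doctype}/docid"
    params = {}
    if selection:
        params["selection"] = selection      # e.g. "entity.year > 2000" (hypothetical field)
    if wanted is not None:
        params["wantedDocumentCount"] = wanted
    query = "?" + urllib.parse.urlencode(params) if params else ""
    return host + path + query
```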

Kristian Aune