We have one CF (column family) that only has about 1000 rows. Is there a way to read all 1000 rows with Astyanax? Does Thrift even support that?

thanks, Dean

Dean Hiller

1 Answer

You can read all rows with the Thrift call get_range_slices. Note that it returns rows in token order, not key order, so it works for reading every row but not for range queries across row keys.
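
To make the token-order point concrete, here is a small sketch in plain Java with no Cassandra involved. It assumes a RandomPartitioner-style token (the MD5 digest of the key taken as a non-negative BigInteger); your cluster may use a different partitioner, and the class name, method names, and sample keys are made up for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class TokenOrderDemo {
    // Assumption: RandomPartitioner-style token, i.e. the MD5 digest of the key
    // read as a signed BigInteger and then made non-negative.
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(key.getBytes(StandardCharsets.UTF_8))).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns the keys sorted by token, which is the order a full scan visits them in.
    static List<String> inTokenOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(TokenOrderDemo::token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("alice", "bob", "carol", "dave");
        System.out.println("key order:   " + keys);
        System.out.println("token order: " + inTokenOrder(keys));
    }
}
```

Since the scan order follows the hash and not the key, asking for "all keys between alice and carol" has no useful meaning under this kind of partitioner, while "give me everything" works fine.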

You can use it in Astyanax through getAllRows(). Here is some sample code (copied from the docs at https://github.com/Netflix/astyanax/wiki/Reading-Data#iterate-all-rows-in-a-column-family):

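// CF is assumed to be a ColumnFamily<String, String> defined elsewhere, e.g.
// new ColumnFamily<String, String>("ColumnFamilyName", StringSerializer.get(), StringSerializer.get())
Rows<String, String> rows;
try {
    rows = keyspace.prepareQuery(CF)
        .getAllRows()
        .setBlockSize(10)   // rows fetched per batch
        .withColumnRange(new RangeBuilder().setMaxSize(10).build())   // at most 10 columns per row
        .setExceptionCallback(new ExceptionCallback() {
             @Override
             public boolean onException(ConnectionException e) {
                 // Wait one second, then return true so the query continues
                 try {
                     Thread.sleep(1000);
                 } catch (InterruptedException e1) {
                     Thread.currentThread().interrupt();
                 }
                 return true;
             }})
        .execute().getResult();
} catch (ConnectionException e) {
    // Rethrow so that rows is never used uninitialized
    throw new RuntimeException(e);
}

// This will never throw an exception
for (Row<String, String> row : rows) {
    LOG.info("ROW: " + row.getKey() + " " + row.getColumns().size());
}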

This will return the first 10 columns of each row, in batches of 10 rows. Change the number passed to RangeBuilder().setMaxSize to get more (or fewer) columns.

Richard
  • Hmm, something doesn't make sense here. While this is fine for 1000 rows, let's say I temporarily wanted to read 1 million rows without blowing out memory and without doing map/reduce (for now). Is there a way to ask for next row, next row, next row, with the normal batching from query.setRowLimit(200) so that only 200 rows are loaded at a time? – Dean Hiller Jul 15 '13 at 21:50
  • According to https://github.com/Netflix/astyanax/issues/53, the iterator returned in rows is lazy, so it won't hold more than one page in memory at once. – Richard Jul 16 '13 at 07:02
  • There is another way, which is now preferred: AllRowsReader. Sample code is here: https://github.com/Netflix/astyanax/wiki/All-rows-query. You get a callback for each row. – Richard Jul 16 '13 at 09:14
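
For anyone wondering what "lazy" means in the comments above, here is a rough sketch of the general idea in plain Java. This is not Astyanax's implementation; PagedIterator, fetchPage, and the integer "rows" are hypothetical stand-ins. The iterator keeps only one page in memory and fetches the next page when the current one runs out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class PagedIterator implements Iterator<Integer> {
    private final int totalRows;   // hypothetical size of the backing store
    private final int pageSize;    // rows fetched per round trip
    private List<Integer> page = new ArrayList<>();
    private int posInPage = 0;
    private int nextToFetch = 0;
    int pagesFetched = 0;          // exposed only so the demo can report it

    PagedIterator(int totalRows, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.totalRows = totalRows;
        this.pageSize = pageSize;
    }

    // Stand-in for a network call that returns up to `limit` rows starting at `start`.
    private List<Integer> fetchPage(int start, int limit) {
        List<Integer> rows = new ArrayList<>();
        for (int i = start; i < Math.min(start + limit, totalRows); i++) {
            rows.add(i);
        }
        pagesFetched++;
        return rows;
    }

    @Override
    public boolean hasNext() {
        if (posInPage < page.size()) {
            return true;
        }
        if (nextToFetch >= totalRows) {
            return false;
        }
        // Replace the exhausted page; the old one becomes eligible for garbage collection.
        page = fetchPage(nextToFetch, pageSize);
        nextToFetch += page.size();
        posInPage = 0;
        return !page.isEmpty();
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(posInPage++);
    }

    public static void main(String[] args) {
        PagedIterator it = new PagedIterator(25, 10);
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        System.out.println(count + " rows read in " + it.pagesFetched + " page fetches");
    }
}
```

With 25 rows and a page size of 10, the loop sees all 25 rows but never holds more than 10 at a time, and it makes three fetches in total.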