22

I am looking for useful documentation or examples for the Apache Arrow API. Can anyone point me to some resources? I was only able to find some blog posts and the Java documentation (which doesn't say much).

From what I have read, it is a standard in-memory columnar database for fast analytics. Is it possible to load data into Arrow memory and manipulate it?

Shastick
Rijo Joseph

2 Answers

5

Use Arrow as a middleman between two applications that need to communicate by passing objects to each other.

Arrow isn’t a standalone piece of software but rather a component used to accelerate analytics within a particular system and to allow Arrow-enabled systems to exchange data with low overhead.

For example, Arrow improves the performance of data movement within a cluster.

See Arrow's own tests for examples, for instance:

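  @Test
  public void test() throws Exception {
    // The root allocator hands out Arrow's buffers; the argument caps the bytes it may allocate.
    BufferAllocator allocator = new RootAllocator(Integer.MAX_VALUE);
    File testInFile = testFolder.newFile("testIn.arrow");
    File testOutFile = testFolder.newFile("testOut.arrow");

    // writeInput and validateOutput are helpers defined elsewhere in the test code.
    writeInput(testInFile, allocator);

    // Run the round-trip tool on the input file and expect exit code 0.
    String[] args = {"-i", testInFile.getAbsolutePath(), "-o", testOutFile.getAbsolutePath()};
    int result = new FileRoundtrip(System.out, System.err).run(args);
    assertEquals(0, result);

    validateOutput(testOutFile, allocator);
  }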
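
The test above only round-trips a file, so it does not show what a column looks like once it is in memory. As a rough illustration of the idea behind Arrow's layout (a contiguous values buffer plus a validity bitmap that marks nulls), here is a minimal sketch that uses only `java.nio`. It is not the Arrow API: the class `IntColumnSketch` and all of its methods are hypothetical names made up for this example, and real Arrow vectors get their buffers from a `BufferAllocator` rather than from `ByteBuffer.allocateDirect`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only (not Arrow code): a nullable int32 column stored as two buffers. */
final class IntColumnSketch {
    private final ByteBuffer validity; // one bit per slot, 1 = value present, 0 = null
    private final ByteBuffer values;   // 4 bytes per slot, little-endian
    private final int length;

    IntColumnSketch(int length) {
        this.length = length;
        // Direct buffers live outside the Java heap and start zeroed,
        // so every slot is initially null.
        this.validity = ByteBuffer.allocateDirect((length + 7) / 8);
        this.values = ByteBuffer.allocateDirect(length * 4).order(ByteOrder.LITTLE_ENDIAN);
    }

    /** Stores a value and marks the slot as non-null. */
    void set(int index, int value) {
        values.putInt(index * 4, value);
        int byteIndex = index / 8;
        validity.put(byteIndex, (byte) (validity.get(byteIndex) | (1 << (index % 8))));
    }

    /** Clears the validity bit; the bytes in the values buffer are left untouched. */
    void setNull(int index) {
        int byteIndex = index / 8;
        validity.put(byteIndex, (byte) (validity.get(byteIndex) & ~(1 << (index % 8))));
    }

    boolean isNull(int index) {
        return (validity.get(index / 8) & (1 << (index % 8))) == 0;
    }

    int get(int index) {
        if (isNull(index)) {
            throw new IllegalStateException("slot " + index + " is null");
        }
        return values.getInt(index * 4);
    }

    /** Scans the whole column sequentially, skipping null slots. */
    long sum() {
        long total = 0;
        for (int i = 0; i < length; i++) {
            if (!isNull(i)) {
                total += values.getInt(i * 4);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        IntColumnSketch column = new IntColumnSketch(4);
        column.set(0, 10);
        column.set(1, 20);
        column.setNull(2);
        column.set(3, 12);
        System.out.println("sum of non-null values: " + column.sum());
    }
}
```

Because the values sit next to each other in one buffer, a scan such as `sum()` walks memory sequentially, which is a large part of why columnar layouts suit analytics.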

Apache Parquet also uses it. There are examples of converting schemas to and from Arrow:

MessageType parquet = converter.fromArrow(allTypesArrowSchema).getParquetSchema();

Schema arrow = converter.fromParquet(supportedTypesParquetSchema).getArrowSchema();
Ori Marko
  • The linked test example isn't very informative. It isn't immediately obvious what a `BufferAllocator` or `RootAllocator` is. There isn't any evidence of "manipulating data" either... – Ramón J Romero y Vigil Jul 09 '17 at 11:28
  • The converter included in Parquet's GitHub repo seems to rely on Arrow 0.1.0, and I can't get it to work: I get a ClassNotFoundException when doing `new SchemaConverter()`. I may be doing something wrong when installing it, as I could not find any published artifact for parquet-arrow. – Shastick Oct 30 '17 at 14:46
0

The Arrow site now has some basic documentation on how to use Apache Arrow, although it could use a bit of filling out.

jjbskir