Questions tagged [apache-arrow-cpp]

12 questions
3 votes, 0 answers

How to use Apache Arrow to write files in Parquet format on Windows using C++?

I'm trying to write Parquet files on Windows using C++. I followed the instructions I found here and chose the "Using conda-forge for build dependencies" and "Building using Visual Studio (MSVC) Solution Files" approaches. In contrast to the article…
beo
2 votes, 1 answer

Write Apache Arrow table to string C++

I'm trying to write an Apache Arrow table to a string. My big example has problems and I can't get this little example to work. This one segfaults inside of Arrow in the WriteTable call. My bigger example doesn't appear to serialize…
user2183336
2 votes, 2 answers

How can I get the row view of data read from parquet file?

Example: Let's say a table named user has id, name, email, phone, and is_active as attributes, and there are thousands of users in this table. I would like to read the details per user. void ParquetReaderPlus::read_next_row(long row_group_index,…
Shravan40
1 vote, 1 answer

How to filter rows from arrow::table based on a certain condition in Apache Arrow C++?

I want to do the equivalent of the pandas operation df[df['certain_date'] > '2023-05-26']. I have gone through almost all the Apache Arrow related answers on this site. I have been trying some combination of the is_in compute function here -…
Abhishek Kumar
1 vote, 1 answer

What is the difference between StringType and LargeStringType in Apache Arrow?

According to documentation: class arrow::StringType : public arrow::BinaryType #include Concrete type class for variable-size string data, utf8-encoded. class arrow::LargeStringType : public arrow::LargeBinaryType #include…
1 vote, 0 answers

When should a default destructor be explicitly defined in a code module?

I notice that the Apache Arrow C++ libraries frequently define non-inline virtual destructors in code modules. Is there guidance for which classes should/should not have an explicitly defined non-inline destructor? For example, RandomAccessFile…
0 votes, 0 answers

Error reading decimal datatype from Apache Arrow Parquet CPP library version 11.0.0

I am trying to read a Parquet file and store it in a custom C structure for further use in my C code: case arrow::Type::DECIMAL: { const arrow::Decimal128Type* decimal_type = static_cast
0 votes, 1 answer

Apache Arrow IPC streams: SPMC concurrency

Is it expected that multiple IPC stream readers can concurrently tail a single stream writer publishing from another process? The descriptions of things like "IPC streams" lead me to think yes, but I cannot find any positive confirmation in the…
kdkavanagh
0 votes, 0 answers

Conan doesn't create arrow bundle dependency

In our project, we use Arrow 10.0.0, and for each of our builds we made an Arrow build. Now we want to use Conan to make our builds faster. Using the arrow.cmake file, we could determine which components should be bundled, and then made a link to …
0 votes, 1 answer

How do you compute Grouped Aggregations in Apache Arrow in C++?

In Apache Arrow, it seems to be possible to do queries that are similar to "group by" in SQL (see their documentation); however, there are not any examples of how to use this. I want to know how to go from an arrow::Table and for a given column be…
user3117152
0 votes, 0 answers

Apache Arrow C++: What's the best fast alternative to parquet::StreamWriter?

I'm writing a program to convert a tabular ("rownar") custom binary file format to Parquet using Arrow C++. The core of the program works as follows: ColInfo colinfo = ...; // fixed schema per file, not known at compile time parquet::StreamWriter w…
Jonas H.
0 votes, 2 answers

Is there a way to read files using Arrow from a remote server in C++?

Reading CSV or Parquet files from the local filesystem is very easy, but it seems that Arrow does not support reading files from a remote server given its IP address. Is there a way to achieve this? E.g., read a subset of columns of a Parquet file from a remote server…
Raining.