Questions tagged [apache-arrow-datafusion]

DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format.

Source Code

https://github.com/apache/arrow-datafusion

Documentation

https://arrow.apache.org/datafusion/index.html

12 questions
3 votes, 2 answers

Indexing in datafusion

Context: I am using DataFusion to build a data validator for CSV file input. Requirement: I want to add the row number where each error occurred to the output report. In pandas, I can add a row index for this purpose. Is there a…
praveent
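The technique behind this question does not depend on DataFusion: carry a 1-based row number with each record as it is read, so every failed check can cite its source row. The sketch below uses only the Rust standard library. `validate_rows`, `RowError`, and the two-column integer rule are hypothetical, and the naive `split(',')` ignores quoted commas. Inside DataFusion, a window function such as `ROW_NUMBER() OVER ()` may be an alternative, but SQL does not guarantee row order without an explicit ordering, so those numbers might not match the file's lines.

```rust
/// A validation failure tied to the 1-based data row it came from.
#[derive(Debug, PartialEq)]
struct RowError {
    row: usize,
    message: String,
}

/// Hypothetical rule: each data row must have exactly two comma-separated
/// fields, and the second must parse as an integer.
fn validate_rows(csv: &str) -> Vec<RowError> {
    let mut errors = Vec::new();
    // Skip the header; enumerate() is 0-based, so add 1 to get the data-row number.
    for (idx, line) in csv.lines().skip(1).enumerate() {
        let row = idx + 1;
        let fields: Vec<&str> = line.split(',').collect();
        if fields.len() != 2 {
            errors.push(RowError {
                row,
                message: format!("expected 2 fields, found {}", fields.len()),
            });
            continue;
        }
        if fields[1].trim().parse::<i64>().is_err() {
            errors.push(RowError {
                row,
                message: format!("not an integer: {:?}", fields[1]),
            });
        }
    }
    errors
}

fn main() {
    let input = "name,age\nalice,30\nbob,abc\ncarol\n";
    // Reports data row 2 (bad integer) and data row 3 (missing field).
    for e in validate_rows(input) {
        println!("row {}: {}", e.row, e.message);
    }
}
```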
1 vote, 0 answers

How do I convert a Polars DataFrame to Vec?

edit: To hopefully be more concise, how do I do this?- use polars::prelude::{DataFrame, NamedFrom, df}; use arrow::record_batch::RecordBatch; fn main() { let polars_df: DataFrame = df!("cat_data" => &[1.0, 2.0, 3.0, 4.0], …
1 vote, 1 answer

Rust with Datafusion - Trying to Write DataFrame to Json

*Repo w/ WIP code: https://github.com/jmelm93/rust-datafusion-csv-processing Started programming with Rust 2 days ago, and have been trying to resolve this since ~3 hours into trying out Rust... Any help would be appreciated. My goal is to write a…
jmelm93
1 vote, 0 answers

How to persist DataFusion DataFrame in memory

How can you persist a DataFusion DataFrame in memory with the Python bindings? From what I understand, DataFusion doesn't persist data in memory by default. Here's how I register a CSV as a table: import datafusion ctx =…
Powers
1 vote, 1 answer

Read CSV into DataFusion DataFrame with Python

How can I read a CSV into a DataFusion DataFrame with datafusion-python? Here's what I have so far: import datafusion ctx = datafusion.SessionContext() I couldn't find any instructions in the docs. I am using DataFusion v0.6.0.
Powers
0 votes, 0 answers

How did cnosdb serialize the schema in datafusion?

How did cnosdb serialize the schema in DataFusion? The trait bound `datafusion::arrow::datatypes::Schema: Serialize` is not satisfied; the following other types implement trait `Serialize`: &'a T, &'a mut T, (T0, T1), (T0, T1, T2), (T0, T1, T2, T3), (T0,…
Baker
0 votes, 0 answers

How may I construct a value of vec of vecs for a record batch in DataFusion?

I can create column of type "UTF8" as follows let schema = Arc::new(Schema::new(vec![ Field::new("id", DataType::Int32, false), Field::new("payload", DataType::Utf8, false), ])); let vec_of_strings: Vec =…
Finlay Weber
0 votes, 1 answer

Creating Datafusion's Dataframe from Vec in Rust?

I am trying to do something similar to this question here, but instead of the Polars library I would like to use DataFusion. The idea is to go from a vec of struct like this: #[derive(Serialize)] struct Test { id:u32, …
dade
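Arrow, and therefore DataFusion, stores data column by column, so a `Vec` of row structs generally has to be transposed into one vector per field before arrays and a `RecordBatch` can be built. A standard-library-only sketch of that transposition follows. The `Row` type and its `id`/`name` fields are hypothetical stand-ins for the question's truncated struct. With the arrow crate, per-field vectors like these can typically be passed to `From` conversions such as `UInt32Array::from`, but check the API of the arrow version you depend on.

```rust
/// Hypothetical row type standing in for the question's `Test` struct.
struct Row {
    id: u32,
    name: String,
}

/// Columnar form: one Vec per field, all of the same length.
#[derive(Debug, PartialEq)]
struct Columns {
    ids: Vec<u32>,
    names: Vec<String>,
}

/// Transpose row-oriented records into per-field vectors.
fn to_columns(rows: Vec<Row>) -> Columns {
    let mut ids = Vec::with_capacity(rows.len());
    let mut names = Vec::with_capacity(rows.len());
    for row in rows {
        ids.push(row.id);
        names.push(row.name);
    }
    Columns { ids, names }
}

fn main() {
    let rows = vec![
        Row { id: 1, name: "a".to_string() },
        Row { id: 2, name: "b".to_string() },
    ];
    println!("{:?}", to_columns(rows));
}
```

Pre-allocating with `with_capacity` avoids reallocations because the row count is known before the loop starts.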
0 votes, 0 answers

Read CSV file with Rust DataFusion DataFrame API having space as a Separator

Having a CSV file with the following format: 18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu" 15.0 8 350.0 165.0 3693. 11.5 70 1 "buick skylark 320" 18.0 8 318.0 150.0 …
DataPsycho
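DataFusion's `CsvReadOptions` exposes a single-byte delimiter setting (for example `.delimiter(b' ')`), which may be enough if columns are separated by exactly one space. If the file pads columns with runs of spaces, each extra space would yield an empty field. The standard-library sketch below shows one way to pre-process such lines: it splits on runs of whitespace while keeping double-quoted fields, which may contain spaces, together. `split_fields` is a hypothetical helper, and it does not handle escaped quotes inside a quoted field.

```rust
/// Split one line on runs of ASCII whitespace, keeping double-quoted
/// fields together and stripping the surrounding quotes.
fn split_fields(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut in_field = false;
    for ch in line.chars() {
        match ch {
            '"' => {
                in_quotes = !in_quotes;
                in_field = true; // an empty "" still counts as a field
            }
            c if c.is_ascii_whitespace() && !in_quotes => {
                // Whitespace outside quotes ends the current field, if any.
                if in_field {
                    fields.push(std::mem::take(&mut current));
                    in_field = false;
                }
            }
            c => {
                current.push(c);
                in_field = true;
            }
        }
    }
    if in_field {
        fields.push(current);
    }
    fields
}

fn main() {
    let line = r#"18.0 8 307.0 130.0 3504. 12.0 70 1 "chevrolet chevelle malibu""#;
    println!("{:?}", split_fields(line));
}
```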
0 votes, 1 answer

Extract Year, Month, Day from Unix TimeStamp column in Rust DataFusion DataFrame?

I have created a DataFusion DataFrame: | asin | vote | verified | unixReviewTime | reviewText | +------------+------+----------+----------------+-----------------+ | 0486427706 | 3 | true | 1381017600 | good | |…
DataPsycho
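Inside a query, DataFusion's date functions (for example `date_part`) are usually the simpler route. To show the underlying calendar arithmetic, here is a standard-library Rust sketch that turns a Unix timestamp in seconds (assumed to be UTC) into a year, month, and day using Howard Hinnant's `civil_from_days` algorithm for the proleptic Gregorian calendar. The function names are illustrative, not part of any library.

```rust
/// Convert days since 1970-01-01 into a (year, month, day) civil date
/// (Howard Hinnant's `civil_from_days` algorithm).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras, floored for negatives
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year, [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32; // [1, 12]
    // January and February belong to the following calendar year.
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Split a Unix timestamp in seconds (UTC) into year, month, day.
fn ymd_from_unix_seconds(secs: i64) -> (i64, u32, u32) {
    civil_from_days(secs.div_euclid(86_400))
}

fn main() {
    // unixReviewTime value from the question's sample row.
    let (y, m, d) = ymd_from_unix_seconds(1_381_017_600);
    println!("{y:04}-{m:02}-{d:02}"); // prints 2013-10-06
}
```

Using `div_euclid` rather than `/` keeps timestamps before 1970 on the correct day, since integer division in Rust truncates toward zero.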
0 votes, 1 answer

Impute and Add new calculated column with Rust DataFusion?

Consider a JSON data file named test_file.json with the following content: {"a": 1, "b": "hi", "c": 3} {"a": 5, "b": null, "c": 7} Here is how I read the file with the DataFrame API of DataFusion: use…
DataPsycho
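Imputing a missing value and deriving a new column are both per-row transformations. In DataFusion's DataFrame API these would typically be expressed with `DataFrame::with_column` and an expression such as `coalesce`; the standard-library sketch below shows the same logic on plain structs instead. `Record` mirrors the question's `a`/`b`/`c` fields, while the default string and the `a_plus_c` column are hypothetical choices.

```rust
/// One record from the question's JSON file; `b` may be null.
struct Record {
    a: i64,
    b: Option<String>,
    c: i64,
}

/// Output row: `b` imputed, plus a hypothetical calculated column.
#[derive(Debug, PartialEq)]
struct Enriched {
    a: i64,
    b: String,
    c: i64,
    a_plus_c: i64,
}

/// Fill missing `b` values with `default_b` and add `a_plus_c = a + c`.
fn impute_and_derive(records: Vec<Record>, default_b: &str) -> Vec<Enriched> {
    records
        .into_iter()
        .map(|r| Enriched {
            a: r.a,
            // Replace a missing value with the caller-supplied default.
            b: r.b.unwrap_or_else(|| default_b.to_string()),
            c: r.c,
            a_plus_c: r.a + r.c,
        })
        .collect()
}

fn main() {
    let records = vec![
        Record { a: 1, b: Some("hi".to_string()), c: 3 },
        Record { a: 5, b: None, c: 7 },
    ];
    for row in impute_and_derive(records, "unknown") {
        println!("{:?}", row);
    }
}
```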
0 votes, 0 answers

How to get an efficient data ingestion solution using Java, Apache Arrow and Apache Parquet

I'm working on a data lake solution for an IoT framework that does 44 kHz data acquisition for a few dozen sensors (~990,000 measurements/second). I would like suggestions on how to get an efficient data ingestion solution using Java 11+, Apache Arrow…
João Paraná