
I have a JSON data file named test_file.json with the following content:

{"a": 1, "b": "hi", "c": 3}
{"a": 5, "b": null, "c": 7}

Here is how I read the file with DataFusion's DataFrame API:

use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let file_path = "datalayers/landing/test_file.json";

    let mut ctx = SessionContext::new();
    // NdJsonReadOptions expects newline-delimited JSON: one object per line.
    let df = ctx.read_json(file_path, NdJsonReadOptions::default()).await?;
    // Print the DataFrame contents as a table.
    df.show().await?;
    Ok(())
}

I would like to perform the following operations:

  • Impute the null values in column b with an empty string ("") using either a fill-NA operation or a CASE WHEN expression.
  • Create a new calculated column by combining columns a and b, e.g. col("a") + col("b").
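As a reference for the expected result, here is a minimal plain-Rust model of the two operations applied to the two sample rows. It does not use DataFusion: the hypothetical `Row`/`OutRow` types and `transform` function are illustrations only, with `Option<String>` standing in for the nullable `b` column. Because `b` holds strings, `a + b` has no numeric meaning, so the sketch derives the new column `d` from `a` and `c`, the pair used in the `col("a").add(col("c"))` expression.

```rust
// Plain-Rust model of the requested transformation (no DataFusion involved).

/// One row of test_file.json; `b` is nullable, so it is modelled as an Option.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    a: i64,
    b: Option<String>,
    c: i64,
}

/// Output row: `b` no longer contains nulls and `d` is the derived column.
#[derive(Debug, Clone, PartialEq)]
struct OutRow {
    a: i64,
    b: String,
    c: i64,
    d: i64,
}

fn transform(rows: &[Row]) -> Vec<OutRow> {
    rows.iter()
        .map(|r| OutRow {
            a: r.a,
            // Impute: a null `b` becomes the empty string.
            b: r.b.clone().unwrap_or_default(),
            c: r.c,
            // Derived column: numeric sum of two integer columns.
            d: r.a + r.c,
        })
        .collect()
}

/// The two records from test_file.json.
fn sample_rows() -> Vec<Row> {
    vec![
        Row { a: 1, b: Some("hi".to_string()), c: 3 },
        Row { a: 5, b: None, c: 7 },
    ]
}

fn main() {
    for row in transform(&sample_rows()) {
        println!("{:?}", row);
    }
}
```

The second row comes out with `b` set to `""` and `d` set to `12`, which is the shape the DataFusion query should produce.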

I have gone through the API documentation but could not find a function like Spark's with_column for adding a new column, nor anything for imputing null values.

I can add two columns with a column expression such as col("a").add(col("c")).alias("d"), but I am curious whether something like with_column is available for adding a new column.

DataPsycho

1 Answer


DataFusion's DataFrame does not currently have a with_column method, but I think it would be good to add one. I filed an issue for this: https://github.com/apache/arrow-datafusion/issues/2844

Until that is added, you could call DataFrame::select (https://docs.rs/datafusion/9.0.0/datafusion/dataframe/struct.DataFrame.html#method.select) to select the existing columns as well as the new expression:

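// Expr implements std::ops::Add, so `+` works; calling `.add(...)` instead needs `use std::ops::Add;`
let df = df.select(vec![col("a"), col("b"), col("c"), (col("a") + col("c")).alias("d")])?;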
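The idea behind this workaround is that a Spark-style with_column is just a select that keeps every existing column and adds (or replaces) one. The sketch below models that on a hypothetical in-memory `Table` of integer columns in plain Rust; it is not DataFusion's API, and the string column `b` is left out to keep the toy table to a single value type.

```rust
// Hypothetical toy table: an ordered list of named integer columns.
type Column = Vec<i64>;

#[derive(Debug, Clone, PartialEq)]
struct Table {
    columns: Vec<(String, Column)>,
}

impl Table {
    /// Like Spark's withColumn: replace a column with the same name,
    /// otherwise append it. Every other column is kept, which is what
    /// the select(...) workaround does by listing them explicitly.
    fn with_column(mut self, name: &str, values: Column) -> Table {
        match self.columns.iter().position(|(n, _)| n == name) {
            Some(i) => self.columns[i].1 = values,
            None => self.columns.push((name.to_string(), values)),
        }
        self
    }

    /// Look up a column's values by name.
    fn column(&self, name: &str) -> Option<&Column> {
        self.columns.iter().find(|(n, _)| n == name).map(|(_, v)| v)
    }

    /// Column names in order.
    fn names(&self) -> Vec<&str> {
        self.columns.iter().map(|(n, _)| n.as_str()).collect()
    }
}

fn main() {
    // Columns a and c from test_file.json.
    let t = Table {
        columns: vec![("a".to_string(), vec![1, 5]), ("c".to_string(), vec![3, 7])],
    };
    // d = a + c, computed element-wise, mirroring col("a") + col("c").
    let d: Column = t
        .column("a")
        .unwrap()
        .iter()
        .zip(t.column("c").unwrap())
        .map(|(x, y)| x + y)
        .collect();
    let t = t.with_column("d", d);
    println!("{:?}", t.names());
}
```

Replacing rather than duplicating a same-named column matches Spark's behaviour; with a plain select you get the same effect by listing the new expression, aliased to that name, in place of the old column.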
Andy Grove
  • Thanks for your answer. What about imputing a column, i.e. replacing nulls with a text value? Is there a method for that? That was my first question, in case you happen to know. Thanks! – DataPsycho Jul 06 '22 at 07:25
  • There is a `coalesce` function that returns the first non-null value, so perhaps `coalesce(vec![col("b"), lit("")])`? – Andy Grove Jul 06 '22 at 07:33
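The question mentions both a fill-NA approach and a CASE WHEN statement. In SQL, `COALESCE(b, '')` and `CASE WHEN b IS NULL THEN '' ELSE b END` yield the same value, which the plain-Rust sketch below models with `Option<&str>` standing for a nullable string. The functions `coalesce` and `case_when_null_to_empty` here are hypothetical illustrations, not DataFusion's API; in DataFusion the suggested expression would presumably go inside the `select` call, aliased back to `b` with `.alias("b")`.

```rust
/// SQL-style COALESCE: the first non-null argument, or null if all are null.
fn coalesce<'a>(args: &[Option<&'a str>]) -> Option<&'a str> {
    args.iter().find_map(|v| *v)
}

/// CASE WHEN b IS NULL THEN '' ELSE b END, for a single value.
fn case_when_null_to_empty(b: Option<&str>) -> &str {
    match b {
        None => "",
        Some(s) => s,
    }
}

fn main() {
    // Column b from test_file.json: "hi" and null.
    let column_b = [Some("hi"), None];
    for b in column_b.iter().copied() {
        let via_coalesce = coalesce(&[b, Some("")]);
        let via_case = case_when_null_to_empty(b);
        println!("{:?} -> coalesce: {:?}, case: {:?}", b, via_coalesce, via_case);
    }
}
```

Because the fallback literal `""` is never null, `coalesce(b, "")` always returns a value, so the two forms agree on every row.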