I have a JSON data file named test_file.json
with the following content:
{"a": 1, "b": "hi", "c": 3}
{"a": 5, "b": null, "c": 7}
Here is how I read the file with the DataFrame API of DataFusion:
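use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Path to the newline-delimited JSON file
    let file_path = "datalayers/landing/test_file.json";
    let mut ctx = SessionContext::new();
    // Read the file into a DataFrame with the default NDJSON options
    let df = ctx.read_json(file_path, NdJsonReadOptions::default()).await?;
    // Print the contents to stdout
    df.show().await?;
    Ok(())
}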
I would like to do the following operations:
- Impute the null value in the b column with an empty string (""), using either a fill-na function or a CASE WHEN expression.
- Create a new calculated column by combining columns a and b, e.g. col("a") + col("b").
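To make the intended result concrete, here is a plain-Rust sketch of the per-row logic I am after. It does not use DataFusion at all. The Row struct and the function names are hypothetical, and reading "combining a and b" as string concatenation of a with the imputed b is an assumption.

```rust
/// One record from test_file.json; `b` is nullable (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    a: i64,
    b: Option<String>,
    c: i64,
}

/// Replace a null `b` with an empty string.
fn impute_b(row: &Row) -> String {
    row.b.clone().unwrap_or_default()
}

/// Assumed meaning of "combining a and b":
/// concatenate `a` (rendered as text) with the imputed `b`.
fn combine_a_b(row: &Row) -> String {
    format!("{}{}", row.a, impute_b(row))
}

fn main() {
    // The two records from the sample file.
    let rows = vec![
        Row { a: 1, b: Some("hi".to_string()), c: 3 },
        Row { a: 5, b: None, c: 7 },
    ];
    for row in &rows {
        println!(
            "a={} b={:?} c={} combined={:?}",
            row.a,
            impute_b(row),
            row.c,
            combine_a_b(row)
        );
    }
}
```

What I am looking for is the DataFrame API equivalent of impute_b and combine_a_b applied to whole columns.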
I went through the API documentation but could not find a function like with_column,
which Spark has for adding a new column, nor a way to impute null values.
I can add two columns with a column expression such as col("a").add(col("c")).alias("d"),
but I would like to know whether something like with_column can be used
to add a new column.