I would like to add a column filled with the character "N" to a DataFrame in SparkR. In plain (non-SparkR) R, I would do it like this:

df$new_column <- "N"

But with SparkR, I get the following error:

Error: class(value) == "Column" || is.null(value) is not TRUE

I've tried all sorts of things to get this working. I was able to create a column from another (existing) one with df <- withColumn(df, "new_column", df$existing_column), but I can't manage this simple case of a constant value.

Any help?

Thanks.

François M.
  • The only hack I know for this is to use `ifelse` with the same return value for both conditions. So `df$new <- ifelse(condition, 'N', 'N')`. – mtoto May 19 '16 at 15:40
  • That worked, thank you very much (post it as an answer if you want me to accept it). – François M. May 19 '16 at 15:43

2 Answers

The straightforward solution is to use the SparkR::lit() function, which wraps a literal value in a Column:

df_new = withColumn(df, "new_column_name", lit("N"))
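
For anyone who wants to try this end to end, here is a minimal sketch. It assumes SparkR from Spark 2.0 or later (where `sparkR.session()` is available) and a local Spark installation; the `local[*]` master and the use of R's built-in `faithful` dataset are my own choices for illustration and are not part of the original question.

```r
# Sketch only: assumes SparkR is installed and a local Spark runtime is available.
library(SparkR)

# Start (or attach to) a Spark session; "local[*]" is an assumed master for illustration.
sparkR.session(master = "local[*]")

# Build a small SparkDataFrame from R's built-in `faithful` dataset.
df <- createDataFrame(faithful)

# lit() wraps the R string "N" in a Column, which is the type withColumn() expects.
df_new <- withColumn(df, "new_column", lit("N"))

# Inspect the first rows of the result, including the new constant column.
head(df_new)

# Shut the session down when finished.
sparkR.session.stop()
```

The error in the question comes from assigning a plain R character vector where SparkR expects a Column, so wrapping the value with `lit()` is what makes the assignment valid.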

Edit 7/17/2019

In newer Spark versions, the following also works:

df1$new_column <- "N"
df1[["new_column"]] <- "N"
David Arenburg
Dmitriy Selivanov
  • Nice! Didn't know about `lit()`, will delete my answer when the OP accepts yours. – mtoto May 20 '16 at 07:48
  • How would I add a column full of NAs? – François M. Jun 14 '16 at 09:53
  • When I attempt the same task with `df <- withColumn(df, "col", lit(NA))` and then call `str(df)`, I receive the following error: `Error in FUN(X[[i]], ...) : Unsupported data type: null`. I can open a new question, but thought that @DmitriySelivanov or @fmalaussena may know the answer after working with the problem. – kathystehl Jul 15 '16 at 17:28
  • `df <- withColumn(df, "col", lit("NA"))` should work (with `" "` around the `NA`) – François M. Jul 18 '16 at 08:23

There's an easier way to use SparkR::lit() that more closely mimics the syntax you tried first:

df$new_column <- lit("N")
data princess