I've been struggling to add a new column only if it does not already exist. I found an answer here: "Adding column if it does not exist".

However, I need to do this inside a purrr pipeline. I tried to adapt the answer above, but I could not get it to work.

Here is an example of what I'm dealing with:

Suppose I have a list of two data.frames:

library(tibble)

A = tibble(
  x = 1:5, y = 1, z = 2
)

B = tibble(
  x = 5:1, y = 3, z = 3, w = 7
)

dt_list = list(A, B)

The column I'd like to add is w:

cols = c(w = NA_real_)

On a single data frame, I can add the column only if it does not exist, as follows.

Since w already exists in B, no column is added:

B %>% tibble::add_column(!!!cols[!names(cols) %in% names(.)])

# A tibble: 5 x 4
      x     y     z     w
  <int> <dbl> <dbl> <dbl>
1     5     3     3     7
2     4     3     3     7
3     3     3     3     7
4     2     3     3     7
5     1     3     3     7

In this case, since w does not exist in A, it is added:

A %>% tibble::add_column(!!!cols[!names(cols) %in% names(.)])

# A tibble: 5 x 4
      x     y     z     w
  <int> <dbl> <dbl> <dbl>
1     1     1     2    NA
2     2     1     2    NA
3     3     1     2    NA
4     4     1     2    NA
5     5     1     2    NA

I tried the following to replicate it using purrr (I'd prefer not to use a for loop):

dt_list_2 = dt_list %>% 
  purrr::map(
    ~dplyr::select(., -starts_with("x")) %>% 
      ~tibble::add_column(!!!cols[!names(cols) %in% names(.)])
  )

But the output is not the same as doing it separately.

Note: This is an example of my real problem. In fact, I'm using purrr to read many *.csv files and then apply some data transformation. Something like this:

re_file <- list.files(path = dir_path, pattern = "*.csv")

cols_add = c(UCI = NA_real_)

file_list = re_file %>%
  purrr::map(function(file_name){ # iterate through each file name
    
    read_csv(file = paste0(dir_path, "//",file_name), skip = 2)
  }) %>% 
   purrr::map(
     ~dplyr::select(., -starts_with("Textbox")) %>% 
       ~dplyr::tibble(!!!cols[!names(cols) %in% names(.)])
  )
Cristhian
  • `purrr::map(dt_list, function(x) {x$w <- x$w %||% NA; x})` where `%||%` comes from one of those tidy packages – rawr Feb 17 '21 at 02:21
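
The comment's approach can be sketched as follows. This is a minimal example, assuming the `dt_list` from the question and using the `%||%` operator exported by purrr; `NA_real_` is used here instead of `NA` so that the new column is numeric, which is an assumption about the desired type. Note that tibble may emit a warning when `$` is used on a column that does not exist.

```r
library(tibble)
library(purrr)  # exports %||%

A <- tibble(x = 1:5, y = 1, z = 2)
B <- tibble(x = 5:1, y = 3, z = 3, w = 7)
dt_list <- list(A, B)

# x$w is NULL when the column is absent, so %||% falls back to NA_real_;
# when the column is present, it is simply assigned back to itself.
res <- map(dt_list, function(x) {
  x$w <- x$w %||% NA_real_
  x
})
```

This handles one known column at a time; for several optional columns the `add_column()` approach in the question scales more easily.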

1 Answer

You can use:

dt_list %>% 
  purrr::map(
    ~tibble::add_column(., !!!cols[!names(cols) %in% names(.)])
  )

#[[1]]
# A tibble: 5 x 4
#     x     y     z     w
#  <int> <dbl> <dbl> <dbl>
#1     1     1     2    NA
#2     2     1     2    NA
#3     3     1     2    NA
#4     4     1     2    NA
#5     5     1     2    NA

#[[2]]
# A tibble: 5 x 4
#      x     y     z     w
#  <int> <dbl> <dbl> <dbl>
#1     5     3     3     7
#2     4     3     3     7
#3     3     3     3     7
#4     2     3     3     7
#5     1     3     3     7
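
The same idea extends to the pipeline in the question, where a `select()` step runs first. Below is a minimal sketch, assuming the `dt_list` and `cols` objects from the question; `add_missing_cols()` is a hypothetical helper name introduced only for illustration, not a function from tibble or purrr.

```r
library(tibble)
library(purrr)
library(dplyr)

A <- tibble(x = 1:5, y = 1, z = 2)
B <- tibble(x = 5:1, y = 3, z = 3, w = 7)
dt_list <- list(A, B)
cols <- c(w = NA_real_)

# Hypothetical helper: add only those entries of `cols` that `df` lacks.
add_missing_cols <- function(df, cols) {
  missing <- cols[!names(cols) %in% names(df)]
  tibble::add_column(df, !!!as.list(missing))
}

# Drop columns starting with "x", then add `w` where it is missing.
dt_list_2 <- dt_list %>%
  map(~ .x %>%
        select(-starts_with("x")) %>%
        add_missing_cols(cols))
```

Wrapping the check in a named function avoids having to reason about what `.` refers to at each step of a nested pipe.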
Ronak Shah
  • Yesterday I found out that the problem was the missing dot (`.`) as the first argument of `add_column()` (which I had supplied to `dplyr::select()`), as you suggest in your answer (thank you so much, by the way). But I still don't understand why the dot is needed. When using `purrr::map()`, shouldn't the code already know that it is working with a data frame? – Cristhian Feb 17 '21 at 16:12
  • `tibble::add_column()` needs a data frame as its first argument, and you need to pass it explicitly. With `map` you can use either `.` or `.x`; both work the same. – Ronak Shah Feb 17 '21 at 23:59
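
To illustrate the point about the dot: a `~` formula passed to `map()` is shorthand for an anonymous function, and the current element is only available through `.` or `.x`; it is not inserted into calls automatically. The following sketch, assuming the `dt_list` and `cols` objects from the question, shows three equivalent spellings (the argument name `df` is arbitrary):

```r
library(tibble)
library(purrr)

dt_list <- list(
  tibble(x = 1:5, y = 1, z = 2),
  tibble(x = 5:1, y = 3, z = 3, w = 7)
)
cols <- c(w = NA_real_)

# Formula shorthand, element referred to as `.`
res_dot <- map(dt_list, ~ add_column(., !!!cols[!names(cols) %in% names(.)]))

# Formula shorthand, element referred to as `.x`
res_x <- map(dt_list, ~ add_column(.x, !!!cols[!names(cols) %in% names(.x)]))

# Explicit anonymous function: the data frame must be passed by name
res_fn <- map(dt_list, function(df) {
  add_column(df, !!!cols[!names(cols) %in% names(df)])
})
```

In the explicit version it is easier to see that leaving out the first argument would give `add_column()` no data frame at all.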