1

I am faced with a slightly complicated problem. I have a list, whose elements are dataframes. Within these dataframes are guaranteed to be only one of two specifically named columns, that contain a POSIXct-variable. I need to perform a conditional mutate on these columns, to format them from POSIXct to '%d.%m.%Y'. Please consider this example:

library(lubridate)
rm(list = ls())

data <- list(a = tibble(ServiceStart = now(), y = 1),
             b = tibble(y = 1, BillDate = now()))

format(now(), format='%d.%m.%Y')

This may seem fairly straightforward, but I would like this to work automatically, e.g. I would like a conditional command that recognizes when to mutate the variable ServiceStart and when to mutate the variable BillDate. I have so far come up with this:

if("ServiceStart" %in% names(data[[1]]))
{
  data %>%
    map(~mutate(., ServiceStart = format(data[[1]]$ServiceStart, format='%d.%m.%Y')))
} else {
  data %>%
    map(~mutate(., BillDate = format(data[[1]]$BillDate, format='%d.%m.%Y')))
}

This obviously does not work, because data[[1]] tells R to only target the first dataframe of the list, when I want it to do it for every element of the list individually.

I believe it is important to give some further context as to why I'm doing this: In order to simplify / speed up the data cleaning process I have written a short script that imports all Excel files in a given folder as elements of a list. I then use map() to remove all non-alphanumeric characters from a column. I finish this process by using lapply() to export every element of the list (all dataframes) as cleaned Excel files. This works as intended:

library(tidyverse)
library(readxl)
library(writexl)
rm(list = ls())

dir.create(file.path("data", "cleaned")) #create "cleaned" folder if needed

filenames <- list.files("data", pattern="*.xlsx", full.names = TRUE) #extract file names

data_raw <- lapply(filenames, read_xlsx) #import all files as elements of list

data_edit <- data_raw %>%  #clean column from non-alphanumeric characters (main purpose!)
  map(~mutate(., InitiatorZSR = str_replace_all(InitiatorZSR, "[^[:alnum:]]", "")))

names(data_edit) <- substr(filenames, 6, 7) #name dataframes within list (usually a number)

files <- list.files("data", pattern="*.xlsx", full.names = F)
filenames <- paste("data/cleaned/", files, sep = "") #prepare export to "cleaned" folder

lapply(seq_along(data_edit), function(i){
  write_xlsx(data_edit[[i]], filenames[i])
}) #export all elements of list as excel files to folder "cleaned"

The problem is that the date columns, which are in '%d.%m.%Y' format in the import files, are unfortunately parsed to POSIXct when importing with readxl, which I can not disable. This makes the files unusable for me. Because this script was written to work automatically with any generic import file, I can not target and mutate these columns manually, as that would defeat the purpose.

I was inspired by this post Opening all files in a folder, and applying a function to try this method.

I realize that this is a very specific and somewhat complicated problem, but would immensely appreciate any help if somebody knew how to proceed.

tc_data
  • 113
  • 2
  • 8

2 Answers2

3

You are looking for rapply

rapply(data,format,"POSIXct",how = "replace",format="%d---%m----%Y")#You can use any fromating you want
$a
# A tibble: 1 x 2
  ServiceStart        y
  <chr>           <dbl>
1 04---05----2018    1.

$b
# A tibble: 1 x 2
      y BillDate       
  <dbl> <chr>          
1    1. 04---05----2018
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • short, elegant, effective. this is exactly what I was looking for (except for the format, which should be '%d.%m.%Y'). thank you! – tc_data May 04 '18 at 10:28
2

I hope I have understood your question correctly and this will help you.

You can turn your conditional into a function then use it on your list of dataframes with lapply, you shouldn't need to use map or mutate to do this

library(lubridate)

data <- list(a = tibble(ServiceStart = now(), y = 1),
         b = tibble(y = 1, BillDate = now()))

format_col <- function(x) {
    if("ServiceStart" %in% names(x))
    {
        x$ServiceStart <- format(x$ServiceStart, format='%d.%m.%Y')
        x
    } else {
        x$BillDate <- format(x$BillDate, format='%d.%m.%Y')
        x
    }
}

data <- lapply(data, format_col)

data

# $a
# # A tibble: 1 x 2
#   ServiceStart     y
#   <chr>        <dbl>
# 1 04.05.2018    1.00

# $b
# # A tibble: 1 x 2
#   y BillDate  
#   <dbl> <chr>     
# 1  1.00 04.05.2018
Relasta
  • 1,066
  • 8
  • 8