I am faced with a slightly complicated problem. I have a list, whose elements are dataframes. Within these dataframes are guaranteed to be only one of two specifically named columns, that contain a POSIXct-variable. I need to perform a conditional mutate
on these columns, to format them from POSIXct to '%d.%m.%Y'
. Please consider this example:
library(lubridate)
rm(list = ls())
data <- list(a = tibble(ServiceStart = now(), y = 1),
b = tibble(y = 1, BillDate = now()))
format(now(), format='%d.%m.%Y')
This may seem fairly straightforward, but I would like this to work automatically, e.g. I would like a conditional command that recognizes when to mutate the variable ServiceStart and when to mutate the variable BillDate. I have so far come up with this:
if("ServiceStart" %in% names(data[[1]]))
{
data %>%
map(~mutate(., ServiceStart = format(data[[1]]$ServiceStart, format='%d.%m.%Y')))
} else {
data %>%
map(~mutate(., BillDate = format(data[[1]]$BillDate, format='%d.%m.%Y')))
}
This obviously does not work, because data[[1]]
tells R to only target the first dataframe of the list, when I want it to do it for every element of the list individually.
I believe it is important to give some further context as to why I'm doing this:
In order to simplify / speed up the data cleaning process I have written a short script that imports all Excel files in a given folder as elements of a list. I then use map()
to remove all non-alphanumeric characters from a column. I finish this process by using lapply()
to export every element of the list (all dataframes) as cleaned Excel files. This works as intended:
library(tidyverse)
library(readxl)
library(writexl)
rm(list = ls())
dir.create(file.path("data", "cleaned")) #create "cleaned" folder if needed
filenames <- list.files("data", pattern="*.xlsx", full.names = TRUE) #extract file names
data_raw <- lapply(filenames, read_xlsx) #import all files as elements of list
data_edit <- data_raw %>% #clean column from non-alphanumeric characters (main purpose!)
map(~mutate(., InitiatorZSR = str_replace_all(InitiatorZSR, "[^[:alnum:]]", "")))
names(data_edit) <- substr(filenames, 6, 7) #name dataframes within list (usually a number)
files <- list.files("data", pattern="*.xlsx", full.names = F)
filenames <- paste("data/cleaned/", files, sep = "") #prepare export to "cleaned" folder
lapply(seq_along(data_edit), function(i){
write_xlsx(data_edit[[i]], filenames[i])
}) #export all elements of list as excel files to folder "cleaned"
The problem is that the date columns, which are in '%d.%m.%Y'
format in the import files, are unfortunately parsed to POSIXct when importing with readxl, which I can not disable. This makes the files unusable for me.
Because this script was written to work automatically with any generic import file, I can not target and mutate these columns manually, as that would defeat the purpose.
I was inspired by this post Opening all files in a folder, and applying a function to try this method.
I realize that this is a very specific and somewhat complicated problem, but would immensely appreciate any help if somebody knew how to proceed.