I'm trying to write a function that will take a list object (in a specific format) and return a dataframe. When doing so, I have two criteria that are somewhat conflicting:
- The returned dataframe should have column names that indicate what each column is about (that is, not col_a, col_b, etc.).
- The function should accept a list object that could hold various kinds of data (e.g., weight/height/age/mood/country, etc.).
These criteria conflict because the function can't know how to name the data columns without prior information. My solution is to semi-automate the function by adding an argument that states its purpose when the function is called.
Therefore, because the function knows its "purpose" when being executed, it knows that one column is going to include (for example) age, another column will include country, etc.
However, how could I be sure that the right columns are being named with the appropriate names, and there is no mismatching (e.g., "age" header is assigned to the weight column)?
I'm trying to work this problem out with tidyverse functions.
1 -- Data object to convert
vec <- c(1, 2, 3)
names(vec) <- c("A", "B", "C")
my_data_object_as_list <- as.list(vec)
my_data_object_as_list
## $A
## [1] 1
## $B
## [1] 2
## $C
## [1] 3
2 -- My custom function for converting
library(tidyr)
library(dplyr)
library(tidyselect)
organize_in_table <-
  function(as_list_object,
           purpose = NULL) {
    # reshape the named list into two columns: `name` and `value`
    table <- as_list_object %>%
      bind_rows() %>%
      pivot_longer(cols = tidyselect::everything())
    if (is.null(purpose)) {
      return(table)
    } else if (purpose == "match_letters_and_numbers") {
      # rename the generic columns according to the stated purpose
      table <- rename(table, letters = name, numbers = value)
    }
    return(table)
  }
EDIT
From @akrun's comment I've learned that I could equivalently use:
library(tibble)
organize_in_table <-
  function(as_list_object,
           purpose = NULL) {
    table <- as_list_object %>%
      enframe() %>%
      tidyr::unnest(c(value))
    if (is.null(purpose)) {
      return(table)
    } else if (purpose == "match_letters_and_numbers") {
      table <- rename(table, letters = name, numbers = value)
    }
    return(table)
  }
3 -- Example for using the function
df_letters_and_numbers <-
organize_in_table(my_data_object_as_list, "match_letters_and_numbers")
> df_letters_and_numbers
## # A tibble: 3 x 2
## letters numbers
## <chr> <dbl>
## 1 A 1
## 2 B 2
## 3 C 3
4 -- Demonstration of potential problem
Data to be converted
vec_2 <- c("A", "B", "C")
names(vec_2) <- c(1, 2, 3)
my_data_object_as_list_2 <- as.list(vec_2)
> my_data_object_as_list_2
## $`1`
## [1] "A"
## $`2`
## [1] "B"
## $`3`
## [1] "C"
Conversion ends up with mismatching column names
organize_in_table(my_data_object_as_list_2, "match_letters_and_numbers")
## # A tibble: 3 x 2
## letters numbers
## <chr> <chr>
## 1 1 A
## 2 2 B
## 3 3 C
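To make the mismatch above detectable rather than silent, a check could be run on the renamed table. The following is only a minimal sketch in base R: the helper name check_letters_and_numbers is hypothetical, and it assumes "letters" means purely alphabetic strings and "numbers" means a numeric column.

```r
# Hypothetical helper (not part of the original function): returns TRUE only
# when the `letters` column is alphabetic and the `numbers` column is numeric.
check_letters_and_numbers <- function(tbl) {
  letters_ok <- all(grepl("^[A-Za-z]+$", tbl$letters))
  numbers_ok <- is.numeric(tbl$numbers)
  letters_ok && numbers_ok
}

# A correctly labelled table and a mislabelled one, mirroring the two outputs above
good <- data.frame(letters = c("A", "B", "C"), numbers = c(1, 2, 3),
                   stringsAsFactors = FALSE)
bad  <- data.frame(letters = c("1", "2", "3"), numbers = c("A", "B", "C"),
                   stringsAsFactors = FALSE)

check_letters_and_numbers(good)  # TRUE
check_letters_and_numbers(bad)   # FALSE
```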
The key point to keep in mind is that this function should potentially accept any kind of data (e.g., age, weight, distance, dominant personality trait, name, driver license ID, etc.). The user calling the function is responsible for specifying the properties of the variables being included.
Below are two examples of data types that need validation before renaming. Given the purpose argument, organize_in_table() should know which "validating functions" to apply before returning the dataframe with named columns.
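One way the purpose argument could select its validating functions is a lookup table keyed by purpose. The sketch below illustrates that idea in base R only; the names validators and organize_validated are hypothetical, the letters/numbers patterns are assumptions, and all values are coerced to character for simplicity.

```r
# Hypothetical registry: each purpose maps target column names to a validator.
validators <- list(
  match_letters_and_numbers = list(
    letters = function(x) grepl("^[A-Za-z]+$", x),
    numbers = function(x) grepl("^[0-9.]+$", x)
  )
)

# Hypothetical helper: tries both orientations of (names, values) and keeps
# the one in which every entry passes the validator for its column.
organize_validated <- function(as_list_object, purpose) {
  checks <- validators[[purpose]]
  if (is.null(checks)) stop("Unknown purpose: ", purpose)
  keys <- names(as_list_object)
  vals <- as.character(unlist(as_list_object, use.names = FALSE))
  cols <- names(checks)
  fits <- function(x, col) all(checks[[col]](x))
  if (fits(keys, cols[1]) && fits(vals, cols[2])) {
    out <- data.frame(keys, vals, stringsAsFactors = FALSE)
  } else if (fits(vals, cols[1]) && fits(keys, cols[2])) {
    out <- data.frame(vals, keys, stringsAsFactors = FALSE)
  } else {
    stop("Data does not fit purpose: ", purpose)
  }
  names(out) <- cols
  out
}

# Both orientations of the same data end up with the same headers
a <- organize_validated(list(A = 1, B = 2, C = 3), "match_letters_and_numbers")
b <- organize_validated(list(`1` = "A", `2` = "B", `3` = "C"), "match_letters_and_numbers")
```

If both orientations happen to pass (for example, when names and values look alike), this sketch silently keeps the first one, so a stricter version might stop with an error in that case.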
Example #1 -- Matching Greek words and equivalent words in English
Data
vec_greek <- c("σκύλος", "Γάτα", "ζέβρα")
names(vec_greek) <- c("dog", "cat", "zebra")
data_object_greek_english <- as.list(vec_greek)
data_object_greek_english
## $dog
## [1] "σκύλος"
## $cat
## [1] "Γάτα"
## $zebra
## [1] "ζέβρα"
Validating functions
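# TRUE for elements of x that contain Greek characters
grepl("[\u0370-\u03ff\u1f00-\u1fff]+", x)
# TRUE for elements of x that are pure ASCII (used here as a proxy for English)
library(stringi)
stri_enc_isascii(x)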
Desired Output
## regardless of whether data object is "data_object_greek_english_1":
vec_greek <- c("σκύλος", "Γάτα", "ζέβρα")
names(vec_greek) <- c("dog", "cat", "zebra")
data_object_greek_english_1 <- as.list(vec_greek)
## or "data_object_greek_english_2":
vec_english <- c("dog", "cat", "zebra")
names(vec_english) <- c("σκύλος", "Γάτα", "ζέβρα")
data_object_greek_english_2 <- as.list(vec_english)
## the call:
organize_in_table(data_object_greek_english_1, purpose = "match_greek_and_english")
## should return the same output as:
organize_in_table(data_object_greek_english_2, purpose = "match_greek_and_english")
## # A tibble: 3 x 2
## english greek ## position of columns doesn't matter as long as headers are appropriate to values
## <chr> <chr>
## 1 dog σκύλος
## 2 cat Γάτα
## 3 zebra ζέβρα
Example #2 -- Matching phone numbers and California driver license ID
Data
The data below is entirely made up
vec_driver_license <- c("F2849563", "I2938461", "B2293890")
names(vec_driver_license) <- c("626-710-9060", "831-263-9154", "510-923-6869")
data_object_phone_dl <- as.list(vec_driver_license)
data_object_phone_dl
## $`626-710-9060`
## [1] "F2849563"
## $`831-263-9154`
## [1] "I2938461"
## $`510-923-6869`
## [1] "B2293890"
Validating functions
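# TRUE for elements of x that look like US phone numbers
grepl("^\\s*(\\+\\s*1(-?|\\s+))*[0-9]{3}\\s*-?\\s*[0-9]{3}\\s*-?\\s*[0-9]{4}$", x)
# TRUE for elements of x that look like a California driver license ID (one letter, seven digits)
grepl("^[A-Z]{1}\\d{7}$", x)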
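As a rough illustration of how these two validators could decide which side of the list holds which variable, here is a minimal base R sketch. The wrappers is_phone and is_dl and the helper orient_phone_dl are hypothetical names; the regular expressions are the ones given above.

```r
# Hypothetical wrappers around the two validating expressions above
is_phone <- function(x) {
  grepl("^\\s*(\\+\\s*1(-?|\\s+))*[0-9]{3}\\s*-?\\s*[0-9]{3}\\s*-?\\s*[0-9]{4}$", x)
}
is_dl <- function(x) grepl("^[A-Z]{1}\\d{7}$", x)

# Hypothetical helper: assigns names and values of the list to the columns
# whose validators they satisfy, whichever way round the list was built.
orient_phone_dl <- function(as_list_object) {
  keys <- names(as_list_object)
  vals <- unlist(as_list_object, use.names = FALSE)
  if (all(is_phone(keys)) && all(is_dl(vals))) {
    data.frame(phone_number = keys, driver_license_id = vals,
               stringsAsFactors = FALSE)
  } else if (all(is_phone(vals)) && all(is_dl(keys))) {
    data.frame(phone_number = vals, driver_license_id = keys,
               stringsAsFactors = FALSE)
  } else {
    stop("Neither orientation passes both validators")
  }
}

# Same made-up data as above, built in both orientations
dl_by_phone <- list(`626-710-9060` = "F2849563", `831-263-9154` = "I2938461")
phone_by_dl <- list(F2849563 = "626-710-9060", I2938461 = "831-263-9154")

res_1 <- orient_phone_dl(dl_by_phone)
res_2 <- orient_phone_dl(phone_by_dl)
```

Requiring every element to pass (all()) is a deliberately strict choice; a single malformed entry stops the conversion instead of producing a mislabelled column.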
Desired Output
## regardless of whether data object is "data_object_phone_dl_1":
vec_driver_license <- c("F2849563", "I2938461", "B2293890")
names(vec_driver_license) <- c("626-710-9060", "831-263-9154", "510-923-6869")
data_object_phone_dl_1 <- as.list(vec_driver_license)
## or "data_object_phone_dl_2":
vec_phone_number <- c("626-710-9060", "831-263-9154", "510-923-6869")
names(vec_phone_number) <- c("F2849563", "I2938461", "B2293890")
data_object_phone_dl_2 <- as.list(vec_phone_number)
## the call:
organize_in_table(data_object_phone_dl_1, purpose = "match_phone_and_dl")
## should return the same output as:
organize_in_table(data_object_phone_dl_2, purpose = "match_phone_and_dl")
## # A tibble: 3 x 2
## phone_number driver_license_id ## position of columns doesn't matter as long as headers are appropriate to values
## <chr> <chr>
## 1 626-710-9060 F2849563
## 2 831-263-9154 I2938461
## 3 510-923-6869 B2293890