Abstract. I am having trouble understanding a piece of code that subsets a list. When I apply an index to the list inside a custom function (called through lapply()), the list behaves like a table: I get only the first column, but for every row (4 rows in total). When I apply the same index to the same list outside that function, I get only the first element of the list, which displays both elements of the character vector it contains. I need to know why the outputs differ.
How have I tried to resolve the issue myself? I searched Google for the term [Indexing Lists in R](https://stackoverflow.com/questions/tagged/r). The closest article was "How to correctly use lists in R", but it did not answer my question.
Introduction. I am showing the code before stating my question because the matter is too confusing to explain purely in the abstract.
Below are four numbered instructions that students are told to follow.
# Instruction 1:
# Create a character vector containing the names of the top four
# mathematicians that contributed to the field of statistics and
# list their birth years, with the name and year separated by a
# colon.
mathematicians <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
# The above code creates a character vector with four elements.
# Instruction 2: Next, use the strsplit() function to split the person's
# last name from his birth year.
split_name_and_year_born <- strsplit(mathematicians, split = ":")
# The variable split_name_and_year_born must be a list because
# strsplit only returns lists (according to the documentation).
# Instruction 3: Write a function that accepts a list or vector
# object and returns only the first element of that object.
first <- function(x) {
  x[1]
}
# This is a fairly straightforward function. If x is a list then
# x[1] should be the first element of that list. The same is true
# for vectors.
# Instruction 4: apply the first function to the list split_name_and_year_born
lapply(split_name_and_year_born, first)
# [[1]]
# [1] "GAUSS"
#
# [[2]]
# [1] "BAYES"
#
# [[3]]
# [1] "PASCAL"
#
# [[4]]
# [1] "PEARSON"
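Before my commentary, here is a small check of the claim in the comments above that strsplit() returns a list. It is only a sketch that repeats the data from Instruction 1 so it can run on its own; it uses the base R functions class(), length(), and lengths().

```r
mathematicians <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_name_and_year_born <- strsplit(mathematicians, split = ":")

# The outer container is a list with one entry per input string
class(split_name_and_year_born)    # "list"
length(split_name_and_year_born)   # 4

# Each entry is a character vector holding a name and a year
lengths(split_name_and_year_born)  # 2 2 2 2
```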
My commentary: If you consider split_name_and_year_born as a list of four character vectors, each of length 2, you could imagine the list behaving somewhat like a table in which the first element of each vector forms the first column. That interpretation fits the output above. However, if I enter the following line of code, I get only the first element of the list.
split_name_and_year_born[1]
# [[1]]
# [1] "GAUSS" "1777"
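To see what kind of object that line returns, I tried the small check below. It is a sketch, not a definitive explanation; the names one_bracket and two_brackets are mine and exist only for illustration. It compares single-bracket and double-bracket indexing on the same list.

```r
mathematicians <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_name_and_year_born <- strsplit(mathematicians, split = ":")

# Single brackets keep the container: the result is still a list, of length 1
one_bracket <- split_name_and_year_born[1]
class(one_bracket)    # "list"
length(one_bracket)   # 1

# Double brackets extract the contents: the character vector itself
two_brackets <- split_name_and_year_born[[1]]
class(two_brackets)   # "character"
two_brackets[1]       # "GAUSS"
```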
My question is: why do the outputs differ? I am using the same data structure with the same data, and I am only applying the indexing operator in different places. The function must be doing something implicit; I just do not know what.
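One way to probe the question is to write out by hand what I assume lapply() does: call the function once per list element, passing the element itself (as extracted with [[i]]) rather than the whole list. The loop below is only my mental model, not the actual source of lapply(), and the names via_lapply, by_hand, and element are hypothetical.

```r
mathematicians <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_name_and_year_born <- strsplit(mathematicians, split = ":")

first <- function(x) {
  x[1]
}

via_lapply <- lapply(split_name_and_year_born, first)

# Hand-written picture of lapply(): x inside first() is one character vector
by_hand <- vector("list", length(split_name_and_year_born))
for (i in seq_along(split_name_and_year_born)) {
  element <- split_name_and_year_born[[i]]  # e.g. c("GAUSS", "1777")
  by_hand[[i]] <- first(element)            # e.g. "GAUSS"
}

identical(via_lapply, by_hand)  # TRUE

# Calling first() on the whole list reproduces split_name_and_year_born[1]
identical(first(split_name_and_year_born), split_name_and_year_born[1])  # TRUE
```

If this model is right, the [1] inside first() indexes each two-element character vector, while the [1] typed at the console indexes the four-element list.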