Understanding List Sub-setting in R

Question

Abstract. I am having trouble understanding a piece of code that subsets a list. When I apply an index to a list inside a custom function, the list behaves like a table: the output is only the first column, but for every row (4 rows in total). If I apply the same index to the same list outside of that custom function, the output is only the first element of the list, displaying both elements of the character vector contained in that element. I need to know why the outputs differ.

How have I tried to resolve my issue by myself? I performed a Google search on the following search term: [Indexing Lists in R](https://stackoverflow.com/questions/tagged/r). The closest article was this one: How to correctly use lists in R. However, it did not answer my question.

Introduction. I am citing the code that I am using before stating my question because the matter is too confusing to explain purely in the abstract.

Below are four numbered instructions that students are told to follow.

# Instruction 1:
# Create a character vector containing the names of the top four 
# mathematicians that contributed to the field of statistics and
# list their birth years, with the name and year separated by a
# colon.

mathematicians <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
# The above code creates a character vector with four elements.

# Instruction 2: Next, use the strsplit() function to split the person's
# last name from his birth year.

split_name_and_year_born <- strsplit(mathematicians, split = ":")
# The variable split_name_and_year_born must be a list because
# strsplit only returns lists (according to the documentation).

# Instruction 3: Write a function that accepts a list or vector
# object and returns only the first element of that object.

first <- function(x) {
   x[1]
}
# This is a fairly straightforward function. If x is a list then
# x[1] should be the first element of that list. The same is true
# for vectors.

# Instruction 4: apply the first function to the list split_name_and_year_born
lapply(split_name_and_year_born, first)
# [[1]]
# [1] "GAUSS"
# 
# [[2]]
# [1] "BAYES"
# 
# [[3]]
# [1] "PASCAL"
# 
# [[4]]
# [1] "PEARSON"

My commentary: If you consider split_name_and_year_born as a list of character vectors, each of length 2, you could imagine the list behaving somewhat like a table in which the first element of each vector forms the first column. This interpretation makes sense given the output above. However, if I enter the following line of code, I get only the first element of the list.

split_name_and_year_born[1]

[[1]]
[1] "GAUSS" "1777"

My question is, why is there a difference in the output? I am using the same data structure, with the same data. I am only applying the indexing operator in different places. Why is there a difference in outputs? The function must be doing something implicit. I just do not know what.
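One way to make the difference visible is to inspect what each indexing operator returns. The snippet below is a minimal sketch that reuses the data from the instructions above; the names `sub_list` and `element` are hypothetical and introduced only for illustration.

```r
mathematicians <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_name_and_year_born <- strsplit(mathematicians, split = ":")

# `[` keeps the container: the result is still a list, of length one.
sub_list <- split_name_and_year_born[1]   # hypothetical name
class(sub_list)    # "list"
length(sub_list)   # 1

# `[[` extracts the element itself: a character vector of length two.
element <- split_name_and_year_born[[1]]  # hypothetical name
class(element)     # "character"
length(element)    # 2

# Indexing the extracted vector with `[` yields the name alone.
element[1]         # "GAUSS"
```

Running `str()` on either object shows the same distinction in a single call.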

"This is a fairly straightforward function. If x is a list then x[1] should be the first element of that list." Nope! ;) For lists, `[[` selects elements, and `[` slices _sub-lists_. So `x[1]` will get you not the first element of x, but a list of length one, whose only element is the first element of x. — joran, Jun 20 '19 at 21:21
The reason you get different results in `lapply` is that what is being passed to `first` is each individual element, taken _out_ of the list. In that case, it is now a simple vector, and `[` works as expected. — joran, Jun 20 '19 at 21:22
(By the way, there's lots of stuff in that linked question that doesn't apply to your specific problem, but the 3rd bullet in the question and JD Long's answer directly address the specific problem you're having.) — joran, Jun 20 '19 at 21:27
Think of it like this: `lapply` "breaks" a list into its components, and then applies the function to each individual part. So your function is getting the first element of each such part: it takes a vector of length two (name and year) and, as expected, returns the first element (name). If you define your function as `function(x) x[1:2]` you'll find the expected behavior. On the other hand, your `split_name_and_year_born[1]` is calling the first element of the list, which is a length 2 vector. — PavoDive, Jun 20 '19 at 21:36
@PavoDive I tried adding a test function `first2 <- function(x) { x[1:2] }` and then applied first2 to split_name_and_year_born. The result was the same as the result I got just by typing split_name_and_year_born into the Console. — Dr. Donald Tynes II PE PhD, Jun 20 '19 at 21:44
Were you expecting `lapply(split_name_and_year_born, first)` to return the same thing as `split_name_and_year_born[1]`? — joran, Jun 20 '19 at 21:51
@PavoDive - what I think you two are saying is that lapply() goes through each element of the list - each is a character vector - and serves up each character vector to the function *first*, which in turn selects the first element in each of the four character vectors. Which would explain my output. — Dr. Donald Tynes II PE PhD, Jun 20 '19 at 21:58
Yes, then that's where you're confused. `lapply` runs the function on each individual element of the list in turn, not on the list as a whole. — joran, Jun 20 '19 at 22:01
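
The behavior described in the comments above can be sketched as an explicit loop. This is only an illustration of what `lapply` does conceptually, not its actual implementation; `manual_result` is a hypothetical name.

```r
first <- function(x) {
  x[1]
}

mathematicians <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split_name_and_year_born <- strsplit(mathematicians, split = ":")

# Hand-written equivalent of lapply(split_name_and_year_born, first):
# each element is pulled out of the list with `[[` before `first` sees it,
# so inside `first`, x is a character vector and x[1] is the name.
manual_result <- vector("list", length(split_name_and_year_born))
for (i in seq_along(split_name_and_year_born)) {
  manual_result[[i]] <- first(split_name_and_year_born[[i]])
}
identical(manual_result, lapply(split_name_and_year_born, first))  # TRUE

# Calling `first` on the whole list instead gives a sub-list of length one,
# which is exactly what split_name_and_year_born[1] returns.
identical(first(split_name_and_year_born), split_name_and_year_born[1])  # TRUE
```

If a plain character vector of names is wanted rather than a list, `vapply(split_name_and_year_born, first, character(1))` is one option.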
