I am trying to take a subset of a huge dataframe and then split that subset into multiple small dataframes by one column. However, the split() function returns groups for values that exist only in the parent dataframe, not in the subset.
I can reproduce it using the iris dataset:
data("iris")
# Subset from the huge dataframe
sub_df = iris[grep("virginica", iris$Species), ]
# Group them by the `Species` column
split_list = split(sub_df, sub_df$Species)
length(split_list) # It gives 3
The split_list looks like:
$setosa
[1] Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<0 rows> (or 0-length row.names)
$versicolor
[1] Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<0 rows> (or 0-length row.names)
$virginica
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
...
Why does split() generate setosa and versicolor dataframes?
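A likely explanation, assuming Species is a factor (as it is in the built-in iris dataset): subsetting rows does not remove unused factor levels, and split() creates one group per level, including empty ones. The sketch below checks the levels carried by the subset and shows two base R ways to avoid the empty groups, the drop argument of split() and droplevels(). The name split_list_dropped is only illustrative.

```r
data("iris")
sub_df <- iris[grep("virginica", iris$Species), ]

# The subset still carries every level of the original factor,
# including levels that no longer have any rows
levels(sub_df$Species)

# Option 1: ask split() to ignore levels that do not occur
split_list_dropped <- split(sub_df, sub_df$Species, drop = TRUE)
length(split_list_dropped)

# Option 2: remove unused levels from the subset itself, then split
sub_df <- droplevels(sub_df)
length(split(sub_df, sub_df$Species))
```

Option 2 changes the subset permanently, which helps if later steps such as table() or plotting should also ignore the missing species.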