
I am trying to take a subset of a huge data frame and then split that subset into several smaller data frames by one column. However, split() returns groups that exist only in the parent data frame, not in the subset.

I can reproduce it using the iris dataset:

data("iris")
# Subset from the huge dataframe
sub_df = iris[grep("virginica", iris$Species), ]

# Group them by the `Species` column
split_list = split(sub_df, sub_df$Species)
length(split_list)        # returns 3, although sub_df only contains virginica rows

The split_list looks like:

$setosa
[1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species     
<0 rows> (or 0-length row.names)

$versicolor
[1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species     
<0 rows> (or 0-length row.names)

$virginica
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
101          6.3         3.3          6.0         2.5 virginica
102          5.8         2.7          5.1         1.9 virginica
...

Why does split() generate setosa and versicolor dataframes?

Jay Wang
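
A likely explanation, offered as a sketch rather than a definitive answer: `Species` is a factor, and subsetting rows keeps all of the factor's levels even when no rows use them. split() creates one group per level, so the unused levels show up as empty data frames. The snippet below uses the same iris example; the names `clean_df`, `split_a`, and `split_b` are made up for illustration. It shows two ways to avoid the empty groups, assuming the goal is to keep only the groups that actually have rows.

```r
data("iris")

# Same subset as in the question, written with a logical comparison
sub_df <- iris[iris$Species == "virginica", ]

# The subset still carries all three factor levels
levels(sub_df$Species)

# Option 1: drop the unused levels first, then split
clean_df <- droplevels(sub_df)
split_a <- split(clean_df, clean_df$Species)

# Option 2: ask split() to drop levels that do not occur
split_b <- split(sub_df, sub_df$Species, drop = TRUE)

length(split_a)
length(split_b)
```

Both options should give a list with a single `virginica` element. Option 1 also cleans the levels of the data frame itself, which can matter for later plotting or modelling; Option 2 leaves `sub_df` untouched.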