I'm very new to R so excuse any incorrect language. I'm not sure if I even asked this question correctly, but here is the problem I'm dealing with.

Suppose I have a data frame that contains lengths and weights for 10 different species of fish, with 100 samples for each species (1000 rows of data). Is it possible to return the output of describe() for a column, separately for each unique species of fish, without having to create an object for each species?

For example if I write:

Catfish <- filter(dataframe, species == "Catfish")  # filter on the species column (assumed here to be named "species"), not on lengths

describe(Catfish$lengths)

Do I have to manually create an object (Catfish, for example) for each species and then describe it? Or is there a simpler way to return describe() for the lengths of each unique species directly from my original data frame? Hopefully I asked this clearly enough. Thanks for any help!

MLavoie
  • Welcome to SO! Have a look [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) at how to make a great reproducible example. For your case, you might want to look at how to organise data in a [tidy way](https://r4ds.had.co.nz/tidy-data.html) and how to use `dplyr` to manipulate it. From base R, `lapply` applies a function to each entry of a list; in your case you could pass all `unique` fish names to `lapply` and filter for each one inside the function – starja Jul 16 '20 at 16:14
  • What @starja says. `group_by` may also be helpful. Where does `describe()` come from? – Limey Jul 16 '20 at 16:15
  • @Limey I suspect that `describe` prints some output, so it could be a bit difficult to use it with `group_by`, otherwise a great tip – starja Jul 16 '20 at 16:18
  • @starja: I agree. You're probably right, but until we get a MWE, we won't know for sure... ;) – Limey Jul 16 '20 at 16:20
  • You could also use the `summary` function in base R or the `summarize` function from `dplyr` – Daniel_j_iii Jul 16 '20 at 16:42
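
The base R route suggested in the comments above can be sketched roughly as follows. This is only an illustration: the `fish` data frame, its column names (`species`, `lengths`) and its values are made up, and `summary()` stands in for `describe()` because the question does not say which package `describe()` comes from.

```r
# Hypothetical fish data: column names and values are invented for illustration
fish <- data.frame(
  species = rep(c("Catfish", "Trout"), each = 3),
  lengths = c(40, 45, 50, 20, 25, 30)
)

# One summary() per species, without creating an object per species.
# tapply() returns a list-like array indexed by species name.
by_species <- tapply(fish$lengths, fish$species, summary)
by_species[["Catfish"]]

# Alternative: aggregate() returns one statistic per group as a data frame
mean_by_species <- aggregate(lengths ~ species, data = fish, FUN = mean)
mean_by_species
```

Swapping `summary` for another function that accepts a numeric vector should work the same way, as long as that function returns a value rather than only printing.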

1 Answer

I think what you might want to look into is the split-apply-combine technique (example below):

df
  value ID
1     1 ID
2     2 ID
3     3 PD
4     4 PD
5     5 ID
# split by grouping variable (in your case, the fish species)
df_split <- split(df, df$ID)

# apply a function (in your case describe; the output below has the format of psych::describe)
df_split <- lapply(df_split, function(x) { x["ID"] <- NULL; x }) #removed ID for easier merging
df_split <- lapply(df_split, describe)

#combine 
Result <- Reduce(rbind, df_split)
Result

    vars n mean   sd median trimmed  mad min max range skew kurtosis  se
X1     1 3 2.67 2.08    2.0    2.67 1.48   1   5     4 0.29    -2.33 1.2
X11    1 2 3.50 0.71    3.5    3.50 0.74   3   4     1 0.00    -2.75 0.5

This script could be improved by adding the grouping variable ("ID" in this example) to each row of the result, but I think this provides a starting point for you.
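
One way to carry the group label into the result is sketched below. It uses only base R so it runs without extra packages; `describe_stub` is a hypothetical stand-in for `describe()` that computes just a few of the same statistics, and the toy data repeats the `df` from the example above. If your `describe()` comes from the psych package, that package's `describeBy()` may already do the grouping for you.

```r
# Toy data mirroring the example above
df <- data.frame(value = c(1, 2, 3, 4, 5),
                 ID    = c("ID", "ID", "PD", "PD", "ID"))

# Hypothetical stand-in for describe(): returns a one-row data frame
describe_stub <- function(x) {
  data.frame(n = length(x), mean = mean(x), sd = sd(x),
             median = median(x), min = min(x), max = max(x))
}

# split: one numeric vector per group, named by group
values_by_group <- split(df$value, df$ID)

# apply: summarise each group
stats_by_group <- lapply(values_by_group, describe_stub)

# combine: bind the rows, then add the group label as its own column
result <- do.call(rbind, stats_by_group)
result <- cbind(ID = names(stats_by_group), result)
rownames(result) <- NULL
result
```

Because `split()` names each list element after its group, the labels can be recovered with `names()` at the combine step instead of being dropped before merging.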

maarvd