
I am trying to create a dataframe from an existing dataframe, retaining only specific columns, for a specified column value (a species in my data). Essentially I intend to create separate dataframes for each species in my dataset, detailing the stations they were landed at, and retaining the raising agent for the hauls (RF.haul).

My reprex is:

QSC <- with(Dataframe[Dataframe$Species=="QSC", ], aggregate(number=RF.haul), by(Station=Station), FUN = sum, na.rm= TRUE)

I get various errors, mostly "object not found" errors for the column names used in the code. I'm sure this is relatively easy to do in R, but I just can't get my head around it (I'm new to R!)

    Hi! I'm sorry but that's not a reprex. A reproducible example should be reproducible. But without Dataframe I can't run your code. – Edo Aug 28 '20 at 15:46
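A minimal reproducible setup along the lines the comment asks for might look like the sketch below. The data frame is hypothetical: only the column names Species, Station and RF.haul come from the question, while the Depth column and all values are made up to show a column being dropped. The sketch keeps the QSC rows and the two columns of interest, then uses the x/by form of aggregate(), which appears to be what the original call was reaching for.

```r
# Hypothetical toy data standing in for the question's Dataframe.
# Only the column names Species, Station and RF.haul come from the question;
# Depth and all values are made up for illustration.
Dataframe <- data.frame(
  Species = c("QSC", "QSC", "QSC", "COD", "COD"),
  Station = c("S1", "S1", "S2", "S1", "S2"),
  RF.haul = c(1.5, 2.0, NA, 3.0, 4.0),
  Depth   = c(10, 12, 15, 20, 22),
  stringsAsFactors = FALSE
)

# Keep only the QSC rows and only the Station and RF.haul columns
qsc_rows <- Dataframe[Dataframe$Species == "QSC", c("Station", "RF.haul")]

# x/by form of aggregate(): sum RF.haul per Station;
# na.rm = TRUE is passed on to sum() through aggregate's ... argument
QSC <- aggregate(x = list(number = qsc_rows$RF.haul),
                 by = list(Station = qsc_rows$Station),
                 FUN = sum, na.rm = TRUE)
QSC
```

Because sum(NA, na.rm = TRUE) is 0, a station whose only QSC haul has a missing RF.haul (S2 here) appears with number = 0 rather than being dropped.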

1 Answer


Assuming RF.haul is a numeric column, consider the formula interface of aggregate with the data argument, and use cbind to rename the numeric column. Note: the na.action argument is added because the formula interface defaults to na.omit, which drops rows with missing values before FUN is applied. See @Rorschach's answer here.

QSC <- aggregate(cbind(number = RF.haul) ~ Station, 
                 data = Dataframe[Dataframe$Species=="QSC", ], 
                 FUN=function(x) sum(x, na.rm=TRUE), 
                 na.action = na.pass)
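To see what na.pass changes, here is a small sketch on made-up data (the species, stations and values are hypothetical, not from the question). Station S2 has only a missing RF.haul: with na.pass it is kept with a total of 0, whereas the formula interface's default, na.omit, removes the incomplete row before grouping, so S2 disappears from the result.

```r
# Hypothetical toy data (not from the question): S2's only QSC haul has a missing RF.haul
Dataframe <- data.frame(
  Species = c("QSC", "QSC", "QSC", "QSC", "COD"),
  Station = c("S1", "S1", "S2", "S3", "S1"),
  RF.haul = c(1.5, 2.0, NA, 4.0, 3.0),
  stringsAsFactors = FALSE
)
qsc <- Dataframe[Dataframe$Species == "QSC", ]

# na.pass: rows with NA reach FUN, which ignores the NAs via na.rm = TRUE
with_pass <- aggregate(cbind(number = RF.haul) ~ Station, data = qsc,
                       FUN = function(x) sum(x, na.rm = TRUE),
                       na.action = na.pass)

# Default na.action (na.omit): incomplete rows are removed before grouping
with_omit <- aggregate(cbind(number = RF.haul) ~ Station, data = qsc,
                       FUN = sum)

with_pass  # S1, S2 and S3
with_omit  # S1 and S3 only
```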

To build a list of data frames by Species, add Species to the grouping terms alongside Station, then run split to get a named list of data frames. A single list is arguably a better choice than many separate data frames in the global environment. See @GregorThomas' canonical answer.

agg_df <- aggregate(cbind(number = RF.haul) ~ Species + Station, 
                    data = Dataframe, 
                    FUN=function(x) sum(x, na.rm=TRUE), 
                    na.action = na.pass)

species_agg_dfs <- split(agg_df, agg_df$Species)

species_agg_dfs$QSC
...
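A small sketch of the list-of-data-frames approach, again on hypothetical data with made-up species codes and values. Each element of the split() result is the aggregated table for one species, and the list names are the species codes.

```r
# Hypothetical toy data (not from the question)
Dataframe <- data.frame(
  Species = c("QSC", "QSC", "COD", "COD", "HKE"),
  Station = c("S1", "S2", "S1", "S1", "S3"),
  RF.haul = c(1.5, 2.0, 3.0, NA, 4.0),
  stringsAsFactors = FALSE
)

# Sum RF.haul per Species/Station pair, keeping NA rows so sum() can skip them
agg_df <- aggregate(cbind(number = RF.haul) ~ Species + Station,
                    data = Dataframe,
                    FUN = function(x) sum(x, na.rm = TRUE),
                    na.action = na.pass)

# One data frame per species, collected in a named list
species_agg_dfs <- split(agg_df, agg_df$Species)

names(species_agg_dfs)          # one name per species
species_agg_dfs[["QSC"]]        # exact-name lookup
sapply(species_agg_dfs, nrow)   # number of stations per species
```

Indexing with [[ matches the name exactly; $ on a list also accepts a unique partial match, so [[ is the safer choice when species codes share a prefix.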
Parfait