Easy data tidying hacks for code like this?

Question

I keep repeating these lines of code to extract smaller subsets from larger data sets, and much of my time goes into reproducing this code. Is there a more efficient way to do this without such a long process? Also, what general rules should I keep in mind to avoid this kind of repetition?

Here is an example of the many lines of code I use to extract the data I need from larger data sets. The task is simple: extract each bird species from a larger data set, create a vector whose length matches the number of years, and then compute the mean population index for every species in each of those years.

Are there 'hacks', as you might call them, that let this scenario and similar ones be written in fewer lines of code by using more efficient functions? Can anyone give examples?

dat <- read.csv("Bird_Dataset_2019.csv")
# Subset each species into its own data frame--------------------
L_starling <- dat[dat$Species=="Starling",]
L_skylark <- dat[dat$Species=="Skylark",]
L_yellow_wagtail <- dat[dat$Species=="Yellow Wagtail",]
L_kestrel <- dat[dat$Species=="Kestrel",]
L_yellowhammer <- dat[dat$Species=="Yellowhammer",]
L_greenfinch <- dat[dat$Species=="Greenfinch",]
L_swallow <- dat[dat$Species=="Swallow",]
L_lapwing <- dat[dat$Species=="Lapwing",]
L_housemartin <- dat[dat$Species=="House Martin",]
L_linnet <- dat[dat$Species=="Linnet",]
L_greypartridge <- dat[dat$Species=="Grey Partridge",]
L_turtledove <- dat[dat$Species=="Turtle Dove",]
L_cornbunting <- dat[dat$Species=="Corn Bunting",]
L_bullfinch <- dat[dat$Species=="Bullfinch",]
L_songthrush <- dat[dat$Species=="Song Thrush",]
L_blackbird <- dat[dat$Species=="Blackbird",]
L_dunnock <- dat[dat$Species=="Dunnock",]
#code for population size and farmland occupants in years--------
years <- 1994:2013
starling_means <- numeric(length(years))
skylark_means <- numeric(length(years))
yellow_wagtail_means <- numeric(length(years))
kestrel_means <- numeric(length(years))
yellowhammer_means <- numeric(length(years))
greenfinch_means <- numeric(length(years))
swallow_means <- numeric(length(years))
lapwing_means <- numeric(length(years))
housemartin_means <- numeric(length(years))
linnet_means <- numeric(length(years))
greypartridge_means <- numeric(length(years))
turtledove_means <- numeric(length(years))
cornbunting_means <- numeric(length(years))
bullfinch_means <- numeric(length(years))
songthrush_means <- numeric(length(years))
blackbird_means <- numeric(length(years))
dunnock_means <- numeric(length(years))
# Mean Pop_Index for each year, ignoring NAs-----------------------
for (i in seq_along(years)) {
  starling_means[i] <- mean(L_starling$Pop_Index[L_starling$Year==years[i]], na.rm = TRUE)
  skylark_means[i] <- mean(L_skylark$Pop_Index[L_skylark$Year==years[i]], na.rm = TRUE)
  yellow_wagtail_means[i] <- mean(L_yellow_wagtail$Pop_Index[L_yellow_wagtail$Year==years[i]], na.rm = TRUE)
  kestrel_means[i] <- mean(L_kestrel$Pop_Index[L_kestrel$Year==years[i]], na.rm = TRUE)
  yellowhammer_means[i] <- mean(L_yellowhammer$Pop_Index[L_yellowhammer$Year==years[i]], na.rm = TRUE)
  greenfinch_means[i] <- mean(L_greenfinch$Pop_Index[L_greenfinch$Year==years[i]], na.rm = TRUE)
  swallow_means[i] <- mean(L_swallow$Pop_Index[L_swallow$Year==years[i]], na.rm = TRUE)
  lapwing_means[i] <- mean(L_lapwing$Pop_Index[L_lapwing$Year==years[i]], na.rm = TRUE)
  housemartin_means[i] <- mean(L_housemartin$Pop_Index[L_housemartin$Year==years[i]], na.rm = TRUE)
  linnet_means[i] <- mean(L_linnet$Pop_Index[L_linnet$Year==years[i]], na.rm = TRUE)
  greypartridge_means[i] <- mean(L_greypartridge$Pop_Index[L_greypartridge$Year==years[i]], na.rm = TRUE)
  turtledove_means[i] <- mean(L_turtledove$Pop_Index[L_turtledove$Year==years[i]], na.rm = TRUE)
  cornbunting_means[i] <- mean(L_cornbunting$Pop_Index[L_cornbunting$Year==years[i]], na.rm = TRUE)
  bullfinch_means[i] <- mean(L_bullfinch$Pop_Index[L_bullfinch$Year==years[i]], na.rm = TRUE)
  songthrush_means[i] <- mean(L_songthrush$Pop_Index[L_songthrush$Year==years[i]], na.rm = TRUE)
  blackbird_means[i] <- mean(L_blackbird$Pop_Index[L_blackbird$Year==years[i]], na.rm = TRUE)
  dunnock_means[i] <- mean(L_dunnock$Pop_Index[L_dunnock$Year==years[i]], na.rm = TRUE)
}

# All means placed into a data.frame-----------------------------
L_population_frame <- data.frame(years, log(starling_means), log(skylark_means),
  yellow_wagtail_means, kestrel_means, yellowhammer_means, log(greenfinch_means),
  log(swallow_means), lapwing_means, housemartin_means, linnet_means,
  greypartridge_means, turtledove_means, cornbunting_means, bullfinch_means,
  songthrush_means, log(blackbird_means), dunnock_means)
colnames(L_population_frame) <- c("Years", "Starling", "Skylark", "YellowWagtail",
  "Kestrel", "Yellowhammer", "Greenfinch", "Swallow", "Lapwing", "Housemartin",
  "Linnet", "GreyPartridge", "TurtleDove", "Cornbunting", "Bullfinch", "Songthrush",
  "Blackbird", "Dunnock")
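Following the rule of thumb that code repeated with only the names changed belongs in a function, the subset, pre-allocation and loop blocks can collapse into one function plus one `sapply()` call. This is a minimal sketch, assuming a small hypothetical data frame `dat` with the same `Species`, `Year` and `Pop_Index` columns in place of the CSV; `species_means` is an illustrative helper name, not part of any package.

```r
# Hypothetical toy data with the same columns as Bird_Dataset_2019.csv
dat <- data.frame(
  Species   = c("Starling", "Starling", "Starling", "Skylark", "Skylark", "Skylark"),
  Year      = c(1994, 1994, 1995, 1994, 1995, 1995),
  Pop_Index = c(10, 20, 30, 5, NA, 15)
)
years   <- 1994:1995
species <- c("Starling", "Skylark")

# Hypothetical helper: replaces the per-species subset, the pre-allocated
# vector and the per-species line inside the loop
species_means <- function(data, species, years) {
  rows <- data[data$Species == species, ]
  # One mean per year, ignoring NA values as the original loop intended
  vapply(years,
         function(y) mean(rows$Pop_Index[rows$Year == y], na.rm = TRUE),
         numeric(1))
}

# sapply() calls the helper once per species; each result becomes a column
means <- sapply(species, species_means, data = dat, years = years)
population_frame <- data.frame(Years = years, means, check.names = FALSE)

# Log-transform only the columns that need it, as the original code did
population_frame$Starling <- log(population_frame$Starling)
population_frame
```

For the real data, `species` would hold the 17 names and `years` would be `1994:2013`; `check.names = FALSE` keeps names with spaces such as "Yellow Wagtail" intact.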

You are probably trying to calculate the mean by group. Try `aggregate(Pop_Index~ Species + Year, dat, mean)` — Ronak Shah, Nov 24 '19 at 14:20
You can use functional programming tools to avoid repetition. For example, the task of extracting data would be coded like this for `mtcars`: `car_list <- lapply(car_names, function(x) mtcars[rownames(mtcars) == x,])`. — slava-kohut, Nov 24 '19 at 14:22
@RonakShah Can this be used to select a specific species, such as Blackbird? — Lime, Nov 24 '19 at 14:28
Yes, subset it first: `aggregate(Pop_Index~ Species + Year, dat[dat$Species == "Blackbird", ], mean)` — Ronak Shah, Nov 24 '19 at 14:33
A rule of thumb: as soon as you have the same code more than once and just change object/variable names, you should/can wrap it into a function. — mnist, Nov 24 '19 at 14:39
To add to Ronak Shah's answer, `aggregate` also has a `subset` argument: `aggregate(Pop_Index~ Species + Year, data = dat, subset = Species == 'Blackbird', mean)`, which is a different way of doing the same thing. — Cole, Nov 24 '19 at 14:50
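The `lapply()` idea from the comments, applied to this data, yields one named list instead of 17 separate `L_` variables, and base R's `split()` builds an equivalent list in a single call. A sketch on a hypothetical toy data frame with the same column names as the question:

```r
# Hypothetical toy data with the same columns as Bird_Dataset_2019.csv
dat <- data.frame(
  Species   = c("Starling", "Starling", "Starling", "Skylark", "Skylark", "Skylark"),
  Year      = c(1994, 1994, 1995, 1994, 1995, 1995),
  Pop_Index = c(10, 20, 30, 5, NA, 15)
)

# lapply() version: one list element per species name
species <- c("Starling", "Skylark")
by_species <- lapply(species, function(x) dat[dat$Species == x, ])
names(by_species) <- species

# split() builds an equivalent named list (ordered by species) in one call
by_species_split <- split(dat, dat$Species)

# Elements are accessed by name instead of through separate variables
skylark <- by_species[["Skylark"]]

# The same summary runs over every species in one call:
# mean Pop_Index per year, NAs removed
yearly_means <- lapply(by_species_split,
                       function(d) tapply(d$Pop_Index, d$Year, mean, na.rm = TRUE))
```

Keeping related data frames in one list means a new species needs no new variable, only a new name in the vector.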

score 0 · Accepted Answer · answered Nov 27 '19 at 00:42

As mentioned in the comments, you can reduce your code by using `aggregate`:

aggregate(Pop_Index~ Species + Year, dat, mean)

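to get the mean `Pop_Index` for each `Year` and `Species`. The formula interface drops rows with missing values by default (`na.action = na.omit`), so this matches the `na.rm = TRUE` in the original loop.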
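`aggregate` returns a long table (one row per species and year). To get back the wide layout of `L_population_frame`, with one column per species, the result can be reshaped with base R's `reshape()`. A sketch on hypothetical toy data; the `sub()` call only strips the `Pop_Index.` prefix that `reshape()` adds to the new column names.

```r
# Hypothetical toy data with the same columns as Bird_Dataset_2019.csv
dat <- data.frame(
  Species   = c("Starling", "Starling", "Starling", "Skylark", "Skylark", "Skylark"),
  Year      = c(1994, 1994, 1995, 1994, 1995, 1995),
  Pop_Index = c(10, 20, 30, 5, NA, 15)
)

# Long format: one row per Species/Year combination; the NA row is dropped
long <- aggregate(Pop_Index ~ Species + Year, data = dat, FUN = mean)

# Wide format like L_population_frame: a Year column plus one column per species
wide <- reshape(long, idvar = "Year", timevar = "Species", direction = "wide")
names(wide) <- sub("^Pop_Index\\.", "", names(wide))
wide
```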

There are various ways to solve this, as shown in "Mean per group in a data.frame".
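One such alternative is `tapply()` with two grouping vectors, which returns a year-by-species matrix of means directly. A sketch on hypothetical toy data with the question's column names:

```r
# Hypothetical toy data with the same columns as Bird_Dataset_2019.csv
dat <- data.frame(
  Species   = c("Starling", "Starling", "Starling", "Skylark", "Skylark", "Skylark"),
  Year      = c(1994, 1994, 1995, 1994, 1995, 1995),
  Pop_Index = c(10, 20, 30, 5, NA, 15)
)

# Matrix of means: rows are years, columns are species, NAs removed
mean_matrix <- tapply(dat$Pop_Index, list(dat$Year, dat$Species),
                      mean, na.rm = TRUE)

# Same layout as L_population_frame: a Years column plus one column per species
population_frame <- data.frame(Years = as.numeric(rownames(mean_matrix)),
                               mean_matrix, check.names = FALSE)
population_frame
```

Species columns come out in alphabetical order, because `tapply()` converts the grouping vectors to factors.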

To get the mean for a specific subset, we can do:

aggregate(Pop_Index~ Species + Year, dat[dat$Species == "Blackbird", ], mean)
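The same subsetting extends to several species at once by using `%in%` instead of `==`, either on the data frame or through the `subset` argument of `aggregate`. A sketch on hypothetical toy data; `chosen` is an illustrative name:

```r
# Hypothetical toy data with the same columns as Bird_Dataset_2019.csv
dat <- data.frame(
  Species   = c("Starling", "Starling", "Skylark", "Skylark", "Kestrel", "Kestrel"),
  Year      = c(1994, 1995, 1994, 1995, 1994, 1995),
  Pop_Index = c(10, 30, 5, 15, 8, 12)
)

# One species, as in the answer
one <- aggregate(Pop_Index ~ Species + Year, dat[dat$Species == "Skylark", ], mean)

# Several species: %in% keeps rows whose Species is any of the chosen names
chosen  <- c("Starling", "Skylark")
several <- aggregate(Pop_Index ~ Species + Year, dat[dat$Species %in% chosen, ], mean)

# Equivalent, using the subset argument of the formula method
several2 <- aggregate(Pop_Index ~ Species + Year, data = dat,
                      subset = Species %in% chosen, FUN = mean)
```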

Ronak Shah

I have found that replacing `==` with the operator `%in%` works when selecting several values combined with `c()`. I experienced the problem that `==` only seemed to work with up to three values in `c()`, because `==` compares the vectors element by element (recycling the shorter one) rather than testing membership. Just something I learnt. – Lime Nov 27 '19 at 15:00
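A small sketch with made-up species names shows the difference between the two operators: `==` pairs up positions, while `%in%` checks whether each element appears anywhere in the second vector.

```r
species <- c("Starling", "Skylark", "Kestrel", "Starling", "Skylark", "Kestrel")
chosen  <- c("Starling", "Skylark")

# == recycles 'chosen' to length 6 and compares position by position:
# Starling/Starling, Skylark/Skylark, Kestrel/Starling, Starling/Skylark, ...
by_position <- species == chosen
by_position
# TRUE TRUE FALSE FALSE FALSE FALSE

# %in% asks whether each element appears anywhere in 'chosen'
membership <- species %in% chosen
membership
# TRUE TRUE FALSE TRUE TRUE FALSE
```

So `==` with `c()` can silently return the wrong rows; R only warns when the longer length is not a multiple of the shorter one.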
