
Hi, I have animal abundance data collected from quadrats, with 4 quadrats per station.

In the field, some quadrats were missed.

Example:

St / Q / Anim 1 abundance / Anim 2 abundance / ....etc
1 / 1 /
1 / 2 /
1 / 3 /
1 / 4 /
2 / 1 /
2 / 2 /
2 / 4 /
3 / 1 /
3 / 2 /
3 / 3 /
3 / 4 /

Station 2 is missing quadrat 3. I would like to remove all rows (including animal abundance data) associated with station 2 from further analysis. I would like to do this in a function, as I have multiple large CSV files I need to clean up.

I tried subset and for loops, but I am struggling with both.

Thank you for your time

Update: I'm working with this:

qc_Large29 <- Large29[Large29[, 5]>=4,]

which gives me all the 4th quadrats from each station. Is there a way to add a length() to it so that the new data frame contains only the data associated with stations that have 4 quadrats?

Update:

 dput(Large29[1:30,1:5])
structure(list(FID = 652:681, areaContro = c(29L, 29L, 29L, 29L, 
29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 
29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L
), areaShortN = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "CAIIN", class = "factor"), station = c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L), quadrat = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L)), .Names = c("FID", 
"areaContro", "areaShortN", "station", "quadrat"), row.names = c(NA, 
30L), class = "data.frame")
lauren
  • it would be helpful if you could post a minimal example of your dataframes, e.g. using `dput( your.data )`. – David Heckmann Mar 07 '17 at 18:22
  • It's too long by over 60k characters, but here are a couple of excerpts. What are you interested in? Maybe I can answer. ("FID", "areaContro", "areaShortN", "station", "quadrat", "latitude", "longitude", "depthFatho", "surveyDTTM", "updatedPK", "surveyRawD", "cameraCont", "imageExist", "isImageOfI", "sand", "sandRipple", "shellDebri", "silt", "gravel", "scallops", "clappers", "seed", "seaStars", "crabs", "hermitCrab", "echinoderm", "lobster", "sandDollar", "ad", "anemone", "bHydra", – lauren Mar 07 '17 at 18:36
  • 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0), herring = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, – lauren Mar 07 '17 at 18:38
  • subset your data so that it still contains examples of the rows you want to remove, e.g. by `dput(yourdata[1:30,])` . – David Heckmann Mar 07 '17 at 18:39
  • My dataset is still too big to fit here even with 1:2 – lauren Mar 07 '17 at 18:48
  • The problem is that I was unclear with my dataset description? – lauren Mar 07 '17 at 18:50
  • The answer will depend on the class of your data. If this (http://stackoverflow.com/questions/8005154/conditionally-remove-dataframe-rows-with-r?rq=1) doesn't help, you will need to create a minimal example (`dput(yourdata[1:30,1:5])`). – David Heckmann Mar 07 '17 at 18:57
  • Updated my question, still too long to fit in comments – lauren Mar 07 '17 at 19:07

1 Answer

This selects everything but the "2" stations:

Large29[Large29$station!=2,]
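
Hard-coding station 2 only works when you already know which station is incomplete. As a minimal base R sketch (not part of the original answer), the incomplete stations can be found by counting rows per station. The toy data frame `df` below is an assumption that mirrors the example table in the question; with the real data it would be `Large29`.

```r
# Toy data mirroring the question: station 2 is missing quadrat 3
df <- data.frame(
  station = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  quadrat = c(1, 2, 3, 4, 1, 2, 4, 1, 2, 3, 4)
)

# For every row, count how many rows share its station
n_per_station <- ave(df$quadrat, df$station, FUN = length)

# Keep only rows from stations that have exactly 4 quadrat rows
qc <- df[n_per_station == 4, ]
```

Here `qc` keeps stations 1 and 3 and drops every row of station 2.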

For your second question (the edit), I would suggest using dplyr, where you can group by station and keep only the groups with at least 4 rows:

library(dplyr)
Large29 %>% group_by(station) %>% filter(n()>=4) %>% as.data.frame()
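
Since the question asks for a function that can be reused across several CSV files, the same filter could be wrapped up roughly as follows. This is only a sketch in base R: the names `keep_complete_stations` and `clean_quadrat_csv` are hypothetical, it assumes every file has `station` and `quadrat` columns, and it assumes station numbers are not reused across areas within a file (otherwise the grouping would also need the area column). It counts distinct quadrats rather than rows, so a duplicated row does not make a station look complete.

```r
# Hypothetical helper: keep only stations with all expected quadrats
keep_complete_stations <- function(df, n_quadrats = 4) {
  # Number of distinct quadrats recorded at each row's station
  n_seen <- ave(df$quadrat, df$station,
                FUN = function(q) length(unique(q)))
  df[n_seen == n_quadrats, , drop = FALSE]
}

# Hypothetical wrapper: read one CSV file and clean it
clean_quadrat_csv <- function(path, n_quadrats = 4) {
  keep_complete_stations(read.csv(path), n_quadrats)
}

# Demo: a temporary file stands in for one of the real CSVs.
# Stations 1-7 have 4 quadrats; station 8 has only 2 (as in the dput sample).
demo <- data.frame(
  station = rep(1:8, c(4, 4, 4, 4, 4, 4, 4, 2)),
  quadrat = c(rep(1:4, 7), 1:2)
)
tmp <- tempfile(fileext = ".csv")
write.csv(demo, tmp, row.names = FALSE)
cleaned <- clean_quadrat_csv(tmp)

# For many files (the directory path is a placeholder):
# files <- list.files("path/to/csvs", pattern = "\\.csv$", full.names = TRUE)
# cleaned_list <- lapply(files, clean_quadrat_csv)
```

In the demo, `cleaned` contains the 28 rows from stations 1 to 7, and station 8 is dropped.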
David Heckmann