Making simple function to clean up data (removing rows conditionally without NA)

Question

Hi, I have animal abundance data collected from quadrats, with 4 quadrats per station.

In the field, some quadrats were missed.

Example:

St/ Q /Anim1 abundance /Anim 2 abundance/....etc
1 /1 /
1 /2 /
1 /3 /
1 /4 /
2 /1 /
2 /2 /
2 /4 /
3 /1 /
3 /2 /
3 /3 /
3 /4 /

Station 2 is missing quadrat 3. I would like to remove all rows (including animal abundance data) associated with station 2 from further analysis. I would like to do this in a function as I have multiple large csv files I need to clean up.

I tried subset() and for loops but am struggling with both.

Thank you for your time

Update: I'm working with this:

qc_Large29 <- Large29[Large29[, 5] >= 4, ]

which gives me all the 4th quadrats from each station. Is there a way to add a length() to it so that the new dataframe will only be the data associated with stations that have 4 quadrats?

Update:

dput(Large29[1:30,1:5])
structure(list(FID = 652:681, areaContro = c(29L, 29L, 29L, 29L, 
29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 
29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L, 29L
), areaShortN = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L), .Label = "CAIIN", class = "factor"), station = c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L), quadrat = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L)), .Names = c("FID", 
"areaContro", "areaShortN", "station", "quadrat"), row.names = c(NA, 
30L), class = "data.frame")

it would be helpful if you could post a minimal example of your dataframes, e.g. using `dput( your.data )`. — David Heckmann, Mar 07 '17 at 18:22
It's too long by over 60k characters, but here are a couple of excerpts. What are you interested in? Maybe I can answer. ("FID", "areaContro", "areaShortN", "station", "quadrat", "latitude", "longitude", "depthFatho", "surveyDTTM", "updatedPK", "surveyRawD", "cameraCont", "imageExist", "isImageOfI", "sand", "sandRipple", "shellDebri", "silt", "gravel", "scallops", "clappers", "seed", "seaStars", "crabs", "hermitCrab", "echinoderm", "lobster", "sandDollar", "ad", "anemone", "bHydra", — lauren, Mar 07 '17 at 18:36
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0), herring = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, — lauren, Mar 07 '17 at 18:38
subset your data so that it still contains examples of the rows you want to remove, e.g. by `dput(yourdata[1:30,])` . — David Heckmann, Mar 07 '17 at 18:39
Is the problem that my dataset description was unclear? — lauren, Mar 07 '17 at 18:50
The answer will depend on the class of your data. If this (http://stackoverflow.com/questions/8005154/conditionally-remove-dataframe-rows-with-r?rq=1) doesn't help, you will need to create a minimal example (`dput(yourdata[1:30,1:5])`). — David Heckmann, Mar 07 '17 at 18:57

score 0 · Accepted Answer · answered Mar 07 '17 at 19:24

This selects everything but the "2" stations:

Large29[Large29$station!=2,]

For your second question (the edit), I would suggest using dplyr, where you can group by station and keep only the groups with at least 4 rows:

library(dplyr)
Large29 %>% group_by(station) %>% filter(n()>=4) %>% as.data.frame()
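
Since the question asks for a function that can be reused across several csv files, here is a minimal base-R sketch of the same idea. The function name `keep_complete_stations`, the `n_quadrats` argument, and the toy data are made up for illustration; the sketch assumes the columns are named `station` and `quadrat`, as in the `dput()` output, and that station numbers are not reused across areas within one file.

```r
# Hypothetical helper (not from the original answer): keep only the rows
# belonging to stations that have all expected quadrats.
keep_complete_stations <- function(df, n_quadrats = 4) {
  # For every row, the number of distinct quadrats recorded at its station
  counts <- ave(df$quadrat, df$station,
                FUN = function(q) length(unique(q)))
  df[counts == n_quadrats, , drop = FALSE]
}

# Toy data mirroring the question: station 2 is missing quadrat 3
toy <- data.frame(
  station = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  quadrat = c(1, 2, 3, 4, 1, 2, 4, 1, 2, 3, 4)
)
cleaned <- keep_complete_stations(toy)
unique(cleaned$station)  # stations 1 and 3 remain

# Applying it to many files (the folder name "data" is an assumption):
# files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)
# cleaned_list <- lapply(files, function(f) keep_complete_stations(read.csv(f)))
```

Counting distinct quadrats rather than rows means a station with a duplicated quadrat row is not mistaken for a complete one; if duplicates cannot occur, the dplyr one-liner above gives the same result.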

David Heckmann

No worries, welcome to SO! Please also see http://stackoverflow.com/help/someone-answers. – David Heckmann Mar 07 '17 at 20:03
