
The following is an example based on a small subset of my data:

NAME <- c("SYNOP", "SYNOP", "METAR", "METAR", "SYNOP", "METAR")
AIR <-  c(6.7, 8.3, 9.2, 8.9, 9.1, 8.7)
Example <- data.frame(NAME, AIR)

   NAME AIR
1 SYNOP 6.7
2 SYNOP 8.3
3 METAR 9.2
4 METAR 8.9
5 SYNOP 9.1
6 METAR 8.7

I am using grep to select the subset of this data where NAME == METAR and to find the number of occurrences:

ex_METAR <- Example[grep("METAR", Example$NAME), ]
nrow(ex_METAR)

I have to repeat this for a large number of values of NAME and wanted to speed the process up by wrapping it in a function. However, I must be doing something wrong, as I get an error message each time:

example_Function <- function (A, B, C) {
A[grep("B", A$C), ]
}

> example_Function(Example, "METAR", Example$NAME)
[1] NAME AIR 
<0 rows> (or 0-length row.names)

I thought the problem was how I am passing "METAR", so I tried the function with only A and C, but I get the same error.

example_Function <- function (A, C) {
A[grep("METAR", A$C), ]
}
example_Function(Example, Example$NAME)

Is there something I'm actively doing wrong or will this simply just not work? I've never tried to adapt a function in this way before. Or maybe a function is the wrong way to go?! Thanks in advance.

(Not a duplicate of "Aggregate a dataframe on a given column and display another column", which is about subsetting by maximums. I need to subset by the words in a column and count how many times each one occurs.)

Quinn
  • Possible duplicate of [Aggregate a dataframe on a given column and display another column](http://stackoverflow.com/questions/6289538/aggregate-a-dataframe-on-a-given-column-and-display-another-column) – Sotos Sep 15 '16 at 12:06

1 Answer


I think this is what you are looking for:

NAME <- c("SYNOP", "SYNOP", "METAR", "METAR", "SYNOP", "METAR")
AIR <-  c(6.7, 8.3, 9.2, 8.9, 9.1, 8.7)
Example <- data.frame(NAME, AIR)

library(dplyr)

Example %>% group_by(NAME) %>% summarize(Count=n())

Output:

Source: local data frame [2 x 2]

    NAME Count
  (fctr) (int)
1  METAR     3
2  SYNOP     3
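
As for why the original function returned zero rows: inside the function, "B" in quotes is the literal string "B" rather than the argument B, and A$C looks for a column literally named C, which does not exist. A minimal base R sketch of a corrected version follows, assuming the column is passed by name as a string. The names subset_by_value, df, value and column are hypothetical and chosen only for illustration.

```r
# Example data from the question
NAME <- c("SYNOP", "SYNOP", "METAR", "METAR", "SYNOP", "METAR")
AIR <- c(6.7, 8.3, 9.2, 8.9, 9.1, 8.7)
Example <- data.frame(NAME, AIR)

# Hypothetical helper: keep the rows where `column` equals `value`.
# `column` is a string, so df[[column]] is used instead of df$column.
subset_by_value <- function(df, value, column) {
  # which() drops NA comparisons instead of returning NA rows
  df[which(df[[column]] == value), , drop = FALSE]
}

ex_METAR <- subset_by_value(Example, "METAR", "NAME")
nrow(ex_METAR)

# If only the counts are needed, table() gives them for every value at once
table(Example$NAME)
```

Exact comparison with == is used here instead of grep, since grep("METAR", ...) would also match longer strings that merely contain "METAR".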
prateek1592
  • Yup that's pretty much the non-convoluted way of doing that! Thank you – Quinn Sep 15 '16 at 11:59
  • Could you please mark it as the answer, if it solved your issue. Thanks! – prateek1592 Sep 15 '16 at 12:00
  • Thanks @prateek1592, I have; you just have to wait before you are allowed to accept an answer. I have one further question: could I perhaps incorporate a loop or apply into this? I actually have quite a few columns in the same table that I need to apply this code to (not in the example!) – Quinn Sep 15 '16 at 12:17
  • Ah, I see. So do you want to 1) Group by more than one column at the same time? Or 2) Group multiple times, taking one column at a time? – prateek1592 Sep 15 '16 at 12:22
  • Multiple times. I have columns such as ID and SOURCE that need to be grouped by different variables as well. I know I can just execute the code numerous times but the number of columns is going to increase over time – Quinn Sep 15 '16 at 12:24
  • I think you can try to loop over this code. No better solution comes to mind at the moment... – prateek1592 Sep 15 '16 at 12:26
  • I guess so, but it hasn't worked so far! It outputs the wrong thing even before I add more columns (also does it after ofc) `testf <- function(i, j) { i %>% group_by((j)) %>% summarize(Count=n()) } #Works perfect for testf(Example, NAME) vector <- c("NAME") for(i in 1:length(vector)) { Test <- testf(Example, vector[i]) print(Test) }` – Quinn Sep 15 '16 at 13:28
  • I believe this is what you are looking for - http://stackoverflow.com/a/32336788/5908050 ? – prateek1592 Sep 15 '16 at 13:39
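
A likely reason the loop in the comment above gives the wrong output is that vector[i] is a string, so group_by((j)) groups by that constant string rather than by the column it names. One way around this, sketched here in base R rather than dplyr, is to look the column up by name with [[ ]]. The SOURCE values and the helper name count_by_column are hypothetical and only there for illustration.

```r
# Example data; SOURCE is a made-up column, for illustration only
Example <- data.frame(
  NAME = c("SYNOP", "SYNOP", "METAR", "METAR", "SYNOP", "METAR"),
  SOURCE = c("a", "b", "a", "a", "b", "a"),
  AIR = c(6.7, 8.3, 9.2, 8.9, 9.1, 8.7)
)

# Hypothetical helper: count the occurrences of each value in one column,
# where `column` is the column name given as a string
count_by_column <- function(df, column) {
  counts <- as.data.frame(table(df[[column]]), stringsAsFactors = FALSE)
  names(counts) <- c(column, "Count")
  counts
}

# Apply the helper to each column of interest, one at a time
columns <- c("NAME", "SOURCE")
results <- lapply(columns, function(column) count_by_column(Example, column))
names(results) <- columns

print(results)
```

Adding a new column later only means extending the columns vector. With dplyr 1.0 or later, group_by(across(all_of(column))) should serve the same purpose inside the original pipeline.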