
I am analyzing a spreadsheet of 181,000+ user actions and am interested to know how many actions each user performed. I want R to show me how many times a particular user's name appears, ranked from highest to lowest, so that I can focus on the users performing the most actions (we are not really interested in users that perform, say, ten actions, when our most active user performed 101,554 in the past week). I have created the character vector new.variable.v to select the "screen_name" column of the spreadsheet, and

table(table(new.variable.v))

shows me counts, but not screen names. Every other solution I've read seems geared to identifying the number of times a particular instance occurs, whereas I want to know the number of times each different screen name occurs. A more R-adept friend suggested some other things, copied from my console with their error messages:

new.variable.sort[order(new.variable.sort[,2], decreasing=TRUE),]

Error in [.default(new.variable.sort, , 2) : incorrect number of dimensions

new.variable.order <- count(new.variable.v)

Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "character"

count(new.variable.v, x) %>%
    arrange(desc(n))

Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "character"

I've googled these errors and read some other Stack Overflow entries about them, but I have not been able to produce a successful outcome.

Sam Austin
  • Hi Sam! Welcome to SO. Can you provide a reproducible example (say, a data frame definition with just a handful of example rows)? – Joy Apr 12 '17 at 17:44
  • You mean like this? > table(new.variable.v) new.variable.v -MakesSense- 12AIM784 12blue 12for14 1423cd 2011Rainmakr 37Tomate37 400000 4200 314 12 1 1 5 756 3 2 11 427Cobra 47Steve 50Cal 54p095 a_go_c aatkadam AAUSA7 AbackCab AbackChum35 1 143 1 3 5 7 2 1 2 AbackDate24 AbackEar10 AbackEase28 AbackLink18 AbackNest98 – Sam Austin Apr 12 '17 at 17:49
  • > table(data$screen_name, sort=true) Error in data$screen_name : object of type 'closure' is not subsettable > freq<- data%>% count(screen_name, sort = TRUE) Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "function" no applicable method for 'group_by_' applied to an object of class "character" – Sam Austin Apr 12 '17 at 17:52
  • It's most helpful if we can actually *create* the data you're having a problem with, not just see what your outputs are. For example, `data.frame(new.variable.v = c("A", "B", "C"))`. You might benefit from giving http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example a read... – Joy Apr 12 '17 at 17:52
  • OK, how about (new.variable.v = c("EagerTape68", "EagerTape68", "jebinless3", "BoldOther94", "BoldOther94", "2011Rainmakr", "EagerTape68")) with that vector going on for 180K+ additional entries? Desired output from this example would be EagerTape68 = 3, BoldOther94 = 2, 2011Rainmakr = 1, jebinless3 = 1. – Sam Austin Apr 12 '17 at 17:59
  • Would `table(new.variable.v)` not give you what you want? You can sort the output of `table()` fine. Something like this works: `sort(table(new.variable.v),decreasing=T)`. – Mike H. Apr 12 '17 at 18:10
  • Yes, it worked perfectly! Thanks! – Sam Austin Apr 12 '17 at 18:13
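
For anyone landing here later, the suggestion from the comments can be sketched on the small example vector given above. This is only a minimal base R illustration; the names counts and counts.df are made up for the example, and the real vector would have 180K+ entries.

```r
# Example vector from the comments (the real one is far longer)
new.variable.v <- c("EagerTape68", "EagerTape68", "jebinless3",
                    "BoldOther94", "BoldOther94", "2011Rainmakr",
                    "EagerTape68")

# table() counts how often each distinct name occurs;
# sort(..., decreasing = TRUE) puts the most active users first
counts <- sort(table(new.variable.v), decreasing = TRUE)
print(counts)

# Look only at the most active users, e.g. the top 2
print(head(counts, 2))

# Optional: the same result as a data frame (column names are illustrative)
counts.df <- data.frame(screen_name = names(counts),
                        n = as.vector(counts),
                        stringsAsFactors = FALSE)
print(counts.df)
```

Note that a single table() call is enough here. Calling table(table(new.variable.v)) tabulates the counts themselves, which is why the screen names disappear.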

1 Answer


An alternative, dplyr/tidyverse-style approach would be to first read the spreadsheet into a data frame using df <- readxl::read_excel(). Then you can find the counts of screen_name with:

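library(dplyr)  # provides %>%, group_by(), summarise(), n(), arrange(), desc()

# One row per screen_name with its number of actions, most active first
res <- df %>%
  group_by(screen_name) %>%
  summarise(volume = n()) %>%
  arrange(desc(volume))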

I personally like this approach because I work in RStudio, and outputting the results as a data frame lets me easily view and play around with them.
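
If the names are only available as a character vector, as in the question, the dplyr verbs need a data frame to work on. Passing the bare vector is what produces the "no applicable method for 'group_by_'" errors. Here is a minimal sketch, assuming dplyr is installed and using the example vector from the comments; the data frame df is built in memory purely for illustration, and count() with sort = TRUE is used as a shorthand for the group_by/summarise/arrange pipeline.

```r
library(dplyr)

# Example vector from the comments (the real one is far longer)
new.variable.v <- c("EagerTape68", "EagerTape68", "jebinless3",
                    "BoldOther94", "BoldOther94", "2011Rainmakr",
                    "EagerTape68")

# Wrap the vector in a one-column data frame so dplyr can operate on it
df <- data.frame(screen_name = new.variable.v, stringsAsFactors = FALSE)

# Count rows per screen_name, sorted from most to least frequent;
# name = "volume" matches the column name used in the pipeline above
res <- df %>%
  count(screen_name, sort = TRUE, name = "volume")

print(res)
```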

mdpead