Here's the data.table approach. I used the data you provided to illustrate the concept; going forward, please provide data so others can reproduce the problem (as pointed out in the comments).
DATA
library(data.table)
dt <- data.table(hostname = c("hello.com", "news.com", "facebook", "yahoo", "facebook"),
                 request = c("GET /blah/blah", "GET /hello", "GET /no", "GET /yes", "GET /hello"))
CODE
> dt
hostname request
1: hello.com GET /blah/blah
2: news.com GET /hello
3: facebook GET /no
4: yahoo GET /yes
5: facebook GET /hello
> dt[, .N, by = hostname]
hostname N
1: hello.com 1
2: news.com 1
3: facebook 2
4: yahoo 1
Here, .N is a special data.table symbol that holds the number of rows in each group, i.e. the count. You can give the resulting column a different name ("count" in the example below):
> dt[, .(count = .N), by = hostname]
hostname count
1: hello.com 1
2: news.com 1
3: facebook 2
4: yahoo 1
If you expect multiple variants of the same entry, e.g. facebook, facebook.com, or facebook.co.uk, you would need to use regular expressions. A good approach in that case would be to sort by name, then use grep to find the common pattern and aggregate by it.
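As a rough sketch of that idea: assuming the site name is everything before the first dot (this would not hold for hosts such as www.facebook.com), the variants can be collapsed with sub() and then counted as before. Note that this swaps in sub() for the grep step described above; dt2 and the site column are hypothetical names used only for illustration.

```r
library(data.table)

# Hypothetical sample data: several spellings of the same site
dt2 <- data.table(hostname = c("facebook", "facebook.com", "facebook.co.uk",
                               "yahoo", "news.com"))

# Assumption: the site name is everything before the first dot,
# so strip the first dot and whatever follows it
dt2[, site := sub("\\..*$", "", hostname)]

# Count rows per normalized site name, same pattern as with hostname
counts <- dt2[, .(count = .N), by = site]
counts
```

If only one pattern matters, grepl() can be used directly in the row filter, e.g. dt2[grepl("^facebook", hostname), .N] counts every hostname that starts with "facebook".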