I have a data frame containing a list of urls, and I'm trying to find the top 10 urls by frequency. This is what I have:

    +------------+
    |urls        |
    +------------+
    |google.com  |
    |linkedin.com|
    |yahoo.com   |
    |google.com  |
    |yahoo.com   |
    +------------+
    
I tried to add a freq column but cannot get it to work. I tried count(df, "url"), but it only gives me the frequencies without the urls, like this:

    +----+
    |freq|
    +----+
    |2   |
    |1   |
    |2   |
    |2   |
    |2   |
    +----+

How can I get a data frame like this?

    +---------------+------------+
    |urls           |   freq     |
    +---------------+------------+
    |google.com     |   2        |
    |linkedin.com   |   1        |
    |yahoo.com      |   2        |
    |google.com     |   2        |
    |yahoo.com      |   2        |      
    +---------------+------------+

I also need to sort it and keep the top 10.

user8706644

2 Answers

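table() returns the frequency of each url. You can then sort the result in decreasing order and pick the first 10. Using head() rather than [1:10] avoids NA entries when there are fewer than 10 distinct urls.

    head(sort(table(df$urls), decreasing = TRUE), 10)

If you only want the url names, use

    names(head(sort(table(df$urls), decreasing = TRUE), 10))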
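If a data frame with `urls` and `freq` columns is needed rather than a named table, the same idea can be sketched in base R as below. This is a minimal sketch, assuming the column is called `urls` as in the question; the sample data and the names `top_urls` and `top10` are made up for illustration.

```r
# Sample data copied from the question (assumption: the column is named `urls`).
df <- data.frame(
  urls = c("google.com", "linkedin.com", "yahoo.com", "google.com", "yahoo.com"),
  stringsAsFactors = FALSE
)

# Tabulate the urls and sort the counts in decreasing order.
counts <- sort(table(df$urls), decreasing = TRUE)

# Convert the table to a data frame with one row per distinct url.
top_urls <- as.data.frame(counts, responseName = "freq", stringsAsFactors = FALSE)
names(top_urls)[1] <- "urls"

# Keep at most 10 rows; head() does not pad with NA when fewer exist.
top10 <- head(top_urls, 10)

# Per-row frequency column, matching the layout asked for in the question.
df$freq <- as.integer(ave(seq_along(df$urls), df$urls, FUN = length))
```

Here `top10` holds one row per distinct url, while `df$freq` repeats the count on every original row.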
Linus

Here's a tidyverse solution. Use group_by() with mutate(freq = n()) to add the count of each url as a column. Then order the rows with arrange().

    library('tidyverse')

    # Sample data from the question; no stray trailing spaces, so identical urls group together
    df <- tibble(urls = c('google.com', 'linkedin.com', 'yahoo.com', 'google.com', 'yahoo.com'))

    df %>%
      group_by(urls) %>%
      mutate(freq = n()) %>%   # per-row count of each url
      arrange(desc(freq)) %>%
      head(10)
    #> # A tibble: 5 x 2
    #> # Groups:   urls [3]
    #>           urls  freq
    #>          <chr> <int>
    #> 1   google.com     2
    #> 2    yahoo.com     2
    #> 3   google.com     2
    #> 4    yahoo.com     2
    #> 5 linkedin.com     1
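The pipeline above keeps one row per original row, so a frequent url can fill several of the 10 slots. If the goal is the top 10 distinct urls, one possible variant is `count()`, sketched here under the assumption that a reasonably recent dplyr is installed (the `name` argument of `count()` is not available in very old versions). The name `top10` is made up for illustration.

```r
library(dplyr)

# Sample data copied from the question.
df <- tibble(urls = c("google.com", "linkedin.com", "yahoo.com", "google.com", "yahoo.com"))

# One row per distinct url, with the count in `freq`, sorted by descending count.
top10 <- df %>%
  count(urls, name = "freq", sort = TRUE) %>%
  head(10)  # at most 10 rows
```

With this data `top10` has three rows, one per distinct url.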
Paul