0

I have a dataset containing Airbnb listings. I want to count the number of listings for each host_id based on if they are Entire home or Shared home per month. Therefor I assume I need two additional columns with the count for each row (tot_EH and tot_SH).

I've posted an image below to show how the dataset looks like and the desired output (deleted some columns that are not relevant). Now I just used one host_id but in reality it's many different ones.

Marked the new columns in red and entered the desired output. Can't figure out how to proceed. Would really appreciate some help!

enter image description here

Z.Lin
  • 28,055
  • 6
  • 54
  • 94
fellan_123
  • 23
  • 1
  • 3
  • Possible duplicate of [How to sum a variable by group](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) – DanY Oct 21 '19 at 20:19
  • Thanks for your answer. Unfortunately it didn't work and I got this error message: Error in aggregate.data.frame(lhs, mf[-1L], FUN = FUN, ...) : no rows to aggregate Sorry if I didn't phrase my question right. Found a way to fix it :) – fellan_123 Oct 22 '19 at 18:00

3 Answers3

0

Got help from a colleague and this worked:

df <- df %>%
  group_by(host_id, last_scraped) %>% # group data by host and month
  mutate(count_listings_in_data = length(unique(id)), # for each host/month combination; count the number of unique listing IDs
         count_shared_homes = length(unique(id[which(room_type_NV == "Shared home")])), # for each host/month combination; count the number of unique listing IDs for which the room type is "shared" 
         count_entire_homes = length(unique(id[which(room_type_NV == "Entire home")]))) # for each host/month combination; count the number of unique listing IDs for which the room type is "entire" 
fellan_123
  • 23
  • 1
  • 3
0

One data.table approach, assuming your data are in a data.frame named df

library(data.table)
setDT(df)
df[room_type_NV == "Entire Home" , tot_EH := .N, by=.(date, host_id)]
df[room_type_NV == "Shared Home" , tot_SH := .N, by=.(date, host_id)]
DanY
  • 5,920
  • 1
  • 13
  • 33
0

Base R Solution:

df$grouping_var <- paste(df$host_id, as.Date(df$date, "%m-%Y"), sep = "_")

count_df <- data.frame(do.call("rbind", lapply(split(df, df$grouping_var),

                   function(x){

                     tmp <- data.frame(t(tapply(x$room_type_NV, x$room_type_NV, length)))

                     return(cbind(x, data.frame(tmp[rep(seq_len(nrow(tmp)), nrow(x)), ], row.names = NULL)))

                    }

                   )

                  ),

            row.names = NULL

             )

Data:

    structure(list(id = c(2, 1, 3, 1, 2, 3, 1, 2, 1, 2), date = structure(c(16983, 
16983, 16983, 17014, 17014, 17014, 17045, 17045, 17106, 17106
), class = "Date"), host_id = c(27280608, 27280608, 27280608, 
27280608, 27280608, 27280608, 27280608, 27280608, 27280608, 27280608
), room_type_NV = structure(c(2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 
1L, 2L), .Label = c("Entire home", "Shared home"), class = "factor"), 
    grouping_var = c("27280608_2016-07-01", "27280608_2016-07-01", 
    "27280608_2016-07-01", "27280608_2016-08-01", "27280608_2016-08-01", 
    "27280608_2016-08-01", "27280608_2016-09-01", "27280608_2016-09-01", 
    "27280608_2016-11-01", "27280608_2016-11-01")), row.names = c(NA, 
-10L), class = "data.frame")
hello_friend
  • 5,682
  • 1
  • 11
  • 15