
Below is the initial data frame (df), followed by my attempt to create a second data frame (df_Q7) that shows calculations for other fields. I'm relatively new to this, so constructive criticism is appreciated. I need to run k-means clustering on a data frame containing the data found in df_Q7.

df <- NYC_TRANSACTION_DATA %>%
  left_join(NEIGHBORHOOD, by = "NEIGHBORHOOD_ID") %>%
  left_join(BUILDING_CLASS, by = "BUILDING_CLASS") %>%
  left_join(BOROUGH, by = "BOROUGH_ID") %>%
  mutate(YEAR = as.integer(format(NYC_TRANSACTION_DATA$SALE_DATE, "%Y")))

df_Q7 <- df %>%
  filter(TYPE == "RESIDENTIAL") %>%
  group_by(NEIGHBORHOOD_NAME == "BRONXDALE") %>%
  summarize(
    MedianSalePrice = median(SALE_PRICE),
    PricePerSQFT = sum(SALE_PRICE / GROSS_SQUARE_FEET),
    sdResidential = sd(SALE_PRICE),
    SALES_NUMBERS = sum(SALE_PRICE)
  ) %>%
  mutate(PROPORTION_RESIDENTIAL = SALES_NUMBERS / sum(SALES_NUMBERS))
  • You need to share a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – M-- Oct 28 '22 at 16:07
  • `NEIGHBORHOOD_NAME=="BRONXDALE"` returns a binary value, `FALSE/TRUE` and `group_by(NEIGHBORHOOD_NAME=="BRONXDALE")` groups into two groups. Then `summarise` computes aggregate statistics for each of those groups and outputs a df with two rows. – Rui Barradas Oct 28 '22 at 16:08
  • If you want to group by `NEIGHBORHOOD_NAME`, remove the comparison `==` but this will give you all values in `NEIGHBORHOOD_NAME`. – Rui Barradas Oct 28 '22 at 16:17
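
The difference described in the comments above can be sketched on a small example. Everything here is an assumption for illustration: `toy` is a hypothetical stand-in for the joined `df` with made-up values, `MedianPricePerSqFt` is an illustrative feature rather than the asker's `PricePerSQFT`, and `centers = 2` is an arbitrary choice. Grouping by the comparison gives two rows, grouping by the column gives one row per neighborhood, and that per-neighborhood summary is the kind of table `kmeans()` can cluster once the numeric columns are scaled.

```r
library(dplyr)

# Hypothetical toy data standing in for the joined `df`; all values are made up.
toy <- tibble(
  NEIGHBORHOOD_NAME = c("BRONXDALE", "BRONXDALE", "BATHGATE",
                        "BATHGATE", "BELMONT", "BELMONT"),
  SALE_PRICE        = c(400000, 500000, 300000, 350000, 600000, 800000),
  GROSS_SQUARE_FEET = c(2000, 2500, 1500, 1750, 2000, 3200)
)

# Grouping by a comparison creates one group per logical value (FALSE, TRUE),
# so the summary has two rows.
by_flag <- toy %>%
  group_by(NEIGHBORHOOD_NAME == "BRONXDALE") %>%
  summarise(MedianSalePrice = median(SALE_PRICE), .groups = "drop")

# Grouping by the column itself creates one group per neighborhood,
# so the summary has one row per neighborhood.
by_name <- toy %>%
  group_by(NEIGHBORHOOD_NAME) %>%
  summarise(
    MedianSalePrice    = median(SALE_PRICE),
    MedianPricePerSqFt = median(SALE_PRICE / GROSS_SQUARE_FEET),
    .groups = "drop"
  )

# k-means works on numeric columns only; scaling keeps the large dollar
# values from dominating the distance calculation.
set.seed(1)  # k-means starts from random centers
features <- scale(select(by_name, MedianSalePrice, MedianPricePerSqFt))
fit <- kmeans(features, centers = 2)  # centers = 2 is an arbitrary choice

# Attach the cluster label to each neighborhood.
by_name$cluster <- fit$cluster
```

With real data, the number of centers would need to be chosen deliberately, for example by comparing `fit$tot.withinss` across several values, and rows with missing or zero `GROSS_SQUARE_FEET` would likely need handling before summarising.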
