Below is my initial data frame (df), followed by my attempt at a second data frame (df_Q7) that computes summary statistics from it. I'm relatively new to this, so constructive criticism is appreciated. What I need is to run k-means clustering on the summary data in df_Q7.
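library(dplyr)
# Join the lookup tables onto the transactions, then derive the sale year
df <- NYC_TRANSACTION_DATA %>%
  left_join(NEIGHBORHOOD, by = "NEIGHBORHOOD_ID") %>%
  left_join(BUILDING_CLASS, by = "BUILDING_CLASS") %>%
  left_join(BOROUGH, by = "BOROUGH_ID") %>%
  # SALE_DATE here is the column of the joined data flowing through the pipe
  mutate(YEAR = as.integer(format(SALE_DATE, "%Y")))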
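# Summarize residential sales for Bronxdale vs. all other neighborhoods
df_Q7 <- df %>%
  filter(TYPE == "RESIDENTIAL") %>%
  group_by(NEIGHBORHOOD_NAME == "BRONXDALE") %>%  # two groups: TRUE / FALSE
  summarize(MedianSalePrice = median(SALE_PRICE),
            PricePerSQFT = sum(SALE_PRICE / GROSS_SQUARE_FEET),  # sum of per-sale ratios
            sdResidential = sd(SALE_PRICE),
            SALES_NUMBERS = sum(SALE_PRICE)) %>%  # total sale value, not a count
  mutate(PROPORTION_RESIDENTIAL = SALES_NUMBERS / sum(SALES_NUMBERS))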
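For reference, here is a minimal sketch of how the k-means step might look. It is not a definitive solution and rests on several assumptions: that the goal is to cluster neighborhoods (so the summary needs one row per NEIGHBORHOOD_NAME rather than a TRUE/FALSE split), that price per square foot is meant as total price over total area, that SALES_NUMBERS is meant as a count of sales, and that 3 clusters is a reasonable starting point. The `toy_df` data frame and its values are made up purely so the example runs on its own; it only borrows the column names from df.

```r
library(dplyr)

# Hypothetical stand-in for the joined `df` (made-up values, same column names).
set.seed(42)
toy_df <- data.frame(
  NEIGHBORHOOD_NAME = rep(
    c("BRONXDALE", "ASTORIA", "CHELSEA", "HARLEM", "FLUSHING", "RIVERDALE"),
    times = c(12, 18, 25, 30, 15, 20)
  ),
  TYPE = "RESIDENTIAL",
  SALE_PRICE = round(runif(120, 2e5, 2e6)),
  GROSS_SQUARE_FEET = round(runif(120, 500, 4000))
)

# k-means clusters rows, so build one row of features per neighborhood.
features <- toy_df %>%
  filter(TYPE == "RESIDENTIAL", SALE_PRICE > 0, GROSS_SQUARE_FEET > 0) %>%
  group_by(NEIGHBORHOOD_NAME) %>%
  summarize(
    MedianSalePrice = median(SALE_PRICE),
    PricePerSQFT    = sum(SALE_PRICE) / sum(GROSS_SQUARE_FEET),  # assumed definition
    sdResidential   = sd(SALE_PRICE),
    SALES_NUMBERS   = n()                                        # assumed: count of sales
  )

# Standardize the numeric columns so no single feature dominates the distances.
scaled <- features %>%
  select(-NEIGHBORHOOD_NAME) %>%
  scale()

# Run k-means; centers = 3 is an arbitrary choice for illustration.
set.seed(1)
km <- kmeans(scaled, centers = 3, nstart = 25)

# Attach the cluster label back to each neighborhood.
features$CLUSTER <- km$cluster
print(features)
```

Scaling matters here because the features are on very different scales (dollars versus counts). The number of clusters would normally be chosen by comparing `km$tot.withinss` across several values of `centers` rather than fixed up front.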