How do I get a data frame I can run K-means on that shows all the data instead of just two rows?

Question

Below is the initial data frame (df), followed by my attempt to create a second data frame (df_Q7) that computes summary statistics for other fields. I'm relatively new to this, so constructive criticism is appreciated. I need to run K-means on a data frame containing the data in df_Q7.

df<- NYC_TRANSACTION_DATA %>%
  left_join(NEIGHBORHOOD,by="NEIGHBORHOOD_ID") %>%
  left_join(BUILDING_CLASS,by="BUILDING_CLASS") %>%
  left_join(BOROUGH,by="BOROUGH_ID") %>%
  mutate(YEAR=as.integer(format(SALE_DATE,"%Y")))

df_Q7<-df%>%
  filter(TYPE=="RESIDENTIAL")%>%
  group_by(NEIGHBORHOOD_NAME=="BRONXDALE") %>%
  summarize(MedianSalePrice=median(SALE_PRICE),
            PricePerSQFT=sum(SALE_PRICE/GROSS_SQUARE_FEET),
            sdResidential=sd(SALE_PRICE),
            SALES_NUMBERS=sum(SALE_PRICE))%>%
  mutate(PROPORTION_RESIDENTIAL=SALES_NUMBERS/sum(SALES_NUMBERS))

You need to share a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — M--, Oct 28 '22 at 16:07
`NEIGHBORHOOD_NAME=="BRONXDALE"` returns a binary value, `FALSE/TRUE` and `group_by(NEIGHBORHOOD_NAME=="BRONXDALE")` groups into two groups. Then `summarise` computes aggregate statistics for each of those groups and outputs a df with two rows. — Rui Barradas, Oct 28 '22 at 16:08
If you want to group by `NEIGHBORHOOD_NAME`, remove the `=="BRONXDALE"` comparison. Note that this will give you one row for every value in `NEIGHBORHOOD_NAME`, not just Bronxdale. — Rui Barradas, Oct 28 '22 at 16:17
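
The two comments above can be illustrated with a minimal sketch. Since the question includes no reproducible data, the `df` below is a hypothetical stand-in with made-up neighborhoods and prices; only the column names come from the question. The summary expressions are copied from the question unchanged. Scaling the features and the choices `centers = 2` and `nstart = 10` are assumptions for illustration, not something the original post specifies.

```r
library(dplyr)

set.seed(42)  # reproducible toy data and k-means starts

# Hypothetical stand-in for the joined `df`; the real data will differ.
df <- tibble(
  NEIGHBORHOOD_NAME = rep(c("BRONXDALE", "RIVERDALE", "FORDHAM", "MORRIS PARK"),
                          each = 5),
  TYPE = "RESIDENTIAL",
  SALE_PRICE = round(runif(20, 2e5, 9e5)),
  GROSS_SQUARE_FEET = round(runif(20, 800, 3000))
)

# Grouping by a logical expression gives one group per TRUE/FALSE value,
# so the summary has only two rows.
two_rows <- df %>%
  group_by(NEIGHBORHOOD_NAME == "BRONXDALE") %>%
  summarise(MedianSalePrice = median(SALE_PRICE))

# Grouping by the column itself gives one row per neighborhood.
df_Q7 <- df %>%
  filter(TYPE == "RESIDENTIAL") %>%
  group_by(NEIGHBORHOOD_NAME) %>%
  summarise(
    MedianSalePrice = median(SALE_PRICE),
    PricePerSQFT = sum(SALE_PRICE / GROSS_SQUARE_FEET),
    sdResidential = sd(SALE_PRICE),
    SALES_NUMBERS = sum(SALE_PRICE),
    .groups = "drop"
  ) %>%
  mutate(PROPORTION_RESIDENTIAL = SALES_NUMBERS / sum(SALES_NUMBERS))

# kmeans() needs a numeric matrix: drop the label column and standardize
# each feature so that no single column dominates the distances.
features <- df_Q7 %>%
  select(-NEIGHBORHOOD_NAME) %>%
  scale()

km <- kmeans(features, centers = 2, nstart = 10)

# Attach the cluster assignment back to the per-neighborhood summary.
df_Q7$cluster <- km$cluster
```

On real data, rows where `GROSS_SQUARE_FEET` is zero or missing would produce `Inf` or `NA` in the summary, and `kmeans()` rejects such values, so those rows would likely need to be filtered out first.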

