-1

I am trying to calculate the fraction of the construction noise per zip code across NY city. The data is from NYC 311.

I am using dplyr and have grouped the data per zip. However, I am finding difficulties merging the row for the complain column, I have to merge the data as per the string "construction" it appear anywhere meaning middle, front or end.

My solution, this is just the beginning

comp_types <- df %>% select(complaint_type,descriptor,incident_zip) %>% 
    group_by(incident_zip) 

can you help me merge the row if unique value in descriptor contains any construction value.

zephryl
  • 14,633
  • 3
  • 11
  • 30
Ellie
  • 3
  • 1
  • 3
    Please show few rows of data with `dput(head(df, 10))` and your expected output – akrun Dec 02 '22 at 03:59
  • 1
    Could you post reproducible example? – Juan Camilo Rivera Palacio Dec 02 '22 at 10:34
  • Descriptor Com. Proportion Loud Music/Party 141 0.141 Banging/Pounding 88 0.088 ENTIRE BUILDING 85 0.085 Blocked Hydrant 72 0.072 No Access 59 0.059 APARTMENT ONLY 44 0.044 Commercial Overnight Parking 32 0.032 Loud Talking 31 0.031 N/A 28 0.028 Posted Parking Sign Violation 27 0.027 – Ellie Dec 02 '22 at 17:20
  • I have to merge the first row based on the element "music" in the row observation and get the complaints counts which is the second column – Ellie Dec 02 '22 at 17:21
  • When you use `dput(head(df, 10))`, can you copy the output that starts with `structure` and paste it into your original question? – jrcalabrese Dec 02 '22 at 17:25

1 Answers1

0

Can you clarify what you mean by "merging"? I don't think you actually want to merge because you only have one dataframe. The term "merging" is used to describe the joining of two dataframes.

See ?base::merge:

Merge two data frames by common columns or row names, or do other versions of database join operations.

If I understand correctly, you want to look into the descriptor variable and see if it contains the string "construction" anywhere in the cell, so you can determine if the person's complaint was construction-related; same for "music". I don't believe you need to use complaint_type since complaint_type never contains the string "construction" or "music"; only descriptor does.

You can use a combination of ifelse and grepl to create a new variable that indicates whether the complaint was construction-related, music-related, or other.

library(tidyverse)
library(janitor)
url <- "https://data.cityofnewyork.us/api/views/p5f6-bkga/rows.csv"
df <- read.csv(url, nrows = 10000) %>%
  clean_names() %>%
  select(complaint_type, descriptor, incident_zip)

comp_types <- df %>% 
  select(complaint_type, descriptor, incident_zip) %>% 
  group_by(incident_zip) 
head(comp_types)
#> # A tibble: 6 × 3
#> # Groups:   incident_zip [6]
#>   complaint_type      descriptor                                   incident_zip
#>   <chr>               <chr>                                               <int>
#> 1 Noise - Residential Banging/Pounding                                    11364
#> 2 Noise - Residential Loud Music/Party                                    11222
#> 3 Noise - Residential Banging/Pounding                                    10033
#> 4 Noise - Residential Loud Music/Party                                    11208
#> 5 Noise - Residential Loud Music/Party                                    10037
#> 6 Noise               Noise: Construction Before/After Hours (NM1)        11238

table(df$complaint_type)
#> 
#>                    Noise       Noise - Commercial       Noise - Helicopter 
#>                      555                      591                      145 
#> Noise - House of Worship             Noise - Park      Noise - Residential 
#>                       20                       72                     5675 
#>  Noise - Street/Sidewalk          Noise - Vehicle 
#>                     2040                      902

df <- df %>%
  mutate(descriptor_misc = ifelse(grepl("Construction", descriptor), "Construction", 
                            ifelse(grepl("Music", descriptor), "Music", "Other")))

df %>%
  group_by(descriptor_misc) %>%
  count()
#> # A tibble: 3 × 2
#> # Groups:   descriptor_misc [3]
#>   descriptor_misc     n
#>   <chr>           <int>
#> 1 Construction      328
#> 2 Music            6354
#> 3 Other            3318

head(df)
#>        complaint_type                                   descriptor incident_zip
#> 1 Noise - Residential                             Banging/Pounding        11364
#> 2 Noise - Residential                             Loud Music/Party        11222
#> 3 Noise - Residential                             Banging/Pounding        10033
#> 4 Noise - Residential                             Loud Music/Party        11208
#> 5 Noise - Residential                             Loud Music/Party        10037
#> 6               Noise Noise: Construction Before/After Hours (NM1)        11238
#>   descriptor_misc
#> 1           Other
#> 2           Music
#> 3           Other
#> 4           Music
#> 5           Music
#> 6    Construction
jrcalabrese
  • 2,184
  • 3
  • 10
  • 30
  • If you feel that I've appropriately answered your question, don't forget to click the green checkmark next to my answer. – jrcalabrese Dec 05 '22 at 18:29