I have a large dataset of neighbourhoods, but I only need a subset of them, which I have as a character vector of neighbourhood names. I tried to subset the data with a for loop (see the example data and code below). The loop does seem to iterate over all the neighbourhoods, but my new dataset ends up containing only one neighbourhood instead of the whole subset. Can anybody tell me what I'm doing wrong in my code?

I was inspired by all these answers: How to subset the dataframe byusing for loop and if condition in r

subset data frame in R using loop

R: loop through data frame extracting subset of data depending on date

Best. And thanks.

EXAMPLE DATA

Big dataset:

> head(CBS, n=5)
# A tibble: 5 x 37
                  `Wijken en buurten` `Aantal inwoners` `||Mannen` `||Vrouwen`
                                <chr>             <dbl>      <dbl>       <dbl>
1                             Alkmaar            108373      53659       54714
2                                Zuid             14315       6785        7525
3                            Kooimeer              2040        930        1105
4   Dillenburg en Stadhouderskwartier              1310        605         700
5 Staatsliedenkwartier en Landstraten              2130       1015        1110

But I only need the information from these neighbourhoods:

> head(buurten_2, n=5)
     buurten_2          
[1,] "Oud-Overdie"      
[2,] "Overdie-West"     
[3,] "Overdie-Oost"     
[4,] "Oosterhout"       
[5,] "De Hoef III en IV"

These correspond to values in the `Wijken en buurten` column of CBS.

These are the loops I've tried (I knew some of them wouldn't work, but I was desperate...):

for (i in 1:nrow(buurten_2)){
  if (CBS$`Wijken en buurten`[i] == buurten_2[i])
    data <- append(data, CBS[i,])
  print(buurten_2[i])
}

for (i in 1:length(buurten_2)){
  temp <- CBS[CBS$`Wijken en buurten`==buurten_2[i],]
  print(buurten_2[i])
}

for (i in 1:length(buurten_2)){
  data <- subset(CBS, CBS$`Wijken en buurten` == buurten_2[i])
  print(buurten_2[i])
}

for (buurten in 1:nrow(buurten_2)){
  CBS %>% 
    filter(CBS$`Wijken en buurten`[i] == buurten_2[buurten])
}

EDIT

But that gives me this

# A tibble: 1 x 37
  `Wijken en buurten` `Aantal inwoners` `||Mannen` `||Vrouwen` `|||Ongehuwd` `||Gehuwd`
                <chr>             <dbl>      <dbl>       <dbl>         <dbl>      <dbl>
1      Oudorp-Centrum              1930        935         995           730        940
Hannie
  • 2
    Why are you using a loop? Why not ```CBS[CBS$`Wijken en buurten` %in% buurten_2,]```? – David Arenburg Mar 26 '18 at 08:50
  • @DavidArenburg I used a loop because I thought I needed one. Apparently I don't because this works! Can you explain what the %in% does? – Hannie Mar 26 '18 at 08:51
  • 1
    It basically checks, for each value in ``` CBS$`Wijken en buurten` ```, whether it can be matched to any value in `buurten_2`, and returns a logical vector. As a general rule, you would very rarely need to resort to a loop in R. – David Arenburg Mar 26 '18 at 08:56
  • Aha... I'm still very much developing my coding skills, this proves that! Thank you for the explanation! :) @DavidArenburg – Hannie Mar 26 '18 at 08:59
  • @DavidArenburg You marked the question as a duplicate, is it common on SO that I remove my question now? – Hannie Mar 26 '18 at 10:20
  • Well, duplicates are fine as long as they are marked as such - it makes the Google search easier (more search queries leading to the same page). Also, once you get an answer, deleting the question wouldn't be fair towards the person who answered. Though answering dupes instead of marking them as such is also frowned upon, but that's another story. – David Arenburg Mar 26 '18 at 10:23
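
To illustrate the `%in%` explanation in the comments above, here is a minimal sketch on made-up toy vectors. The names `wijken` and `wanted` are hypothetical stand-ins, not objects from the question. It contrasts `%in%` (membership) with `==`, which recycles the shorter vector and compares position by position:

```r
# Toy stand-ins for CBS$`Wijken en buurten` and buurten_2 (hypothetical values)
wijken <- c("Alkmaar", "Zuid", "Oosterhout", "Kooimeer", "Overdie-West", "Oud-Overdie")
wanted <- c("Oosterhout", "Oud-Overdie", "Overdie-West")

# %in% asks, for every element of wijken, whether it occurs anywhere in wanted
keep <- wijken %in% wanted
print(keep)

# == compares element by element (recycling wanted), so it tests positions, not membership
positional <- wijken == wanted
print(positional)

# Indexing with the logical vector from %in% keeps only the matching names
print(wijken[keep])
```

With these toy values, `==` finds no match at all because none of the wanted names happens to sit at the same position, while `%in%` finds all three.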

1 Answer

No!

That's not the way it works in R. ;) In R you usually want vectorized code, because it is more concise and typically faster than an explicit loop. Here are two solutions:

# Option 1: subset() looks up the column name inside CBS
df <- subset(CBS, `Wijken en buurten` %in% c("Oud-Overdie", "Overdie-West", "Overdie-Oost", "Oosterhout", "De Hoef III en IV"))

# Option 2: base R logical indexing on the rows
df <- CBS[CBS$`Wijken en buurten` %in% c("Oud-Overdie", "Overdie-West", "Overdie-Oost", "Oosterhout", "De Hoef III en IV"), ]
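
As for why the loops in the question end up with a single neighbourhood: a likely cause is that `data` (or `temp`) is overwritten on every pass, so only the match for the last name survives. The sketch below shows this on a made-up toy data frame. `CBS_toy`, `buurten_toy`, and the numbers in them are hypothetical, chosen only to mimic the shape of the question's data:

```r
# Toy data frame standing in for CBS (made-up numbers, not real CBS figures)
CBS_toy <- data.frame(
  buurt = c("Alkmaar", "Oosterhout", "Kooimeer", "Overdie-West"),
  inwoners = c(100, 20, 30, 40),
  stringsAsFactors = FALSE
)
# buurten_2 in the question prints as a one-column character matrix, so mimic that
buurten_toy <- matrix(c("Overdie-West", "Oosterhout"), ncol = 1)

# Overwriting version: each pass replaces last_only, so only the final match survives
for (i in seq_len(nrow(buurten_toy))) {
  last_only <- CBS_toy[CBS_toy$buurt == buurten_toy[i], ]
}

# Accumulating version: collect each piece in a list, then bind them once at the end
pieces <- vector("list", nrow(buurten_toy))
for (i in seq_len(nrow(buurten_toy))) {
  pieces[[i]] <- CBS_toy[CBS_toy$buurt == buurten_toy[i], ]
}
looped <- do.call(rbind, pieces)

# Vectorized version: one membership test, rows stay in their original order
vectorized <- CBS_toy[CBS_toy$buurt %in% buurten_toy, ]

print(nrow(last_only))
print(nrow(looped))
print(nrow(vectorized))
```

The accumulating loop and the `%in%` version select the same rows here. The loop returns them in the order of `buurten_toy`, while the vectorized version keeps the row order of the data frame.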
ChrKoenig