
The data set I'm working with has 302 rows and 13 columns with the following headers (example values in parentheses): `id` (001, 002, ..., 302), `source_code` (AAA, BBB, CCC), `date`, `day`, `month`, `year`, `time`, `hour`, `minute`, `second`, `latitude`, `longitude` and `inscriptions` (NA, 1 or 0).

I have a script that creates density maps from this data set. However, I want to be able to use filters that select which data the maps include and exclude.

Example 1: I want to select ONLY the data with id 1-123 (and name this selection data_A).

Example 2: I want to select ONLY the data with id 124-168 (and name it data_B).

Example 3: I want to select ONLY the data with id 1-168 (and name it data_AB).

Example 4: I want to select ONLY the data with id 169-302 (and name it data_C).

Example 5: I want to select ONLY the data with id 3 (and name it data_3).

I am new to RStudio and to this platform, so sorry in advance if this explanation is vague!

Thank you!!

Phil
bluemoon

2 Answers


We can use `%in%` with a sequence built by `:` to subset the rows by 'id', and assign each subset to a new object:

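```r
# Convert the zero-padded character ids ("001", "002", ...) to numbers
id1 <- as.numeric(data$id)

# Keep only the rows whose numeric id falls in each range
data_A  <- data[id1 %in% 1:123, ]
data_B  <- data[id1 %in% 124:168, ]
data_AB <- data[id1 %in% 1:168, ]
data_C  <- data[id1 %in% 169:302, ]
data_3  <- data[id1 == 3, ]
```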
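
Since the original data set isn't shown, here is a minimal, self-contained check of the numeric approach on invented data. The column names `id` and `source_code` come from the question, but the values are made up, and the ids are assumed to be stored as zero-padded character strings as the question describes.

```r
# Hypothetical toy data: 302 rows with zero-padded character ids "001".."302"
data <- data.frame(
  id = sprintf("%03d", 1:302),
  source_code = rep(c("AAA", "BBB", "CCC"), length.out = 302),
  stringsAsFactors = FALSE
)

# Convert the character ids to numbers once; "001" becomes 1
id1 <- as.numeric(data$id)

# Each subset keeps every column, but only the rows whose numeric id is in range
data_A  <- data[id1 %in% 1:123, ]
data_B  <- data[id1 %in% 124:168, ]
data_AB <- data[id1 %in% 1:168, ]
data_C  <- data[id1 %in% 169:302, ]
data_3  <- data[id1 == 3, ]

# The id column itself is untouched, so the zero padding survives in the subsets
nrow(data_A)   # 123
data_3$id      # "003"
```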

Or, if we want to keep the range as zero-padded strings too, build it with `sprintf`:

```r
# sprintf('%03d', 1:3) gives "001" "002" "003", matching the stored ids
data_A <- data[data$id %in% sprintf('%03d', 1:123), ]
data_B <- data[data$id %in% sprintf('%03d', 124:168), ]
```
akrun
  • @neussegura I would consider converting to numeric. Please check my update. Thanks – akrun Oct 18 '20 at 22:26
  • As the **id**s are 3-digit numbers (001, 002, ..., 010, 011, ..., 301, 302), I modified your script to: `id1 <- as.numeric(data$id)`, `data_A <- data[id1 %in% 001:123,]`, `data_B <- data[id1 %in% 124:168,]`, `data_AB <- data[id1 %in% 001:168,]`, `data_C <- data[id1 %in% 169:302,]`, `data_3 <- data[id1 == 003,]`. It worked perfectly fine! Thank you so much!!! – bluemoon Oct 18 '20 at 22:34
  • @neussegura You don't need `001:123`; it is just `1:123`, because numeric values are never padded with leading zeros. Only strings keep the padding, e.g. `str1 <- '001'; as.numeric(str1)` returns `1`. – akrun Oct 18 '20 at 22:35
  • Is there a way I can keep `001:302` instead of `1:302`? After running these filters, I match this data set with another one that uses the same IDs :) – bluemoon Oct 18 '20 at 22:42
  • @neussegura If you check my code, I am not changing your original 'id' column; it stays exactly the same. If you want the range itself as padded strings, one option is `sprintf('%03d', 1:302)` – akrun Oct 18 '20 at 22:43
  • You are **amaaaazing**!! I've tried all the different ways you proposed, and using `data_A <- data[data$id %in% sprintf('%03d', 1:123),]`, `data_B <- data[data$id %in% sprintf('%03d', 124:168),]` is the best choice!! I also tried `data_AB <- data[data$id %in% sprintf('%03d', 1:168),]`, `data_C <- data[data$id %in% sprintf('%03d', 169:302),]` and `data_id3 <- data[data$id %in% sprintf('%03d', 3),]`, and they work too! THANK YOU SO MUCH – bluemoon Oct 18 '20 at 23:01
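
The string-based version settled on in this thread could be wrapped in a small helper so each named subset is a single call. This is only a sketch: `select_ids()` is a hypothetical name (not part of base R), the toy data is invented, and it assumes, as in the question, that `id` holds three-digit zero-padded strings.

```r
# Hypothetical helper: keep the rows whose zero-padded id is in `ids`
select_ids <- function(df, ids) {
  # sprintf("%03d", 3) gives "003", matching the stored character ids;
  # drop = FALSE keeps a data frame even if df has a single column
  df[df$id %in% sprintf("%03d", ids), , drop = FALSE]
}

# Hypothetical toy data standing in for the real 302-row data set
data <- data.frame(
  id = sprintf("%03d", 1:302),
  inscriptions = rep(c(NA, 1, 0), length.out = 302),
  stringsAsFactors = FALSE
)

data_A  <- select_ids(data, 1:123)
data_B  <- select_ids(data, 124:168)
data_AB <- select_ids(data, 1:168)
data_C  <- select_ids(data, 169:302)
data_3  <- select_ids(data, 3)
```

Because the comparison is done on the padded strings, the `id` values in every subset still match the IDs in the other data set used later.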

Please always include a minimal reproducible example; it makes it a lot easier to help you! With GNU R this is relatively easy to do.
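
One common way to do that in R is `dput()`, which prints R code that recreates an object, so pasting its output for a few rows into the question lets others rebuild your data. The `data` object below is a made-up stand-in that borrows three of the question's column names; its values are invented.

```r
# Hypothetical stand-in for the question's data set
data <- data.frame(
  id = c("001", "002", "003"),
  source_code = c("AAA", "BBB", "CCC"),
  inscriptions = c(NA, 1, 0),
  stringsAsFactors = FALSE
)

# Print R code that rebuilds the first rows; paste this output into the question
dput(head(data))
```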

From what I understand, the `filter()` function in dplyr (or poorman) can do this for you. Here is an example that filters on the `id` column:

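```r
library(dplyr)

# Small example data frame: an id column, a letter column and a 0/1/NA column
df <-
  data.frame(c(1,2,3,4,5),
             c("a","b","c","d","e"),
             c(NA,0,1,NA,0))

colnames(df) <- c("id","letter","0or1")

# Keep only the rows with id 1, 2 or 3
df %>%
  dplyr::filter(id <= 3)
```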
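
Applied to the question's ranges, the same `filter()` idea might look like the sketch below. It assumes the real `id` column holds zero-padded strings, so it converts them with `as.numeric()` inside the condition only; `dplyr::between()` includes both endpoints. The toy data frame is invented for illustration.

```r
library(dplyr)

# Hypothetical toy data with zero-padded character ids, as in the question
data <- data.frame(
  id = sprintf("%03d", 1:302),
  stringsAsFactors = FALSE
)

# between(x, left, right) keeps rows with left <= x <= right;
# the id column itself is not modified, so "001" stays "001"
data_A  <- data %>% filter(between(as.numeric(id), 1, 123))
data_B  <- data %>% filter(between(as.numeric(id), 124, 168))
data_AB <- data %>% filter(between(as.numeric(id), 1, 168))
data_C  <- data %>% filter(between(as.numeric(id), 169, 302))
data_3  <- data %>% filter(as.numeric(id) == 3)
```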
n0542344