
I have a small data set and I am trying to subset the data.frame using the grepl function.

I have:

year_list <- list("2013", "2014", "2015", "2016", "2017")

test.2013 <- subset(searches[, 1:2], grepl(year_list[1], searches$date))
test.2014 <- subset(searches[, 1:2], grepl(year_list[2], searches$date))
test.2015 <- subset(searches[, 1:2], grepl(year_list[3], searches$date))
test.2016 <- subset(searches[, 1:2], grepl(year_list[4], searches$date))
test.2017 <- subset(searches[, 1:2], grepl(year_list[5], searches$date))

I am trying to create a loop in order to subset columns 1 to 2 (the date column and hits column) into a new data.frame.

I want to take each year in year_list, apply grepl to the date column of the searches data.frame, and return the matching rows as a new data.frame, using a loop or something less repetitive than what I currently have.

Dataframe

         date hits         keyword   geo gprop category
1: 2013-01-06   23  Price world   web        0
2: 2013-01-13   23  Price world   web        0
3: 2013-01-20   40  Price world   web        0
4: 2013-01-27   25  Price world   web        0
5: 2013-02-03   21  Price world   web        0
6: 2013-02-10   19  Price world   web        0
user113156
  • You are using a data.table object. – jogo Dec 07 '17 at 14:51
  • `library("lubridate"); searches[, Year := year(as.Date(date))]` ... now you can do `split(searches, searches[, Year])` ... or you may eventually want to use the `by=` parameter of `data.table` for your further calculations. – jogo Dec 07 '17 at 15:05

1 Answer

If I understand correctly, you want to split a data.frame into several data.frames based on the entries in the date column. In that case, consider the following solution, which uses split to produce a list of the desired data.frame subsets. I have used your data (as a plain data.frame, not a data.table) and added two rows representing an additional year.

df <- read.table(text = "
date hits         keyword   geo gprop category
2013-01-06   23  Price world   web        0
2013-01-13   23  Price world   web        0
2013-01-20   40  Price world   web        0
2013-01-27   25  Price world   web        0
2013-02-03   21  Price world   web        0
2013-02-10   19  Price world   web        0
2014-02-03   21  Price world   web        0
2014-02-10   19  Price world   web        0
", header = TRUE, stringsAsFactors = FALSE)

# Extract the first four digits (the year) from the date column
# and use them as the splitting groups
df_split <- split(df[, c("date", "hits")], gsub("(\\d{4})(.*$)", "\\1", df$date))

df_split
# $`2013`
#       date    hits
# 1 2013-01-06   23
# 2 2013-01-13   23
# 3 2013-01-20   40
# 4 2013-01-27   25
# 5 2013-02-03   21
# 6 2013-02-10   19
# 
# $`2014`
#       date    hits
# 7 2014-02-03   21
# 8 2014-02-10   19
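
Because the result of split is an ordinary named list, each element is already a data.frame. The following is a minimal sketch, not part of the original answer, of how one year could be pulled out and how the pieces could be stacked back into a single data.frame. It assumes a shortened copy of the example data, uses substr instead of gsub to get the year (an equivalent swap for dates in this format), and the names df_2013 and df_all are made up for illustration.

```r
# Shortened copy of the example data (date and hits only).
df <- data.frame(
  date = c("2013-01-06", "2013-01-13", "2014-02-03", "2014-02-10"),
  hits = c(23, 23, 21, 19),
  stringsAsFactors = FALSE
)

# Split on the first four characters of the date, i.e. the year.
df_split <- split(df, substr(df$date, 1, 4))

# Pull out one year as an ordinary data.frame.
df_2013 <- df_split[["2013"]]

# Stack all pieces back into one data.frame, adding the year as a column.
df_all <- do.call(
  rbind,
  Map(function(d, y) cbind(year = y, d), df_split, names(df_split))
)
rownames(df_all) <- NULL
```

Keeping the subsets in the list and indexing by name avoids creating one variable per year.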
Manuel Bickel
  • Not quite, I followed your method but could not put it into a data.frame once split – user113156 Dec 07 '17 at 15:46
  • I have been working on the following `func <- for(i in 1:5){ df <- subset(searches[, 1:3], grepl(year_list[i], searches$date)) print(df) } data <- data.frame(df)` - However, this only "saves" the last year, so I have a new data.frame but only for 2017. I am trying to create the data.frame for all years 2013 - 2017 – user113156 Dec 07 '17 at 15:47
  • Why do you need your `data.frame`s as separate variables? You can access each of them within the list structure, e.g., `df_split[["2013"]]`. If you insist on creating separate variables, I can provide a solution based on [this answer](https://stackoverflow.com/questions/16566799/change-variable-name-in-for-loop-using-r), although it is also pointed out there that this approach is not recommended. Regarding your loop: you overwrite `df` in each iteration, so only the last one survives the loop. – Manuel Bickel Dec 07 '17 at 15:56
  • Let me know if this solves your problem or if you need additional support. – Manuel Bickel Dec 07 '17 at 16:04
  • No thanks, this actually did solve my problem. I wanted to do it in a clean loop function, but with a little cleaning I get the same overall effect, thanks! – user113156 Dec 07 '17 at 16:15
  • I am glad it worked for you. As an additional hint, you might use the `gsub` part of the above code in combination with `unique` to extract a "yearlist" that you can then use to programmatically access all or selected years of the list. – Manuel Bickel Dec 07 '17 at 16:20
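
The two points raised in the comments (the loop overwriting `df` on every pass, and building the year list with `gsub` plus `unique`) can be sketched together as follows. This is only an illustration under assumptions: `searches` here is a small made-up plain data.frame rather than the asker's data.table, and `by_year` is a hypothetical name for the result list.

```r
# Hypothetical stand-in for the asker's `searches` data (date and hits only).
searches <- data.frame(
  date = c("2013-01-06", "2014-02-03", "2014-02-10", "2015-03-01"),
  hits = c(23, 21, 19, 30),
  stringsAsFactors = FALSE
)

# Derive the year list from the data instead of typing it by hand.
year_list <- unique(gsub("(\\d{4})(.*$)", "\\1", searches$date))

# Pre-allocate a named list so each iteration fills its own slot
# instead of overwriting a single variable.
by_year <- vector("list", length(year_list))
names(by_year) <- year_list

for (yr in year_list) {
  # Anchor the pattern so the year only matches at the start of the date.
  keep <- grepl(paste0("^", yr), searches$date)
  by_year[[yr]] <- searches[keep, c("date", "hits")]
}
```

After the loop, `by_year[["2014"]]` holds the 2014 rows, which is the same kind of result that `split` returns in one call.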