
I have a dataset with two months of data (February and March). How can I split it by day into 59 subsets (28 days for February and 31 for March) and save each one as a data frame? Preferably each data frame would be named after its date, i.e. 20140201, 20140202 and so forth.

    df <- structure(list(text = structure(c(4L, 6L, 5L, 2L, 8L, 1L), .Label = c(" Terpilih Jadi Maskapai dengan Pelayanan Kabin Pesawat cont", 
    "booking number ZEPLTQ I want to cancel their flight because they can not together  with my wife and kids", 
    "Can I change for the traveler details because i choose wrongly for the Mr or Ms part", 
    "cant do it with cards either", "Coming back home AK", "gotta try PNNL", 
    "Jadwal penerbangan medanjktsblm tangalmasi ada kah", "Me and my Tart would love to flyLoveisintheAir", 
    "my flight to Bangkok onhas been rescheduled I couldnt perform seat selection now", 
    "Pls checks his case as money is not credited to my bank acctThanks\n\nCASLTP", 
    "Processing fee Whatt", "Tacloban bound aboardto get them boats Boats boats boats Tacloban HeartWork", 
    "thanks I chatted with ask twice last week and told the same thing"
    ), class = "factor"), created = structure(c(1L, 1L, 2L, 2L, 3L, 
    3L), .Label = c("1/2/2014", "2/2/2014", "5/2/2014", "6/2/2014"
    ), class = "factor")), .Names = c("text", "created"), row.names = c(NA, 
    6L), class = "data.frame")
smci
user3456230
  • Before you get downvoted and closed, please post a reproducible example per the great advice at [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – smci Mar 26 '14 at 14:02
  • In particular, use `dput` to show us a snippet of your dataframe . – smci Mar 26 '14 at 14:04
  • Hi, I have shown the str of my dataset and a few example rows :-) – user3456230 Mar 26 '14 at 15:01
  • `str` is ok but `dput` is better, so we can reproduce your code. – smci Mar 26 '14 at 15:02
  • Hi, I have updated to dput, but there were some warnings which I don't understand – user3456230 Mar 26 '14 at 15:15
  • (Just ignore the warnings, R's like that) – smci Mar 26 '14 at 15:33
  • @smci what do you mean by ignoring warnings? You should avoid giving bad advice. Generally, having warnings is bad. – agstudy Mar 26 '14 at 15:51
  • @agstudy, with regard to `dput`, it's good advice. `dput` is full of shit: it succeeds yet gives useless warnings. I didn't tell OP to ignore all warnings in R. – smci Mar 26 '14 at 16:03
  • @smci ah ok . I totally agree :) – agstudy Mar 26 '14 at 16:04
  • To clarify: many R commands generate many more warnings than the equivalent in other languages. Use discretion to figure out when they are ignorable and when not. In this case they are ignorable. – smci Mar 26 '14 at 16:05
  • OP: to avoid getting all your strings read in as factors, use `read.csv(..., stringsAsFactors=FALSE, ...)`. Or in this case use `read.csv(..., colClasses=c('character','factor'), ...)` so only the 'created' field is converted to factor. – smci Mar 26 '14 at 16:06
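
The two `read.csv` options from the last comment can be tried without the real file. This is only a sketch: `csv_text`, `d1` and `d2` are made-up names, and the inline CSV is a stand-in that assumes the file has exactly the two columns `text` and `created`.

```r
# Hypothetical two-column CSV standing in for the real file (assumption)
csv_text <- "text,created
cant do it with cards either,1/2/2014
gotta try PNNL,1/2/2014
Coming back home AK,2/2/2014"

# Option 1: keep every string column as character
d1 <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Option 2: set the class per column, so only 'created' becomes a factor
d2 <- read.csv(text = csv_text, colClasses = c("character", "factor"))

str(d1)
str(d2)
```

Note that the class names passed to `colClasses` are quoted strings.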

1 Answer


You don't need to output multiple dataframes. You only need to select/subset the rows by a key derived from the 'created' field. Here are two ways to do that; option 1 is simpler if you don't plan on needing any more date arithmetic.

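# 1. Leave 'created' as a string; just use text substitution to keep its first field and the year
# (caveat: the question's dates are day/month/year, so the first field is the day and this key is only unique within a single month)
df$created_mthyr <- gsub( '([0-9]+/)[0-9]+/([0-9]+)', '\\1\\2', df$created )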

# 2. If you need to do arbitrary Date arithmetic, convert the 'created' field to a Date object.
# This needs an explicit format string; the question's dates are day/month/year
# (the month code is '%m'; '%M' would mean minutes)
df$created <- as.Date(df$created, '%d/%m/%Y')

# Now you can do either a) split
parts <- split(df, df$created_mthyr)
# specifically, if you want to assign the pieces it creates to 3 dataframes:

df1 <- parts[[1]]
df2 <- parts[[2]]
df5 <- parts[[3]]

# ...or else b) do a Split-Apply-Combine and perform an arbitrary command on each separate subset. This is very powerful. See the plyr/ddply documentation for examples.
library(plyr)
parts <- dlply(df, .(created_mthyr))  # a list with one dataframe per key value
df1 <- parts[[1]]
df2 <- parts[[2]]
df5 <- parts[[3]]

# output looks like this - strictly you might not want to keep 'created','created_mthyr':
> df1
#                          text  created created_mthyr
#1 cant do it with cards either 1/2/2014        1/2014
#2               gotta try PNNL 1/2/2014        1/2014

> df2
#                                                                                                      text
#3                                                                                      Coming back home AK
#4 booking number ZEPLTQ I want to cancel their flight because they can not together  with my wife and kids
#   created created_mthyr
#3 2/2/2014        2/2014
#4 2/2/2014        2/2014
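
For the naming asked for in the question (20140201, 20140202, ...), one possible sketch is to build a `yyyymmdd` key and let `split` return a named list. The `demo` data frame and the names `day_key` and `by_day` are made up for illustration, and the dates are assumed to be day/month/year.

```r
# Small stand-in for the question's data (assumption: dates are day/month/year)
demo <- data.frame(
  text = c("cant do it with cards either", "gotta try PNNL",
           "Coming back home AK", "Me and my Tart would love to flyLoveisintheAir"),
  created = c("1/2/2014", "1/2/2014", "2/2/2014", "5/2/2014"),
  stringsAsFactors = FALSE
)

# Parse the date strings, then format them as a sortable yyyymmdd key
day_key <- format(as.Date(demo$created, format = "%d/%m/%Y"), "%Y%m%d")

# One dataframe per day, held in a list whose names are the dates
by_day <- split(demo, day_key)
names(by_day)          # "20140201" "20140202" "20140205"
by_day[["20140201"]]   # the rows created on 1 Feb 2014

# Optional: copy every element into the global environment as its own object.
# Names that start with a digit then need backticks, e.g. `20140201`.
# list2env(by_day, envir = globalenv())
```

Keeping the pieces in one named list usually scales better than 59 separate objects, since the whole set can be looped over with `lapply`.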
smci
  • Thanks for the reply. However, I still do not understand how to get the original dataframe to be split to 3 sub dataframes? and how to make it automated to fill the name for each sub dataframe? Assuming I want the dataframe to have the same name as the date? – user3456230 Mar 26 '14 at 16:26
  • Added. Access the individual df's returned from `split`/`dlply` with `[[n]]` and assign them to whatever you want. – smci Mar 26 '14 at 17:10
  • But again, you gain nothing at all (and lose a lot) from assigning them to multiple sub-dataframes. What's wrong with keeping them in one dataframe? Give us an example of 'fill the name for each sub dataframe'? In fact, show us the final computation of what you're **doing** with the sub-dataframes. – smci Mar 26 '14 at 17:13
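
As a rough illustration of the point in the last comment, per-day results can be computed while everything stays in one dataframe. The `demo` data and the names `day`, `rows_per_day`, `mean_chars` and `one_day` are hypothetical, and the dates are assumed to be day/month/year.

```r
# Stand-in data (assumption: dates are day/month/year)
demo <- data.frame(
  text = c("cant do it with cards either", "gotta try PNNL",
           "Coming back home AK", "Me and my Tart would love to flyLoveisintheAir"),
  created = c("1/2/2014", "1/2/2014", "2/2/2014", "5/2/2014"),
  stringsAsFactors = FALSE
)

# Add a yyyymmdd key column instead of creating separate dataframes
demo$day <- format(as.Date(demo$created, format = "%d/%m/%Y"), "%Y%m%d")

# Per-day summaries computed directly on the single dataframe
rows_per_day <- table(demo$day)                          # number of rows per day
mean_chars   <- tapply(nchar(demo$text), demo$day, mean) # mean text length per day

# A single day can still be pulled out on demand
one_day <- demo[demo$day == "20140201", ]
```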