I want to create a table (either a data frame or a data table) with two columns and 30 rows of dates. Col1 should contain randomly generated dates in the format yyyymm, between 199801 and 200012. Col2 should contain only the year from the first column.

I tried as.Date, but I didn't get the format above. Any clue how to get it right? Thank you.

gogo88
  • The Date data type will always print in the standard format `YYYY-MM-DD`. You could randomly generate your dates and then use `strftime` to convert to the desired format, but it won't be possible to alter the formatting while also preserving the underlying Date type. – jdobres Nov 13 '19 at 13:31
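
To illustrate the point in the comment above, here is a minimal sketch (the names `d` and `ym` are made up for illustration). It shows that formatting a Date gives back a character string rather than a Date:

```r
# A single Date value; it prints as YYYY-MM-DD
d <- as.Date("1999-03-15")
class(d)    # "Date"

# format() (like strftime()) returns text in the requested layout
ym <- format(d, "%Y%m")
ym          # "199903"
class(ym)   # "character"
```

So a yyyymm column can be displayed this way, but it will be stored as text, not as a Date.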

2 Answers

Something like this?

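set.seed(1234)
# Sample 30 random days; this range is wider than the 1998-2000 window in the question, so adjust the endpoints as needed
dates <- sample(seq(as.Date('1999/01/01'), as.Date('2020/01/01'), by = "day"), 30)
# Col1 is year and month (yyyymm), Col2 is the year only; both are formatted from the same dates
data.frame(Col1 = format(dates, "%Y%m"),
           Col2 = format(dates, "%Y"))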

     Col1 Col2
1  201805 2018
2  201805 2018
3  200107 2001
4  200506 2005
5  200402 2004
6  201102 2011
7  200203 2002
...
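
For the exact window asked about in the question (199801 to 200012), one possible variant is to sample from a sequence of months instead of days. This is only a sketch: the seed is arbitrary, the names `months`, `picked` and `df` are illustrative, and `replace = TRUE` (duplicates allowed) is an assumption about what "randomly generated" should mean.

```r
set.seed(1234)  # arbitrary seed, only for reproducibility

# All month starts from January 1998 through December 2000 (36 values)
months <- seq(as.Date("1998-01-01"), as.Date("2000-12-01"), by = "month")

# Draw 30 months at random; replace = TRUE is an assumption (duplicates allowed)
picked <- sample(months, 30, replace = TRUE)

# Col1 is yyyymm, Col2 is the year; both are stored as character
df <- data.frame(Col1 = format(picked, "%Y%m"),
                 Col2 = format(picked, "%Y"),
                 stringsAsFactors = FALSE)
head(df)
```

Sampling months rather than days gives every yyyymm value the same chance of being drawn, regardless of how many days the month has.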
s__
  • Thank you very much, it works. May I ask another question? Suppose I have the table above, where Col1 always represents the year and month and is actually of character type. Additionally, I have a numeric column Col3 and need to perform some operations on it, such as summing everything per year or per month. How would that work? – gogo88 Nov 13 '19 at 14:06
  • There are many options in R: `aggregate(df$Col3, by=list(Col2=df$Col2), FUN=sum)` is base R, or you can use the `dplyr` package or `data.table`. I advise you to read [this](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group). Also, if you think this is the correct answer, please mark it as accepted. It gives both of us a bit of reputation and helps the community find it. There is no obligation, of course. – s__ Nov 13 '19 at 14:12
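
A small sketch of the base R `aggregate` call from the comment above, applied per year and per month. The data is invented for illustration: `Col3` and its values are hypothetical, and `Col2` is derived from `Col1` with `substr`.

```r
# Hypothetical data: Col1 is "yyyymm" text, Col3 is an invented numeric column
df <- data.frame(Col1 = c("199801", "199801", "199805", "199902"),
                 Col3 = c(10, 5, 1, 2),
                 stringsAsFactors = FALSE)
df$Col2 <- substr(df$Col1, 1, 4)   # year part of Col1

# Yearly totals: one row per value of Col2, sums in column x
yearly <- aggregate(df$Col3, by = list(Col2 = df$Col2), FUN = sum)

# Monthly totals: one row per value of Col1, sums in column x
monthly <- aggregate(df$Col3, by = list(Col1 = df$Col1), FUN = sum)

yearly    # 1998 -> 16, 1999 -> 2
monthly   # 199801 -> 15, 199805 -> 1, 199902 -> 2
```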

To me it sounds like you should use a string in Col1. Are you parsing these dates from another source (CSV? Excel?)

Col2 can then just use substr:

library(dplyr)  # provides %>% and mutate()

dat <- data.frame(col1 = c("201812", "201901"))
# The first four characters of "yyyymm" are the year
dat %>%
  mutate(
    col2 = substr(col1, 1, 4)
  )
Jeroen Colin
  • Thank you very much, it works. May I ask another question? Suppose I have the table above, where Col1 always represents the year and month and is actually of character type. Additionally, I have a numeric column Col3 and need to perform some operations on it, such as summing everything per year or per month. How would that work? – gogo88 Nov 13 '19 at 14:07
  • I'd suggest using the `dplyr` package for that. Summarizing per year and month would then be: `dat %>% group_by(col1, col2) %>% summarise(col3Summed = sum(col3))` – Jeroen Colin Nov 14 '19 at 09:40
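
A sketch of how the `dplyr` suggestion above could look for both cases, assuming `dplyr` is installed. The `col3` values are invented for illustration. Because `col2` is derived from `col1`, grouping by both gives one row per month; for yearly totals, group by `col2` alone.

```r
library(dplyr)

# Hypothetical data; col3 is an invented numeric column
dat <- data.frame(col1 = c("201811", "201812", "201812", "201901"),
                  col3 = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE) %>%
  mutate(col2 = substr(col1, 1, 4))   # year part of col1

# Monthly totals: col1 already identifies the month
monthly <- dat %>%
  group_by(col1) %>%
  summarise(col3Summed = sum(col3))

# Yearly totals: group by the year column only
yearly <- dat %>%
  group_by(col2) %>%
  summarise(col3Summed = sum(col3))

monthly   # 201811 -> 1, 201812 -> 5, 201901 -> 4
yearly    # 2018 -> 6, 2019 -> 4
```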