I have a dataframe (dtetags.df) with a date column that has many duplicate dates:
dtetags.df$Date
"2016-07-22" "2016-07-22" "2016-07-21" "2016-07-21" "2016-07-20" "2016-07-20" "2016-07-19" "2016-07-19" "2016-07-18" "2016-07-18" "2016-07-15" "2016-07-15" "2016-07-15" "2016-07-14"
"2016-07-14" "2016-07-13" "2016-07-13" "2016-07-13" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-08" "2016-07-08"
"2016-07-08" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-06" "2016-07-06" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-01" "2016-07-01" "2016-06-30"
"2016-06-30" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-28" "2016-06-28" "2016-06-28" "2016-06-27" "2016-06-27" "2016-06-27" "2016-06-24" "2016-06-24"
"2016-06-23" "2016-06-23" "2016-06-22" "2016-06-22" "2016-06-21" "2016-06-21" "2016-06-20" "2016-06-20" "2016-06-17" "2016-06-17" "2016-06-16" "2016-06-16" "2016-06-15" "2016-06-15"
"2016-06-14" "2016-06-13" "2016-06-13" "2016-06-10" "2016-06-10" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-08" "2016-06-08" "2016-06-07" "2016-06-07" "2016-06-06"
"2016-06-06" "2016-06-06" "2016-06-01" "2016-06-01" "2016-05-29" "2016-05-29" "2016-05-27" "2016-05-27" "2016-05-26" "2016-05-26" "2016-05-25" "2016-05-25" "2016-05-24" "2016-05-23"
"2016-05-23" "2016-05-20"
and a number of binary tag columns that show whether a post was made with that tag on that date, for example:
dtetags.df$Technology
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "1" "1" "0" "1" "0" "1"
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
I am trying to use ddply(dtetags.df, "Date", numcolwise(sum)), based on this question, but instead of the per-date sums it returns <0 rows> (or 0-length row.names). I have tried a number of different ways to format the ddply command, but I cannot get it to work.
The ideal output would look like:
Date Technology
1 2016-07-22 0
2 2016-07-21 0
3 2016-07-20 0
4 2016-07-19 0
5 2016-07-18 0
6 2016-07-15 0
7 2016-07-14 0
8 2016-07-13 0
9 2016-07-12 0
10 2016-07-11 0
11 2016-07-08 0
12 2016-07-07 0
13 2016-07-06 1
14 2016-07-05 0
15 2016-07-01 2
16 2016-06-30 1
17 2016-06-29 1
18 2016-06-28 0
19 2016-06-27 0
20 2016-06-24 1
21 2016-06-23 0
22 2016-06-22 0
23 2016-06-21 0
24 2016-06-20 0
25 2016-06-17 0
26 2016-06-16 0
27 2016-06-15 0
28 2016-06-14 1
29 2016-06-13 0
30 2016-06-10 0
31 2016-06-09 0
32 2016-06-08 0
33 2016-06-07 0
34 2016-06-06 0
35 2016-06-01 0
36 2016-05-29 0
37 2016-05-27 0
38 2016-05-26 0
39 2016-05-25 0
40 2016-05-24 0
41 2016-05-23 0
42 2016-05-20 0
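For reference, the aggregation behind the table above can be sketched in base R with aggregate() instead of ddply. This is only a minimal illustration, not the ddply call in question: toy.df is a hypothetical six-row stand-in that mirrors three of the dates in the real data.

```r
# Hypothetical stand-in for dtetags.df (illustration only)
toy.df <- data.frame(
  Date = c("2016-07-06", "2016-07-06", "2016-07-01",
           "2016-07-01", "2016-06-30", "2016-06-30"),
  Technology = c(0, 1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Sum every non-Date column within each Date
totals <- aggregate(. ~ Date, data = toy.df, FUN = sum)

# aggregate() returns the groups in ascending order; put the newest date first
totals <- totals[order(as.character(totals$Date), decreasing = TRUE), ]
rownames(totals) <- NULL

print(totals)
```

With several tag columns, the `. ~ Date` formula should sum each of them per date, provided they are all numeric.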
Is there something obvious I am doing wrong?
Conversion from Factor to Numeric
To convert the tag columns from factor to numeric, I removed the Date column, applied data.frame(apply(dtetags.df, 2, function(x) as.numeric(as.character(x)))) to the rest of the data frame, and then prepended the Date column back in.
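The conversion steps just described can be sketched roughly as follows. raw.df, the Science column, and the values are hypothetical, and the assumption is that the tag columns start out as factors holding "0" and "1".

```r
# Hypothetical stand-in for the data before conversion (tag columns as factors)
raw.df <- data.frame(
  Date = c("2016-07-06", "2016-07-06", "2016-07-01"),
  Technology = factor(c("0", "1", "1")),
  Science = factor(c("1", "0", "0")),  # hypothetical second tag column
  stringsAsFactors = FALSE
)

# Drop the Date column and convert the rest: factor -> character -> numeric
tags <- data.frame(apply(raw.df[, -1, drop = FALSE], 2,
                         function(x) as.numeric(as.character(x))))

# Prepend the Date column again
converted.df <- data.frame(Date = raw.df$Date, tags, stringsAsFactors = FALSE)

str(converted.df)
```

Going through as.character() first matters because as.numeric() on a factor returns the internal level codes rather than the printed values.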
dput(dtetags.df)
structure(list(Date = c("2016-07-22", "2016-07-22", "2016-07-21",
"2016-07-21", "2016-07-20", "2016-07-20", "2016-07-19", "2016-07-19",
"2016-07-18", "2016-07-18", "2016-07-15", "2016-07-15", "2016-07-15",
"2016-07-14", "2016-07-14", "2016-07-13", "2016-07-13", "2016-07-13",
"2016-07-12", "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-11",
"2016-07-11", "2016-07-11", "2016-07-11", "2016-07-08", "2016-07-08",
"2016-07-08", "2016-07-07", "2016-07-07", "2016-07-07", "2016-07-07",
"2016-07-06", "2016-07-06", "2016-07-05", "2016-07-05", "2016-07-05",
"2016-07-05", "2016-07-01", "2016-07-01", "2016-06-30", "2016-06-30",
"2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29",
"2016-06-28", "2016-06-28", "2016-06-28", "2016-06-27", "2016-06-27",
"2016-06-27", "2016-06-24", "2016-06-24", "2016-06-23", "2016-06-23",
"2016-06-22", "2016-06-22", "2016-06-21", "2016-06-21", "2016-06-20",
"2016-06-20", "2016-06-17", "2016-06-17", "2016-06-16", "2016-06-16",
"2016-06-15", "2016-06-15", "2016-06-14", "2016-06-13", "2016-06-13",
"2016-06-10", "2016-06-10", "2016-06-09", "2016-06-09", "2016-06-09",
"2016-06-09", "2016-06-08", "2016-06-08", "2016-06-07", "2016-06-07",
"2016-06-06", "2016-06-06", "2016-06-06", "2016-06-01", "2016-06-01",
"2016-05-29", "2016-05-29", "2016-05-27", "2016-05-27", "2016-05-26",
"2016-05-26", "2016-05-25", "2016-05-25", "2016-05-24", "2016-05-23",
"2016-05-23", "2016-05-20"), `Technology` = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("Date",
"Technology"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -100L))