-1

[Solution] I found my own solution to the problem.

  require(data.table)
  dt <- data.table(dataframe)
  newdt<-dt[, number := 1:.N, by = Date] 
  data<-as.data.frame(newdt)

data_wide <- reshape(newdt, direction="wide", idvar = "Date", timevar =   "number")

data_wide

6/26/2015   209.3     230.2     80.4     s2
6/27/2015   209.1     227.2     239.2    s2

Edit 2: I think that the solution provided by others would work if I can figure out how to create a new column in my original dataframe that labels the number of the rows (or valid values in Variable 1) for each date. In other words, I would like to restart labeling rows at each change in date. For example,

6/26/2015   1   209.3
6/26/2015   2   230.2
6/26/2015   3   80.4
6/26/2015   4   s2
6/27/2015   1 ....

And then I could use the reshape methods described in the other posts.

Edit: This is close to How to reshape data from long to wide format?, but in order for those answers to fit my data I would need a new column here that assigns a number 1-length variable 1 for each day to the variables, which I don't have.

In other words, if I use

data_wide <- reshape(data,direction="wide", idvar = "Date", timevar = "Variable 1")
data_wide

Then, because there are 200+ unique entries for Variable 1, data_wide had 200+ columns for each date, with most of those being na because values for Variable 1 typically only exist on a single date in the data, and the data is a time series of over 5000 dates. ~~~~~~~~~~~~

I have a dataframe where column 1 = Date and column 2 = Variable 1 where Variable 1 is usually numeric. E.g.

6/26/2015   209.3    
6/26/2015   230.2    
6/26/2015   80.4     
6/26/2015   s2       
6/27/2015   209.1    
6/27/2015   227.2    
6/27/2015   239.2    
6/27/2015   s2       

I would like to be able to label the rows with a new value that is simply the row number for that date.

6/26/2015   209.3    1
6/26/2015   230.2    2
6/26/2015   80.4     3
6/26/2015   s2       4
6/27/2015   209.1    1
6/27/2015   227.2    2
6/27/2015   239.2    3
6/27/2015   s2       4

[Original Post]

I have a dataframe where column 1 = Date and column 2 = Variable 1 where Variable 1 is usually numeric. E.g.

6/26/2015   209.3
6/26/2015   230.2
6/26/2015   80.4
6/26/2015   s2
6/27/2015   209.1
6/27/2015   227.2
6/27/2015   239.2
6/27/2015   s2
6/28/2015   230.2
6/28/2015   228.2
6/28/2015   36.4
6/28/2015   s2
6/29/2015   209.3
6/29/2015   15.3
6/29/2015   15.4
6/29/2015   s2

I would like to be able to "transpose" the data such that each date has its own row, and each Variable for the same date is in the same row. E.g.

6/26/2015   209.3     230.2     80.4     s2
6/27/2015   209.1     227.2     239.2    s2

And so on. Although this example shows the same number of entries for Variable 1 per date, this is not always the case. I would like to be able to allow for any number of Variables to be collapsed onto the date.

A complicating factor is that there are actually two more columns, Variable 2 and Variable 3, which are constant within a date but may vary between two dates. I would like those to be collapsed onto the date as well, but I only need one column for each of these variables in the new dataframe.

I have tried using dcast and reshape functions, but these do not give the intended result. Does anyone have suggestions?

Community
  • 1
  • 1
user3795577
  • 187
  • 1
  • 18
  • 2
    Possible duplicate of [Reshape data from long to wide format R](http://stackoverflow.com/questions/5890584/reshape-data-from-long-to-wide-format-r) – jeremycg Nov 18 '15 at 18:45
  • @jeremycg that is close, except I do not have a column for 'numbers' as that user had. In my case, I would basically need to loop through the dataset and assign each entry for Variable 1 per date a number starting at 1-length(variable 1) per date in order to use that answer. Your answer also does not address the fact that I do not want to create a new column for every entry of variable 2 and 3 on the same date. – user3795577 Nov 18 '15 at 19:01
  • "a number 1-length variable 1" - this text makes no sense to me. – Alex Brown Nov 18 '15 at 19:05
  • I would basically need to loop through the dataset and number the number of Variable 1 entries per date in order to use that answer. – user3795577 Nov 18 '15 at 19:07
  • number the number of a variable? please try to use programming terms when describing your problem, for instance 'count the number of matching instances for each date'. – Alex Brown Nov 18 '15 at 19:11
  • If you want to count the values, then your output example: `6/26/2015 209.3 230.2 80.4 s2` is wrong. Please update to show the expected output. Also the word transpose is not longer appropriate - this is an aggregation operation. – Alex Brown Nov 18 '15 at 19:28
  • I want to count them but I don't want to just use the 'counts' value. I would like a new column to label each row with the row number within that date range. – user3795577 Nov 18 '15 at 19:29

2 Answers2

1

this is best handled via tapply: something like

tapply(data$`Variable 1`, data$date, c)

which creates a ragged array. The type of a ragged array matches your description of the expected result. Note that the original order might be lost, but you can restore a sensible order by (e.g.) ordering by date.

you CANNOT (sensibly) transform your result into a legal data frame, because your data type implies a variable number of columns per row. Data frames are not intended for this use, and if you approach it this way, you will immediately run into more problems.


alternatively, what's wrong with the original sparse matrix with lots of NAs? that's another valid representation of the data type you are discussing.


If you are just interested in the count of valid values, just do this:

aggregate(data=data,`Variable 1`~Date,length)

For the mtcars dataset, this happens:

aggregate(data=mtcars,wt~cyl,length)
  cyl wt
1   4 11
2   6  7
3   8 14

Note that wt is just a count of values, it doesn't care about the type or value of wt, just how many (length) there are.

Alex Brown
  • 41,819
  • 10
  • 94
  • 108
  • Nothing is 'wrong' with it. What I am trying to do is wind up with a new dataframe that I can use in a time series analysis and it would be GREAT if I could just simply condense the dataframe into a new dataframe where each date has its own row and then 10 columns MAX which have the values in Variable 1. – user3795577 Nov 18 '15 at 19:15
  • okay, I can work with that. please update your problem specification to indicate that you want to have between 1 and 10 valid values, and there will always be 10 columns. You can't* have multiple columns INSIDE the value of one 'variable' (column) called 'Variable 1'. Also, please rename your column to something like 'Value1'. It's not a variable (it's a list index), and using the word variable to describe it is making it difficult to reason about and discuss. * You can, but please don't. – Alex Brown Nov 18 '15 at 19:17
  • If you only want to count the valid values, then please clearly say so. (That's easy). – Alex Brown Nov 18 '15 at 19:20
  • note that reshape2 can do this very well, but since it is fragile I don't care to attempt it without sample data. – Alex Brown Nov 18 '15 at 19:27
  • I do not want to simply count the number of valid values, I want to label them (in a new column) as 1, 2, 3, 4, 5, and restart when the date changes. – user3795577 Nov 18 '15 at 19:41
0

This solution takes a dataframe where column 1 contains date values which are repeated for several rows, and where column 2 contains a value for each row. The goal is to have a new dataframe where column 1 contains no repeat dates, and each row (date) contains all of the values listed in column 2, essentially condensing dates and transposing column 2 into the date. To do this with reshape, first, each value for the same date needs a number.

   require(data.table)
      dt <- data.table(dataframe)
      newdt<-dt[, number := 1:.N, by = Date] 
      data<-as.data.frame(newdt)

    data_wide <- reshape(newdt, direction="wide", idvar = "Date", timevar =   "number")

    data_wide

    6/26/2015   209.3     230.2     80.4     s2
    6/27/2015   209.1     227.2     239.2    s2
user3795577
  • 187
  • 1
  • 18