1

I have data in the following format:

datetime <- c('2016-01-01 00:00:00','2016-01-01 00:01:00', '2016-01-01 00:02:00',....)
D1 <- c(1, 1, 0, ...)
D2 <- c(0, 1, 0, ...)
D3 <- c(1, 0, 0, ...)
df <- data.frame(datetime,D1,D2,D3)

Basically, I have several columns with 1 & 0 at successive time stamps.

TO DO:

I want to generate a new dataframe with 2 columns: the dateTime stamp, and a single column with the names of all columns which are non-zero at that time.

TRIED SO FAR:

I can use reshape to convert the first dataframe from long to wide format, then remove rows with 0.

molten <- melt(df, id = c("datetime"))
new <- subset(molten, molten$value > 0)

QUERY:

Now, how to convert it into the following format?

dateTime <- c('2016-01-01 00:00:00','2016-01-01 00:01:00', '2016-01-01 00:02:00',....)
D <- c('D1,D3' , 'D1,D2', '', ...)
new_df <- data.frame(dateTime,D)

And how to avoid loss of time stamp information that occurs upon subsetting ? (notice that the 3rd timestamp has an NA value as D1,D2 & D3 are all '0' at that time)

Sree
  • 77
  • 1
  • 8
  • Related: [*Making a character string with column names with zero values*](https://stackoverflow.com/q/37734904/2204410) – Jaap Feb 06 '18 at 11:47

1 Answers1

3

Here's a possible solution :

cn <- colnames(df)[-1]
new_df <- cbind(df['datetime'],
                D=apply( df[,-1] != 0, 1 , function(x) paste(cn[x],collapse=',')))

> new_df
             dateTime     D
1 2016-01-01 00:00:00 D1,D3
2 2016-01-01 00:01:00 D1,D2
3 2016-01-01 00:02:00 

Explanation :

  • we store the colnames (excluding dateTime) into a vector called cn
  • df[,-1] != 0 returns a matrix of TRUE/FALSE being TRUE where different from zero
  • using apply for each row of the previous matrix we subset cn using the values TRUE/FALSE and we collapse the resulting colnames into one string
  • then we bind the previous data.frame (only the dateTime column actually) with the new values into a column called D
digEmAll
  • 56,430
  • 9
  • 115
  • 140