I am working with log data in the following format:

    V1
1 TASK [include_vars]
2 Thursday 05 April 2018 20:21:52 -0500 (0:00:00.429) 0:00:00.429
3 TASK [include_vars]
4 Thursday 05 April 2018 20:21:53 -0500 (0:00:00.289) 0:00:00.718
5 TASK [include_vars]
6 Thursday 05 April 2018 20:21:53 -0500 (0:00:00.270) 0:00:00.988

Each timestamp corresponds to the task above it. I need to move each timestamp into a new column and up one row, so that it sits on the same row as the task it belongs to. I have tried using dcast, unstack, spread, etc., but since this is a single-column vector, I am not sure how to make this work.

Thanks!

P.S. This data has already been formatted and filtered, so I don't think a different import approach would help, but I am open to suggestions.

Alex Dometrius

2 Answers

You could just bind alternate elements of the column together as separate columns...

# Odd-numbered rows hold the TASK lines, even-numbered rows the timestamps
df2 <- cbind(V1 = df$V1[seq(1, nrow(df), 2)],
             V2 = df$V1[seq(2, nrow(df), 2)])
Andrew Gustar
  • I am not sure what this is providing me when I run it. Here is the new df that is created: 1437 1869 1437 1872 1437 1870 1758 1871 – Alex Dometrius Apr 17 '18 at 16:48
  • If `df` is a dataframe with `df$V1` being your column of alternating data, then `df2` should be a dataframe with two columns - `V1` of the odd elements of `df$V1` (the `TASK` terms), and `V2` of the even elements (the datestamps) – Andrew Gustar Apr 17 '18 at 17:22
  • That should be the case, but it isn't. Each data point in df2 becomes a number... It's strange – Alex Dometrius Apr 17 '18 at 17:34
  • Ah - it will be because they are factors. Read the data in with `stringsAsFactors=FALSE`, or set `df$V1 <- as.character(df$V1)` to make sure they are character strings. – Andrew Gustar Apr 17 '18 at 17:38
  • That's what it was. Thanks. – Alex Dometrius Apr 17 '18 at 17:42
  • Your answer definitely solved my original problem. But I have been given a new problem on the same log data and could use your help: https://stackoverflow.com/questions/49907568/spread-or-unstack-a-vector-into-multiple-columns-without-knowing-row-positions – Alex Dometrius Apr 18 '18 at 19:37
  • @AlexDometrius See answer below - your other question has been closed, but hopefully that will be of some help. – Andrew Gustar Apr 18 '18 at 22:23
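
The fix discussed in the comments above can be sketched as follows. This is only an illustration, not the answerer's exact code: the small `df` is a hypothetical stand-in for the log data (stored as a factor on purpose), and the column names `Task` and `TimeStamp` are assumptions. It converts the column to character first and uses `data.frame()` so the result is a data frame rather than a matrix.

```r
# Hypothetical stand-in for the log data, deliberately stored as a factor
df <- data.frame(
  V1 = c("TASK [include_vars]",
         "Thursday 05 April 2018 20:21:52 -0500 (0:00:00.429) 0:00:00.429",
         "TASK [include_vars]",
         "Thursday 05 April 2018 20:21:53 -0500 (0:00:00.289) 0:00:00.718"),
  stringsAsFactors = TRUE
)

# Convert the factor to character so the text, not the integer codes, is kept
v <- as.character(df$V1)

# Odd positions are the TASK lines, even positions are the timestamps
odd  <- seq(1, length(v), by = 2)
even <- seq(2, length(v), by = 2)

# Column names here are illustrative choices
df2 <- data.frame(Task = v[odd], TimeStamp = v[even],
                  stringsAsFactors = FALSE)
```

This assumes the rows alternate strictly between task and timestamp; if a line is missing, the pairs will drift out of alignment.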

In answer to your second question, which has been closed, so I can't post this there...

If x is your vector of log data, how about...

library(tidyverse)
df <- tibble(x=x) #convert to tibble
df <- df %>% mutate(Type=ifelse(str_detect(x,"PLAY"),     "PLAY",
                         ifelse(str_detect(x,"TASK"),     "TASK",
                         ifelse(str_detect(x,"\\d\\:\\d"),"TimeStamp",
                                                          "Other"))),
                    TaskNo=cumsum(Type=="TASK"|Type=="PLAY")) %>% 
             group_by(TaskNo) %>% 
             summarise(Play=first(x[Type=="PLAY"]),
                       Task=first(x[Type=="TASK"]),
                       TimeStamp=first(x[Type=="TimeStamp"]),
                       Other=paste(x[Type=="Other"],collapse=","))

df
# A tibble: 9 x 5
  TaskNo Play               Task                 TimeStamp          Other                                         
   <int> <chr>              <chr>                <chr>              <chr>                                         
1      1 PLAY [all]         NA                   NA                 ""                                            
2      2 NA                 TASK [validate_fact~ Thursday 05 April~ ok: [NodeA],ok: [NodeB],ok: [NodeC]           
3      3 NA                 TASK [validate_fact~ Thursday 05 April~ ""                                            
4      4 NA                 TASK [validate_fact~ Thursday 05 April~ ""                                            
5      5 NA                 TASK [validate_os_f~ Thursday 05 April~ ok: [NodeA],ok: [NodeB],ok: [NodeC]           
6      6 NA                 TASK [validate_os_f~ Thursday 05 April~ ""                                            
7      7 PLAY [k8s-cluster] NA                   NA                 ""                                            
8      8 NA                 TASK [idns/idns-set~ Thursday 05 April~ ok: [NodeA -> NodeA] => (item=idns_user) => {~
9      9 NA                 TASK [idns/idns-set~ Thursday 05 April~ ok: [NodeA],ok: [NodeB],ok: [NodeC]  
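
To make the grouping step easier to follow, here is a small base-R sketch of the same idea, without the tidyverse. The log lines in `x` are invented for illustration and are not the asker's data; the names `type` and `task_no` are likewise assumptions. Each line is classified, and `cumsum()` then starts a new group at every PLAY or TASK line, so the timestamp and any other output lines inherit the number of the task above them.

```r
# Hypothetical log lines, invented for illustration
x <- c("PLAY [all]",
       "TASK [include_vars]",
       "Thursday 05 April 2018 20:21:52 -0500 (0:00:00.429) 0:00:00.429",
       "ok: [NodeA]",
       "ok: [NodeB]",
       "TASK [include_vars]",
       "Thursday 05 April 2018 20:21:53 -0500 (0:00:00.289) 0:00:00.718")

# Classify each line; the first matching pattern wins
type <- ifelse(grepl("PLAY", x), "PLAY",
        ifelse(grepl("TASK", x), "TASK",
        ifelse(grepl("\\d:\\d", x), "TimeStamp", "Other")))

# The counter increases by one at every PLAY or TASK line
task_no <- cumsum(type %in% c("PLAY", "TASK"))

# One row per log line, labelled with its type and group number
labelled <- data.frame(task_no = task_no, type = type, x = x,
                       stringsAsFactors = FALSE)
```

Summarising `labelled` by `task_no` then yields one row per play or task, which is what the `group_by()` and `summarise()` calls do in the code above.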
Andrew Gustar
  • Thank you for the help! This works up until summarise: Error in summarise_impl(.data, dots) : Column `Play` must be length 1 (a summary value), not 8363 – Alex Dometrius Apr 19 '18 at 15:32
  • Strange - it worked for me. The only thing I can think of is that I am assuming you are starting with `x` as a character vector. Basically `df` needs to be a dataframe or tibble with one variable `df$x` which should be a character vector (your log data, one line per row) (and not a factor!). I can't see anything else that might be going wrong. – Andrew Gustar Apr 19 '18 at 15:56
  • Got it! For whatever reason converting the dataframe to a tibble isn't working for me -- could be something in my environment I am not seeing. But when left as a dataframe it works. If my other question is taken off of hold please post this there so I can mark it as the answer. Thank you again! – Alex Dometrius Apr 19 '18 at 16:23
  • Glad to hear that you've got it sorted! – Andrew Gustar Apr 19 '18 at 17:24