
I have a data frame that looks like this (short version):

    dat <- data.frame(matrix(NA, nrow = 105, ncol = 2))
    colnames(dat) <- c("1","2")
    dat[,1] <- c("HeaderStart","LevelName","LevelName","LevelName","LevelName","LevelName","Experiment","SessionTime","Subject","DataFileBasename",        
                 "Group","HeaderEnd","LogFrameStart","TrialList","Running","TrialListSample","PlaceBetDEVICE","PlaceBetOnsetTime","PlaceBetRTTime",
                 "PlaceBetRT","PlaceBetCRESP","Result","DiceRollOnsetDelay","DiceRollDurationError","DiceRollACC","DiceRollRESP",
                 "DiceRollOnsetToOnsetTime","Level","Procedure","RollMovie","TrialListCycle","StartBalance","PlaceBetOnsetDelay",
                 "PlaceBetDurationError","PlaceBetACC","PlaceBetRESP","PlaceBetOnsetToOnsetTime","EndBalance","DiceRollOnsetTime","DiceRollRTTime",          
                 "DiceRollRT","DiceRollCRESP","LogFrameEnd","LogFrameStart","TrialList","Running","TrialListSample","PlaceBetDEVICE","PlaceBetOnsetTime","PlaceBetRTTime",
                 "PlaceBetRT","PlaceBetCRESP","Result","DiceRollOnsetDelay","DiceRollDurationError","DiceRollACC","DiceRollRESP",
                 "DiceRollOnsetToOnsetTime","Level","Procedure","RollMovie","TrialListCycle","StartBalance","PlaceBetOnsetDelay",
                 "PlaceBetDurationError","PlaceBetACC","PlaceBetRESP","PlaceBetOnsetToOnsetTime","EndBalance","DiceRollOnsetTime","DiceRollRTTime",          
                 "DiceRollRT","DiceRollCRESP","LogFrameEnd","LogFrameStart","TrialList","Running","TrialListSample","PlaceBetDEVICE","PlaceBetOnsetTime","PlaceBetRTTime",
                 "PlaceBetRT","PlaceBetCRESP","Result","DiceRollOnsetDelay","DiceRollDurationError","DiceRollACC","DiceRollRESP",
                 "DiceRollOnsetToOnsetTime","Level","Procedure","RollMovie","TrialListCycle","StartBalance","PlaceBetOnsetDelay",
                 "PlaceBetDurationError","PlaceBetACC","PlaceBetRESP","PlaceBetOnsetToOnsetTime","EndBalance","DiceRollOnsetTime","DiceRollRTTime",          
                 "DiceRollRT","DiceRollCRESP","LogFrameEnd")              
    dat[,2] <- c("HeaderStart","Session","Trial","LogLevel5","LogLevel7","LogLevel9","GameOfDice_CATCH","10:39:59","999","GameOfDice_CATCH-999-1",
                 "1","HeaderEnd","LogFrameStart","5","TrialList","1","Button","199369","231578","32209","","200","367","-999999","0","","0","3",                     
                 "TrialProc","Two","1","1200","66","-999999","0","TwoThreeFourFive","0","1300","241869","0","0","","LogFrameEnd","LogFrameStart",         
                 "4","TrialList","3","Button","246519","248704","2185","","500","281","-999999","0","","0","3","TrialProc","Two","1",                     
                 "1800","117","-999999","0","ThreeFourFiveSix","0","1700","264386","0","0","","LogFrameEnd","LogFrameStart","5",                     
                 "TrialList","5","Button","269069","272355","3286","","1000","285","-999999","0","","0","3","TrialProc","Five","1","2700",                  
                 "84","-999999","0","OneTwoThree","0","2500","282436","0","0","","LogFrameEnd")

[Image: original data]

How can I extract all of the rows between each "LogFrameStart" and "LogFrameEnd" marker and arrange them in a new data frame that looks like this?

[Image: expected output]

Edit/Answer:

I ended up writing a for loop instead, which solved the problem. It assumes every frame holds exactly 29 rows between the markers:

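    # collect the 29 rows that follow each "LogFrameStart" marker,
    # one column of values per trial
    first <- TRUE
    for (row in seq_len(nrow(dat))) {
      if (dat[row, "1"] == "LogFrameStart") {
        sample_data <- dat[(row + 1):(row + 29), ]
        if (first) {
          newdata <- sample_data   # field names + values of the first trial
          first <- FALSE
        } else {
          newdata <- cbind(newdata, sample_data[, 2])  # append the next trial's values
        }
      }
    }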
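A loop-free sketch of the same idea in base R, which locates the start and end markers with `which()` instead of hard-coding 29 rows per frame. `extract_frames()` is a hypothetical helper name, and the small `dat` below is a reduced toy version built from values in the question. The sketch assumes every frame lists the same fields in the same order:

```r
# Reduced toy version of the question's `dat` (values copied from it):
# a header followed by two LogFrameStart ... LogFrameEnd frames.
dat <- data.frame(
  `1` = c("HeaderStart", "HeaderEnd",
          "LogFrameStart", "TrialList", "Result", "LogFrameEnd",
          "LogFrameStart", "TrialList", "Result", "LogFrameEnd"),
  `2` = c("HeaderStart", "HeaderEnd",
          "LogFrameStart", "5", "200", "LogFrameEnd",
          "LogFrameStart", "4", "500", "LogFrameEnd"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# extract_frames() is a hypothetical helper, not part of any package.
extract_frames <- function(dat) {
  key <- as.character(dat[[1]])
  val <- as.character(dat[[2]])
  starts <- which(key == "LogFrameStart")
  ends   <- which(key == "LogFrameEnd")
  # assumes every start marker is closed before the next one opens
  stopifnot(length(starts) == length(ends), all(starts < ends))
  # row indices strictly between each start marker and its end marker
  idx <- Map(function(s, e) s + seq_len(e - s - 1), starts, ends)
  # assumes every frame has the same number of fields
  stopifnot(all(lengths(idx) == length(idx[[1]])))
  # one column of values per frame
  values <- do.call(cbind, lapply(idx, function(i) val[i]))
  out <- data.frame(field = key[idx[[1]]], values, stringsAsFactors = FALSE)
  names(out)[-1] <- paste0("trial_", seq_along(idx))
  out
}

frames <- extract_frames(dat)
print(frames)
```

On the question's full `dat` this should give one `field` column plus one `trial_N` column per frame. The `stopifnot()` checks make it fail loudly when frames are unbalanced or differ in length, instead of silently misaligning columns.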
jc2525
  • Please don't use images of data, as they cannot be used without a lot of unnecessary effort, [for multiple reasons](//meta.stackoverflow.com/q/285551). Questions should be reproducible; this makes it easy for others who want to help to copy the data. Check out the Stack Overflow guidance [mre] and [ask]. Include a minimal dataset in the form of an object, for example a data frame as `df <- data.frame(…)` where … are your variables and values, or use `dput(head(df))`. [Good overview on asking questions](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Peter Oct 17 '22 at 20:28
  • I added sample data to reproduce the short version of my data frame. Could you please take another look at it? – jc2525 Oct 18 '22 at 15:46

1 Answer

This is one approach using the dplyr and tidyr packages.
A list may be a better way to manage this data; I suppose it depends on what you intend to do with it next.
This approach separates the data into two data frames, which I assume are metadata and trial data,
and for convenience stores the two data frames in a list.
Finally, it combines them into a new data frame matching your expected output:

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# initialise an empty list
dat_ls <- vector("list", length = 2)

# meta data as a data frame in the first list element
dat_ls[[1]] <- 
  slice_head(dat, n = 13) |> 
  rename(col1 = `1`, col2 = `2`)

# trial list data frame in the second list element
dat_ls[[2]] <- dat |> 
  filter(row_number() > 13) |> 
  rename(col1 = `1`, col2 = `2`) |> 
  mutate(trial_id = ifelse(col1 == "TrialList", col2, NA_real_)) |> 
  fill(trial_id) |> 
  mutate(trial_id = as.numeric(trial_id),
         trial_id = c(FALSE, trial_id[-length(trial_id)] != trial_id[-1]),
         # gives each group a unique id and prepares for bind_rows with meta data
         trial_id = paste0("col", cumsum(trial_id) + 2)) |>
  pivot_wider(names_from = trial_id, values_from = col2)
  

# To combine the data frames into one and remove NAs:

df_new <- 
  bind_rows(dat_ls[[1]], dat_ls[[2]]) |> 
  mutate(across(everything(), ~ifelse(is.na(.x), "", .x)))

df_new
#>                        col1                   col2             col3        col4
#> 1               HeaderStart            HeaderStart                             
#> 2                 LevelName                Session                             
#> 3                 LevelName                  Trial                             
#> 4                 LevelName              LogLevel5                             
#> 5                 LevelName              LogLevel7                             
#> 6                 LevelName              LogLevel9                             
#> 7                Experiment       GameOfDice_CATCH                             
#> 8               SessionTime               10:39:59                             
#> 9                   Subject                    999                             
#> 10         DataFileBasename GameOfDice_CATCH-999-1                             
#> 11                    Group                      1                             
#> 12                HeaderEnd              HeaderEnd                             
#> 13            LogFrameStart          LogFrameStart                             
#> 14                TrialList                      5                4           5
#> 15                  Running              TrialList        TrialList   TrialList
#> 16          TrialListSample                      1                3           5
#> 17           PlaceBetDEVICE                 Button           Button      Button
#> 18        PlaceBetOnsetTime                 199369           246519      269069
#> 19           PlaceBetRTTime                 231578           248704      272355
#> 20               PlaceBetRT                  32209             2185        3286
#> 21            PlaceBetCRESP                                                    
#> 22                   Result                    200              500        1000
#> 23       DiceRollOnsetDelay                    367              281         285
#> 24    DiceRollDurationError                -999999          -999999     -999999
#> 25              DiceRollACC                      0                0           0
#> 26             DiceRollRESP                                                    
#> 27 DiceRollOnsetToOnsetTime                      0                0           0
#> 28                    Level                      3                3           3
#> 29                Procedure              TrialProc        TrialProc   TrialProc
#> 30                RollMovie                    Two              Two        Five
#> 31           TrialListCycle                      1                1           1
#> 32             StartBalance                   1200             1800        2700
#> 33       PlaceBetOnsetDelay                     66              117          84
#> 34    PlaceBetDurationError                -999999          -999999     -999999
#> 35              PlaceBetACC                      0                0           0
#> 36             PlaceBetRESP       TwoThreeFourFive ThreeFourFiveSix OneTwoThree
#> 37 PlaceBetOnsetToOnsetTime                      0                0           0
#> 38               EndBalance                   1300             1700        2500
#> 39        DiceRollOnsetTime                 241869           264386      282436
#> 40           DiceRollRTTime                      0                0           0
#> 41               DiceRollRT                      0                0           0
#> 42            DiceRollCRESP                                                    
#> 43              LogFrameEnd            LogFrameEnd      LogFrameEnd LogFrameEnd
#> 44            LogFrameStart          LogFrameStart    LogFrameStart

Created on 2022-10-18 with reprex v2.0.2
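One caveat with this approach, offered as a hedged guess about the `Values are not uniquely identified` warning reported in the comments: `trial_id` only changes when the `TrialList` value changes, so two consecutive trials that happen to share a `TrialList` value end up in the same group. The same field then occurs twice in that group, and `pivot_wider()` has to return list-cols. The base R sketch below uses small hypothetical vectors (not the question's data) to compare that grouping with a simple counter of `LogFrameStart` markers:

```r
# Hypothetical key/value columns for three trials (illustrative values only);
# trials 1 and 2 happen to share the TrialList value "5".
key <- rep(c("LogFrameStart", "TrialList", "Result", "LogFrameEnd"), 3)
val <- c("LogFrameStart", "5", "200",  "LogFrameEnd",
         "LogFrameStart", "5", "500",  "LogFrameEnd",
         "LogFrameStart", "4", "1000", "LogFrameEnd")

# 1) Grouping in the style of the answer. The answer starts after the first
#    LogFrameStart, so drop that row, carry the last TrialList value forward
#    (a base R stand-in for tidyr::fill()), and open a new group whenever
#    the carried value changes.
k <- key[-1]
v <- val[-1]
carried <- ifelse(k == "TrialList", v, NA)
for (i in seq_along(carried)[-1]) {
  if (is.na(carried[i])) carried[i] <- carried[i - 1]
}
changed  <- c(FALSE, carried[-length(carried)] != carried[-1])
by_value <- cumsum(changed) + 1

# 2) Grouping by counting markers: every LogFrameStart opens a new trial,
#    whatever its TrialList value.
by_marker <- cumsum(key == "LogFrameStart")

max(by_value)   # 2: trials 1 and 2 were merged into one group
max(by_marker)  # 3: one group per trial

# A field repeated inside one group is what forces pivot_wider() into list-cols.
any(duplicated(paste(by_value, k)))     # TRUE
any(duplicated(paste(by_marker, key)))  # FALSE
```

In the pipeline above, the marker-counter version would amount to replacing the three-line `mutate()` with something like `mutate(trial_id = paste0("col", cumsum(col1 == "LogFrameStart") + 2))`; treat that as a sketch to check against the full data.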

Peter
  • What is "|>"? I keep getting an error when trying to run the script. I tried "%>%" in its place but it only works for some lines – jc2525 Oct 18 '22 at 18:04
  • `|>` is the forward pipe operator in base R. For details see the help page ?`|>`. From the documentation: "A pipe expression passes, or pipes, the result of the left-hand side expression lhs to the right-hand side expression rhs." It was introduced in R version 4.1.0. See this SO question for details: https://stackoverflow.com/questions/65329335/how-to-pipe-purely-in-base-r-base-pipe. Please include the error message you get where `|>` fails. The {magrittr} pipe `%>%` should work equally well. In RStudio the shortcut key is Ctrl+Shift+M. – Peter Oct 18 '22 at 18:13
  • `Error: unexpected '>'` and ``Error in rename(col1 = `1`, col2 = `2`) : object '1' not found``. This is for the line ``dat_ls[[1]] <- slice_head(dat, n = 13) |> rename(col1 = `1`, col2 = `2`)`` – jc2525 Oct 18 '22 at 18:40
  • Have you copied the code directly from the answer, or are you typing it fresh? Are you using the dataset as posted in the question? Note that the original column names need to be wrapped in backticks (`` ` ``) because they are not syntactically valid names. – Peter Oct 18 '22 at 18:47
  • I copied and pasted your code. Then I changed `|>` to `%>%` and ran it in portions. It worked up to and including the pivot line, but gave me a warning message after the pivot line: ``Values are not uniquely identified; output will contain list-cols. * Use `values_fn = list` to suppress this warning. * Use `values_fn = length` to identify where the duplicates arise * Use `values_fn = {summary_fun}` to summarise duplicates`` – jc2525 Oct 18 '22 at 18:52
  • I've just copied the data and the answer into a fresh R session, and it works with no problems with either pipe operator, `|>` or `%>%`. The warning suggests that the three-line `mutate()` call above the `pivot_wider()` function might not be exactly as in the answer. I can only suggest trying again and running the whole script in one go. What version of R are you using? Are you using RStudio? – Peter Oct 18 '22 at 19:03
  • Okay, thanks. I edited the question to add my solution using a for loop. – jc2525 Oct 18 '22 at 19:22
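A small base R illustration of what the native pipe does: `x |> f()` is evaluated as `f(x)`, so the two forms below compute the same value. This needs R 4.1.0 or later; older versions read `|>` as `|` followed by `>`, which is most likely where the `Error: unexpected '>'` above came from:

```r
# `|>` is only valid syntax from R 4.1.0 onwards
getRversion() >= "4.1.0"

x <- c(1, 4, 9)

# nested call, read from the inside out
nested <- sum(sqrt(x))

# the same computation with the native pipe, read left to right
piped <- x |> sqrt() |> sum()

identical(nested, piped)  # TRUE
```

With dplyr or magrittr loaded, `x %>% sqrt() %>% sum()` gives the same result here, which is why swapping the pipes is usually enough on older R versions.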