1

I have some data that comes from the measurement of an image where essentially the columns signify position (x) and height (z) data. The problem is that this data gets spit out as a .csv file in the wide format. I am trying to find a way to convert this to the long format but I'm unsure how to do this because I can't designate an identifier.

I know there are a lot of questions on reshaping data but I didn't find anything quite like this.

As an example:

df <- data.frame(V1 = c("Profile", "x", "[m]", 0, 2, 4, 6, 8, 10, 12, NA, NA),
                 V2 = c("1", "z", "[m]", 3, 3, 4, 10, 12, 9, 2, NA, NA),
                 V3 = c("Profile", "x", "[m]", 0, 2, 4, 6, NA, NA, NA, NA, NA),
                 V4 = c("2", "z", "[m]", 4, 8, 10, 10, NA, NA, NA, NA, NA),
                 V5 = c("Profile", "x", "[m]", 0, 2, 4, 6, 8, 10, 12, 14, 17),
                 V2 = c("3", "z", "[m]", 0, 1, 1, 10, 14, 11, 6, 2, 0))

Every two columns represents X,Z data (you can see grouped by Profile 1, Profile 2, Profile 3, etc). However, measurements are not equal lengths, hence the rows with NAs. Is there a programmatic way to reshape this data into the long form? i.e.:

profile     x     z
Profile 1   0     3
Profile 1   2     3
Profile 1   4     4
...         ...   ...
Profile 2   0     4
Profile 2   2     8
Profile 2   4     10
...         ...   ...

Thank you in advance for your help!

tlmoore
  • 125
  • 8

1 Answers1

1

You can do the following (Its a bit verbose, feel free to optimize):

  dfcols <- NCOL(df)
  
  xColInds <- seq(1,dfcols,by=2)
  zColInds <- seq(2,dfcols,by=2)
  
  longdata <- do.call("rbind",lapply(1:length(xColInds), function(i) {
    xValInd <- xColInds[i]
    zValInd <- zColInds[i]
    profileName <- paste0(df[1,xValInd]," ",df[1,zValInd])
    xVals <- as.numeric(df[-(1:3),xValInd])
    zVals <- as.numeric(df[-(1:3),zValInd])
    data.frame(profile=rep(profileName,length(xVals)),
               x = xVals,
               z = zVals)
  }))

If you want it more performant, dont cast to data.frame every single iteration. One cast at the end is enough, like:

xColInds <- seq(1,NCOL(df),by=2)
longdataList <- lapply(xColInds, function(xci) {
  list(profileName = paste0(df[1,xci]," ",df[1,xci+1]),
       x = df[-(1:3),xci],
       z = df[-(1:3),xci+1])
})
longdata <- data.frame(profile = rep(unlist(lapply(longdataList,"[[","profileName")),each=NROW(df)-3),
                       x = as.numeric(unlist(lapply(longdataList,"[[","x"))),
                       z = as.numeric(unlist(lapply(longdataList,"[[","z"))))
Jonas
  • 1,760
  • 1
  • 3
  • 12
  • Thanks Jonas! This is perfect. I have to do some analysis across multiple images and this will save a lot of headache from going through the data manually. – tlmoore Feb 05 '21 at 15:50
  • I did an edit to give you a more performant and less verbose procedure. Using microbenchmark on my machine it only takes the third of the time. Maybe it is recognizable if you are dealing with multiple data of images like you mentioned. – Jonas Feb 05 '21 at 15:58
  • Thanks again so much. I'm still trying to learn how to use functions in R (and python). I'm not a programmer so most of my solutions come through brute force. After I analyze the data I'll try to go through more carefully to understand what your code is doing. I appreciate all the time you're saving me! – tlmoore Feb 05 '21 at 16:01
  • Glad I could help. If you have to work more often with `R`, I would suggest you to read about the `apply`-variations, e.g. here : https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family . They are a really powerful looping-tool in `R`. – Jonas Feb 05 '21 at 16:09