Data manipulations in R

Question

As part of a project, I am using R to analyze some data. I am stuck retrieving a few values from an existing dataset that I imported from a CSV file.

The file looks like this (image not included):

For my analysis, I want to create another column that holds the difference between the current value of x and its previous value. For the first row of every unique i, the new column should simply hold the current value of x. I am new to R and have been trying various approaches for some time, but I still cannot figure out how to do this. I would appreciate suggestions on an approach I can follow.

Structure of MyData:

structure(list(t = 1:10, x = c(34450L, 34469L, 34470L, 34483L, 
34488L, 34512L, 34530L, 34553L, 34575L, 34589L), y = c(268880.73342868, 
268902.322359863, 268938.194698248, 268553.521856105, 269175.38273083, 
268901.619719038, 268920.864512966, 269636.604121984, 270191.206593437, 
269295.344751692), i = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L)), .Names = c("t", "x", "y", "i"), row.names = c(NA, 10L), class = "data.frame")

Please consider reading up on [ask] and how to create a [reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It makes it easier for others to help you. At the very least, include your data in a format that's easily imported, your desired output and what you've already tried yourself. — Heroka, Nov 18 '15 at 13:05
You want the computation made for each unique `i`, but in your example data there is only one unique value of `i`... — Cath, Nov 18 '15 at 13:21
MyData is a dataset of 24,000 rows where i varies from 1 to 10. For each value of i we have 2,400 rows, i.e. 24 hours of data for 100 days. — vijay krishna, Nov 18 '15 at 13:28

Cath · Accepted Answer · 2015-11-18T13:32:48.373

You can use the package data.table to obtain what you want:

library(data.table)
setDT(MyData)[, x_diff := c(x[1], diff(x)), by=i]
MyData
     # t     x i x_diff
 # 1:  1 34287 1  34287
 # 2:  2 34789 1    502
 # 3:  3 34409 1   -380
 # 4:  4 34883 1    474
 # 5:  5 34941 1     58
 # 6:  6 34045 2  34045
 # 7:  7 34528 2    483
 # 8:  8 34893 2    365
 # 9:  9 34551 2   -342
# 10: 10 34457 2    -94

Data:

set.seed(123)
MyData <- data.frame(t = 1:10, x = sample(34000:35000, 10, replace = TRUE), i = rep(1:2, each = 5))
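
If adding a package is not an option, roughly the same grouped computation can be sketched in base R with ave(), which applies a function within each level of a grouping variable and returns the results in the original row order. The small data frame below is made up for illustration (it is not the asker's real data), and the column name x_diff simply mirrors the one used above.

```r
# Hypothetical example data: two groups in column i, three rows each
MyData <- data.frame(
  t = 1:6,
  x = c(34450L, 34469L, 34470L, 34512L, 34530L, 34553L),
  i = rep(1:2, each = 3)
)

# Within each group of i: keep the first x as-is, then take successive differences
MyData$x_diff <- ave(MyData$x, MyData$i, FUN = function(v) c(v[1], diff(v)))

MyData$x_diff
# 34450    19     1 34512    18    23
```

Because ave() writes each group's result back into the positions it came from, this should also work when the rows of different groups are interleaved, as long as the rows within each group are already in time order.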

s_scolary · Answer 2 · 2015-11-18T13:29:36.850

You can use the diff() function. Note that diff() returns a vector one element shorter than its input, so to add the result as a new column to your existing data frame you need to pad it with one extra value. In your case you can try this:

# if your data frame is called MyData 
MyData$newX = c(NA,diff(MyData$x))

That puts an NA in the first entry of the new column; the remaining values are the differences between consecutive values in your "x" column.
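
To make the length mismatch concrete, here is a tiny sketch using the first three x values from the question's data. The variable names d and padded are only illustrative.

```r
x <- c(34450L, 34469L, 34470L)

d <- diff(x)        # successive differences: 19 1
length(d)           # 2, one shorter than length(x)

padded <- c(NA, d)  # NA 19 1, same length as x, so it fits as a column
length(padded) == length(x)
# TRUE
```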

UPDATE:

You can write a simple loop that subsets the data for each unique value of "i" and then calculates the differences between the x values within that subset:

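# initialize an empty result; rbind(NULL, df) simply returns df
newdf = NULL
values = unique(MyData$i)
for(i in seq_along(values)){
  # keep only the rows of the current group (note == for comparison)
  data1 = MyData[MyData$i == values[i],]
  # NA for the first row of the group, then successive differences
  data1$newX = c(NA,diff(data1$x))
  # append this group's rows to the result
  newdf = rbind(newdf,data1)
}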

# then, if you want to overwrite your original data frame with newdf
MyData = newdf

# remove some variables
rm(data1,newdf,values)
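
The same subset-then-combine idea can also be sketched without an explicit loop, using split(), lapply() and do.call(rbind, ...). This is only an illustrative variant on made-up data, not a replacement for the code above; the names pieces and result are assumptions.

```r
# Hypothetical example data: two groups in column i, three rows each
MyData <- data.frame(
  t = 1:6,
  x = c(34450L, 34469L, 34470L, 34512L, 34530L, 34553L),
  i = rep(1:2, each = 3)
)

# Split into one data frame per value of i, add the difference column to each
pieces <- lapply(split(MyData, MyData$i), function(d) {
  d$newX <- c(NA, diff(d$x))  # NA for the first row of the group
  d
})

# Stack the pieces back together and drop the "1.1", "1.2", ... row names
result <- do.call(rbind, pieces)
rownames(result) <- NULL

result$newX
# NA 19  1 NA 18 23
```

As with the loop, the combined result is ordered by group, so the row order can differ from the original data frame if the groups were interleaved.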

Thank you, Colin. I am working on your suggestion and will post an update in some time. — vijay krishna, Nov 18 '15 at 13:59
