
I have a dataset containing each subject's ID number and the year in which the data were collected (data are collected yearly). For each person, there are multiple observations over time for the lifesat variable (see image).

What I would like to do is to find a way to create the first difference of lifesat for all observations.

For example, if individual 1 has observations from 2011, 2012 and 2013, I want to generate the first difference of the y variable (lifesat): the change from 2011 to 2012, and the change from 2012 to 2013.

In the screenshot, ID number is on the left, year is in the middle and y variable is on the right.

I tried the split-apply-combine strategy, but it did not yield any results. Could someone please guide me? Is there a function that does this, and how do I apply it across many subjects in the data set?

Desired output would be something like this:

xwaveid year lifesat change
01 2001 7 0
01 2002 8 1
01 2003 9 1

Reproducible data set:

data <- structure(
  list(
    xwaveid = c(
      "0100003", "0100003", "0100003", "0100003", "0100003",
      "0100003", "0100003", "0100003", "0100003", "0100003",
      "0100003", "0100003", "0100003", "0100003", "0100003",
      "0100003", "0100003", "0100003", "0100003", "0100003"
    ),
    year = c(
      2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
      2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020
    ),
    lifesat = c(10, 7, 7, 8, 8, 8, 8, 7, 10, 8, 7, 8, 5, 7, 8, 8, 7, 7, 7, 8)
  ),
  row.names = c(NA, -20L),
  class = c("tbl_df", "tbl", "data.frame")
)
  • Welcome to stackoverflow. What we need is a question (already done), a minimal example data frame (is lacking, you can do it with dput(df) for example), and the addition of the desired output would be great. This is important in two ways: First, you will increase your learning curve dramatically by doing so and second, you will make us happy to be able to help! See – TarJae Jan 26 '23 at 03:09
  • You might want to check `lag` and `lead` functions. – Jonathan V. Solórzano Jan 26 '23 at 03:16
  • Greetings! Usually it is helpful to provide a minimally reproducible dataset for questions here so people can troubleshoot your problems (rather than a table or screenshot, for example). One way of doing this is by using the `dput` function on the data or a subset of the data you are using, then pasting the output into your question. You can find out how to use it here: https://youtu.be/3EID3P1oisg – Shawn Hemelstrand Jan 26 '23 at 03:43
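
For readers landing here, one possible way to compute a per-subject first difference is sketched below in base R. This is only an illustration, not an accepted answer: the small `panel` data frame and its values are hypothetical, and the sketch assumes each subject has one row per year with no gaps, so that the previous row is also the previous year.

```r
# Hypothetical two-subject panel, used only for illustration
panel <- data.frame(
  xwaveid = c("01", "01", "01", "02", "02"),
  year    = c(2001, 2002, 2003, 2001, 2002),
  lifesat = c(7, 8, 9, 5, 3)
)

# Sort by subject and year so differences follow time order
panel <- panel[order(panel$xwaveid, panel$year), ]

# Within each subject, subtract the previous row's value.
# The first observation of a subject has no predecessor, so it gets NA.
panel$change <- ave(
  panel$lifesat, panel$xwaveid,
  FUN = function(x) c(NA, diff(x))
)

print(panel)
```

If a 0 is preferred for each subject's first year, as in the desired output above, the `NA` values can be replaced afterwards with `panel$change[is.na(panel$change)] <- 0`. The `lag` function mentioned in the comments offers a similar route in dplyr, along the lines of `lifesat - lag(lifesat)` after `group_by(xwaveid)`.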
