I have a dataset with about 2400 rows and some rows are missing a value as you can see in the following example:
+------+------+-------+-------+--------+
| Name | Year | Value | Name2 | Value2 |
+------+------+-------+-------+--------+
| A | 2010 | 10 | | ... |
| A | 2011 | 10 | AA | ... |
| A | 2012 | 12 | AA | ... |
| A | 2013 | 14 | AA | ... |
| A | 2014 | 9 | AA | ... |
| A | 2015 | 13 | | ... |
| B | 2010 | 8 | | ... |
| B | 2011 | 10 | BB | ... |
| B | 2012 | 11 | BB | ... |
| B | 2013 | 12 | BB | ... |
| B | 2014 | 10 | | ... |
| B | 2015 | 11 | | ... |
| C | 2010 | 11 | CC | ... |
| C | 2011 | 10 | CC | ... |
| C | 2012 | 9 | CC | ... |
| C | 2013 | 8 | CC | ... |
| C | 2014 | 10 | CC | ... |
| C | 2015 | 10 | | ... |
| ... | ... | ... | ... | ... |
+------+------+-------+-------+--------+
I would like to fill up the missing values in column "Name 2" with the correct values, so it looks like this:
+------+------+-------+-------+--------+
| Name | Year | Value | Name2 | Value2 |
+------+------+-------+-------+--------+
| A | 2010 | 10 | AA | ... |
| A | 2011 | 10 | AA | ... |
| A | 2012 | 12 | AA | ... |
| A | 2013 | 14 | AA | ... |
| A | 2014 | 9 | AA | ... |
| A | 2015 | 13 | AA | ... |
| B | 2010 | 8 | BB | ... |
| B | 2011 | 10 | BB | ... |
| B | 2012 | 11 | BB | ... |
| B | 2013 | 12 | BB | ... |
| B | 2014 | 10 | BB | ... |
| B | 2015 | 11 | BB | ... |
| C | 2010 | 11 | CC | ... |
| C | 2011 | 10 | CC | ... |
| C | 2012 | 9 | CC | ... |
| C | 2013 | 8 | CC | ... |
| C | 2014 | 10 | CC | ... |
| C | 2015 | 10 | CC | ... |
| ... | ... | ... | ... | ... |
+------+------+-------+-------+--------+
I already tried the fill()
command but did not succeed. Filling the value with the latest non-NA value does not work since it would sometimes add the wrong values (e.g. B 2010 would be filled with AA)!
Could someone tell me how to do this?