0

I have the following sample data. The last column is a combo of the first three columns separated by a '-'. The "Time" column will always be in an increasing order for each unique "Group" defined in the last column. The data is sorted by "Line", "Blk", "Time".

I'm trying to append two columns to this dataset.

   Line    Blk  Trp    Time       Group
1     1   1494  826   32280  1-1494-826
2     1   1494  865   32340  1-1494-865
3     1   1494  865   32640  1-1494-865
4     1   1494  865   32940  1-1494-865
5     1   1494  865   33180  1-1494-865
6     1   1494  865   33780  1-1494-865
7     1   1494  829   34080  1-1494-829
8     1   1494  829   34800  1-1494-829
9     1   1494  829   35280  1-1494-829
10    1   1494  829   35580  1-1494-829
11    1   1494  829   35880  1-1494-829
12    2   1495  926   32280  2-1495-926
13    2   1495  965   32340  2-1495-965
14    2   1495  965   32640  2-1495-965
15    2   1495  965   32940  2-1495-965
16    2   1495  965   33180  2-1495-965
17    2   1495  965   33780  2-1495-965
18    2   1495  929   34080  2-1495-929
19    2   1495  929   34800  2-1495-929
20    2   1495  929   35280  2-1495-929
21    2   1495  929   35580  2-1495-929

The output after appending the two columns is shown below.

Column "Sqn" always starts at 1 for each unique "Group" and increases by 1 when the "Time" increases in that "Group".

Column "Rtn" will have only two values (D1,D2). The first time I encounter a new "Line" number, this value is D1 for the first unique "Group". Now, within that specific "Line", this value alternates to D2 and back to D1 and so on for each new "Group".

   Line    Blk  Trp    Time       Group  Sqn  Rtn
1     1   1494  826   32280  1-1494-826    1   D1
2     1   1494  865   32340  1-1494-865    1   D2
3     1   1494  865   32640  1-1494-865    2   D2
4     1   1494  865   32940  1-1494-865    3   D2
5     1   1494  865   33180  1-1494-865    4   D2
6     1   1494  865   33780  1-1494-865    5   D2
7     1   1494  829   34080  1-1494-829    1   D1
8     1   1494  829   34800  1-1494-829    2   D1
9     1   1494  829   35280  1-1494-829    3   D1
10    1   1494  829   35580  1-1494-829    4   D1
11    1   1494  829   35880  1-1494-829    5   D1
12    2   1495  926   32280  2-1495-926    1   D1
13    2   1495  965   32340  2-1495-965    1   D2
14    2   1495  965   32640  2-1495-965    2   D2
15    2   1495  965   32940  2-1495-965    3   D2
16    2   1495  965   33180  2-1495-965    4   D2
17    2   1495  965   33780  2-1495-965    5   D2
18    2   1495  929   34080  2-1495-929    1   D1
19    2   1495  929   34800  2-1495-929    2   D1
20    2   1495  929   35280  2-1495-929    3   D1
21    2   1495  929   35580  2-1495-929    4   D1

I did the "Sqn" column using for loop and quickly realized it's highly inefficient. I tried to see if I can use diff somehow, but got stuck because "Group" column is not an integer.

Any efficient solution for adding the two columns would be appreciated.

Sujith
  • 15
  • 5
  • The first question is answered here: [Numbering rows within groups in a data frame](https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame) – Henrik Nov 28 '18 at 23:44

1 Answers1

1

You can use dplyr for this.

To add Sqn, group_by Group and add the row_number().

To add Rtn: I summed Sqn every time its value equaled 1, then tested whether the result was even. There's probably a smarter way.

Assuming the data frame is df1:

library(dplyr)
df1 %>% 
  group_by(Group) %>% 
  mutate(Sqn = row_number()) %>% 
  ungroup() %>% 
  mutate(Rtn = cumsum(Sqn == 1), 
         Rtn = ifelse(Rtn %% 2 == 0, "D2", "D1"))

Result:

# A tibble: 21 x 7
    Line   Blk   Trp  Time Group        Sqn Rtn  
   <int> <int> <int> <int> <chr>      <int> <chr>
 1     1  1494   826 32280 1-1494-826     1 D1   
 2     1  1494   865 32340 1-1494-865     1 D2   
 3     1  1494   865 32640 1-1494-865     2 D2   
 4     1  1494   865 32940 1-1494-865     3 D2   
 5     1  1494   865 33180 1-1494-865     4 D2   
 6     1  1494   865 33780 1-1494-865     5 D2   
 7     1  1494   829 34080 1-1494-829     1 D1   
 8     1  1494   829 34800 1-1494-829     2 D1   
 9     1  1494   829 35280 1-1494-829     3 D1   
10     1  1494   829 35580 1-1494-829     4 D1   
# ... with 11 more rows
neilfws
  • 32,751
  • 5
  • 50
  • 63