
I have subsetted the data to make it easier to demonstrate what I am attempting to do. I am trying to create a data frame with a new row for each value in the column "MaxRounds". At first, MaxRounds was a column, like so:

library(dplyr)
library(tidyr)
library(splitstackshape)

data <- structure(list(power = c(0.800962297001584, 0.804719517260326, 
0.808410477932415, 0.812036218849852, 0.803164810470566, 0.815597767274311
), nights = c(20L, 20L, 20L, 20L, 19L, 20L), sites = c(78L, 79L, 
80L, 81L, 81L, 82L), NonRoundedMaxRounds = c(3, 3, 3, 3, 3.15789473684211, 
3), MaxRounds = c(3, 3, 3, 3, 3, 3)), row.names = c(NA, 6L), class = "data.frame")

I then created new rows based on the MaxRounds column, duplicating each row MaxRounds times. For example, if MaxRounds is 2, then 2 rows are created; if MaxRounds is 5, then 5 rows are created.

The code gives each copy a unique row name: x, x.1, x.2, x.3, etc., where x is the original row name.

data = expandRows(data, "MaxRounds")

structure(list(power = c(0.800962297001584, 0.800962297001584, 
0.800962297001584, 0.804719517260326, 0.804719517260326, 0.804719517260326
), nights = c(20L, 20L, 20L, 20L, 20L, 20L), sites = c(78L, 78L, 
78L, 79L, 79L, 79L), NonRoundedMaxRounds = c(3, 3, 3, 3, 3, 3
)), row.names = c("1", "1.1", "1.2", "2", "2.1", "2.2"), class = "data.frame")
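The x, x.1, x.2 row names follow a base R convention: when `[` repeats rows of a data frame, it makes the duplicated row names unique in exactly this form. As a minimal sketch (a base R equivalent for illustration, not how the question does it), assuming `data` is the original six-row data frame above:

```r
# Original data from the question (dput output)
data <- structure(list(power = c(0.800962297001584, 0.804719517260326,
0.808410477932415, 0.812036218849852, 0.803164810470566, 0.815597767274311
), nights = c(20L, 20L, 20L, 20L, 19L, 20L), sites = c(78L, 79L,
80L, 81L, 81L, 82L), NonRoundedMaxRounds = c(3, 3, 3, 3, 3.15789473684211,
3), MaxRounds = c(3, 3, 3, 3, 3, 3)), row.names = c(NA, 6L), class = "data.frame")

# Repeat each row index MaxRounds times; `[` renames the duplicated
# rows "1", "1.1", "1.2", "2", "2.1", ...
expanded <- data[rep(seq_len(nrow(data)), times = data$MaxRounds), , drop = FALSE]

# Drop the count column, matching the expandRows output shown above
expanded$MaxRounds <- NULL

head(rownames(expanded))
```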

I then created a new column based on the row names:

data$RowID = rownames(data)

structure(list(power = c(0.800962297001584, 0.800962297001584, 
0.800962297001584, 0.804719517260326, 0.804719517260326, 0.804719517260326
), nights = c(20L, 20L, 20L, 20L, 20L, 20L), sites = c(78L, 78L, 
78L, 79L, 79L, 79L), NonRoundedMaxRounds = c(3, 3, 3, 3, 3, 3
), RowID = c("1", "1.1", "1.2", "2", "2.1", "2.2")), row.names = c("1", 
"1.1", "1.2", "2", "2.1", "2.2"), class = "data.frame")

Next, I am attempting to group together all of the rows that share the same x value (ignoring the part after the decimal point) and number them sequentially. For example:

  • 1, 1.1, 1.2 = 1, 2, 3
  • 2, 2.1, 2.2 = 1, 2, 3
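Since tidyr is already loaded, one alternative worth trying is `tidyr::uncount()`. It replaces the expandRows step rather than fixing the grouping, and its `.id` argument numbers the copies of each original row 1, 2, 3, ... while expanding. A minimal sketch, assuming the original six-row `data` with its MaxRounds column; the output column name `id` is just an illustrative choice:

```r
library(tidyr)

# Original data from the question (dput output)
data <- structure(list(power = c(0.800962297001584, 0.804719517260326,
0.808410477932415, 0.812036218849852, 0.803164810470566, 0.815597767274311
), nights = c(20L, 20L, 20L, 20L, 19L, 20L), sites = c(78L, 79L,
80L, 81L, 81L, 82L), NonRoundedMaxRounds = c(3, 3, 3, 3, 3.15789473684211,
3), MaxRounds = c(3, 3, 3, 3, 3, 3)), row.names = c(NA, 6L), class = "data.frame")

# Repeat each row MaxRounds times; .id adds a column that numbers the
# copies of each original row. MaxRounds itself is dropped by default.
out <- uncount(data, MaxRounds, .id = "id")

head(out)
```

This avoids parsing row names entirely, because the numbering is generated at the same moment the rows are duplicated.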

I am attempting to group by the column "RowID" using:

data %>% group_by(RowID) %>% mutate(id = row_number())

But I get this error:

[Screenshot of the error message; the error text is not available.]
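Separately from the error itself, `group_by(RowID)` would not produce the numbering described above even if it ran: every RowID value is unique, so each row forms its own group and `row_number()` returns 1 everywhere. Grouping on the part of RowID before the decimal point should give 1, 2, 3 within each original row. A minimal sketch, assuming the expanded data with its RowID column (only the first six rows are rebuilt here); `BaseRow` is a hypothetical helper column name:

```r
library(dplyr)

# First six rows of the expanded data, with RowID taken from the row names
data <- data.frame(
  power = rep(c(0.800962297001584, 0.804719517260326), each = 3),
  RowID = c("1", "1.1", "1.2", "2", "2.1", "2.2")
)

data <- data %>%
  # Strip everything from the first "." onwards: "1.2" -> "1"
  mutate(BaseRow = sub("\\..*$", "", RowID)) %>%
  group_by(BaseRow) %>%
  # Number rows 1, 2, 3, ... within each original row
  mutate(id = row_number()) %>%
  ungroup()

data
```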

  • `expandRows` does not produce the `x.n` when I run the code? – Serkan Jul 31 '21 at 05:20
  • So in my data, MaxRounds = 1 for every row until row 111. The x.n names only start once MaxRounds is > 1. Do I need to post more data? – Leanne Greenwild Jul 31 '21 at 05:22
  • I think you should post the relevant part of your `data` for the question, and the `expandRows` output with the `x.n` values, as I do not see them. I can't reproduce your question, unfortunately. – Serkan Jul 31 '21 at 05:24
  • Your question is quite simple, though: what you are looking for is just `row_number()`. But I am not quite certain, as your question is, for me, rather distorted and unfocused. – Serkan Jul 31 '21 at 05:29
  • https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame – Ronak Shah Jul 31 '21 at 07:05

1 Answer


Creating unique row IDs can be done by group or independently with dplyr. Here is an example using mtcars:

mtcars %>% group_by(cyl) %>% mutate(
        id = row_number()
)
# Groups:   cyl [3]
    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb    id
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1  21       6   160   110  3.9   2.62  16.5     0     1     4     4     1
2  21       6   160   110  3.9   2.88  17.0     0     1     4     4     2
3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1     1
4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1     3
5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2     1
6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1     4

And without grouping:

mtcars %>% mutate(
        id = row_number()
) 
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb id
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  1
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  2
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  3
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  4
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2  5
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  6

row_number() numbers the rows sequentially, either by group or not. For example, the 4th row in the grouped example has id = 3 because it is the 3rd row in the 6-cylinder group.
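For comparison, the same per-group numbering can be sketched in base R with `ave()`; this is an alternative, not part of the original answer. Using the built-in `mtcars` data:

```r
# Number rows within each cyl group without dplyr:
# ave() applies seq_along() to the row positions inside each group
cars <- mtcars
cars$id <- ave(seq_len(nrow(cars)), cars$cyl, FUN = seq_along)

head(cars[, c("cyl", "id")])
```

The first six ids should match the grouped dplyr output above (1, 2, 1, 3, 1, 4).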

Serkan
  • If I misunderstood your question, please update your question according to the comments. – Serkan Jul 31 '21 at 05:39
  • Hi Serkan, sorry, I was just playing around with the code that you provided. The first example is exactly what I want. I am now figuring out how to apply it to my data, as I want to use the row names to create the new column. – Leanne Greenwild Jul 31 '21 at 05:51
  • Good, don't forget to upvote so we know it's on the right path! `dput` your final data if you can't apply it, and then we can figure it out! – Serkan Jul 31 '21 at 05:55
  • I have updated my question based on your example, I seem to now be hitting an error when I attempt to group by the column RowID. Do you know what I could be doing wrong? – Leanne Greenwild Jul 31 '21 at 07:07
  • From what I can see, you did nothing wrong. Try renaming either `data` or your grouping variable; then it should be fine. – Serkan Jul 31 '21 at 13:55
  • It's most likely that RowID is a function, and therefore dplyr interprets it as such. It's a wild guess; I am not near my computer, so I cannot test it! – Serkan Jul 31 '21 at 13:56