
I have a data frame (df) that I want to subset by the value of the column t. In my pipeline this happens in a loop, so that each iteration processes only the rows of the data frame that have a given t. Here is part of the data frame:

df

        t       d      avrg        se       s_n
105 4.034 574.383  533.3125 15.842750 0.5742241
106 4.034 579.906  526.2601 16.520519 0.5666307
107 4.034 585.429  517.3978 16.603408 0.5570885
108 4.034 590.951  514.8851 16.378100 0.5543831
109 4.034 596.474  517.5682 16.031580 0.5572721
110 4.034 601.997  524.1770 16.301832 0.5643879
111 4.034 607.520  521.1787 16.773292 0.5611595
112 4.034 613.043  511.4275 17.079401 0.5506602
113 4.034 618.566  506.8916 16.757593 0.5457765
114 4.034 624.089  511.3979 17.165346 0.5506284
115 4.034 629.612  511.7480 17.175872 0.5510053
116 4.034 635.135  509.7872 17.862666 0.5488941
117 4.034 640.658  507.4556 19.080856 0.5463837
118 4.244   0.000  984.5679  1.842083 1.0600964
119 4.244   5.523 1040.4532  4.488659 1.1202687
120 4.244  11.046 1284.3719 24.832460 1.3828990
121 4.244  16.569 1503.8378 49.605517 1.6192007
122 4.244  22.092 1558.5444 49.223158 1.6781039
123 4.244  27.615 1631.0177 36.870109 1.7561368
124 4.244  33.137 1741.2543 30.006613 1.8748300
125 4.244  38.660 1872.4405 37.725207 2.0160797

To get the levels of t, I do this:

  times<-as.numeric(levels(as.factor(df$t)))

Then, as I loop, I subset my data frame like this:

for (j in times) {

  df_t <- df[df$t == j, ]

  # ... everything I need to do with df_t goes here ...

}
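As a point of comparison for the loop above, here is a minimal sketch that avoids the `==` test altogether by letting `split()` do the grouping. The data frame `df_demo` is a small hypothetical stand-in, not my real data, and I am only assuming this grouping is acceptable for the pipeline.

```r
# Hypothetical stand-in for the real data frame
df_demo <- data.frame(
  t = c(4.034, 4.034, 4.244, 4.244),
  d = c(574.383, 579.906, 0, 5.523)
)

# split() groups the rows by the values of t, so no == comparison is needed
groups <- split(df_demo, df_demo$t)

for (name in names(groups)) {
  df_t <- groups[[name]]
  # ... per-group processing would go here ...
  cat("t =", name, "has", nrow(df_t), "rows\n")
}
```

Note that `split()` converts t to a factor internally, so values that differ only beyond about 15 significant digits would end up in the same group.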

I noticed that the subsetting works well for some values of t but not for others. For example:

> df_t <- df[df$t==times[1],]
> df_t
        t       d     avrg       se       s_n
105 4.034 574.383 533.3125 15.84275 0.5742241
106 4.034 579.906 526.2601 16.52052 0.5666307
107 4.034 585.429 517.3978 16.60341 0.5570885
108 4.034 590.951 514.8851 16.37810 0.5543831
109 4.034 596.474 517.5682 16.03158 0.5572721
110 4.034 601.997 524.1770 16.30183 0.5643879
111 4.034 607.520 521.1787 16.77329 0.5611595
112 4.034 613.043 511.4275 17.07940 0.5506602
113 4.034 618.566 506.8916 16.75759 0.5457765
114 4.034 624.089 511.3979 17.16535 0.5506284
115 4.034 629.612 511.7480 17.17587 0.5510053
116 4.034 635.135 509.7872 17.86267 0.5488941
117 4.034 640.658 507.4556 19.08086 0.5463837

> df_t1 <- df[df$t==times[2],]
> df_t1
[1] t    d    avrg se   s_n 
<0 rows> (or 0-length row.names)

Why does the subsetting work for some of my levels and not for others? If I check them manually, both values seem correct, and my data frame clearly has those values in the t column...

    > times[1]
    [1] 4.034
    > times[2]
    [1] 4.244

I also tried other ways of subsetting, like this:

> subset.data.frame(df, df$t==times[2])
[1] t    d    avrg se   s_n 
<0 rows> (or 0-length row.names)


> subset.data.frame(df, df$t==times[1])
        t       d     avrg       se       s_n
105 4.034 574.383 533.3125 15.84275 0.5742241
106 4.034 579.906 526.2601 16.52052 0.5666307
107 4.034 585.429 517.3978 16.60341 0.5570885
108 4.034 590.951 514.8851 16.37810 0.5543831
109 4.034 596.474 517.5682 16.03158 0.5572721
110 4.034 601.997 524.1770 16.30183 0.5643879
111 4.034 607.520 521.1787 16.77329 0.5611595
112 4.034 613.043 511.4275 17.07940 0.5506602
113 4.034 618.566 506.8916 16.75759 0.5457765
114 4.034 624.089 511.3979 17.16535 0.5506284
115 4.034 629.612 511.7480 17.17587 0.5510053
116 4.034 635.135 509.7872 17.86267 0.5488941
117 4.034 640.658 507.4556 19.08086 0.5463837
> 

But as you can see, the subsetting still works with one value and not with the other. Do you have any suggestions on how to solve this problem?

UPDATE1

As suggested in the comments, using

times <- unique(df$t)

instead of my first method works well and seems to solve the problem for now.
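If the mismatch really comes from floating-point representation (an assumption based on the comments, not something I have confirmed on my real data), another option might be to compare with a tolerance instead of `==`. In this sketch, `near_value()` is a hypothetical helper, not a base R function, and `df_demo` is made-up data in which the second t is deliberately not exactly 4.244.

```r
# Hypothetical helper: TRUE where x is within tol of target
near_value <- function(x, target, tol = 1e-8) {
  abs(x - target) < tol
}

# Made-up data: the second t differs from 4.244 in the last stored bits
df_demo <- data.frame(
  t = c(4.034, 4.244 + 1e-15),
  d = c(574.383, 0)
)

exact_match <- df_demo[df_demo$t == 4.244, ]            # exact comparison
close_match <- df_demo[near_value(df_demo$t, 4.244), ]  # tolerant comparison

nrow(exact_match)
nrow(close_match)
```

The tolerance of 1e-8 is arbitrary here. It would need to be smaller than the spacing between genuinely different values of t.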

UPDATE2

Following some comments, here is my df in reproducible form:

> dput(df)
structure(list(t = c(4.034, 4.034, 4.034, 4.034, 4.034, 4.034, 
4.034, 4.034, 4.034, 4.034, 4.034, 4.034, 4.034, 4.244, 4.244, 
4.244, 4.244, 4.244, 4.244, 4.244, 4.244), d = c(574.383, 579.906, 
585.429, 590.951, 596.474, 601.997, 607.52, 613.043, 618.566, 
624.089, 629.612, 635.135, 640.658, 0, 5.523, 11.046, 16.569, 
22.092, 27.615, 33.137, 38.66), avrg = c(533.312475247525, 526.260069306931, 
517.397752475248, 514.885089108911, 517.568217821782, 524.17702970297, 
521.178702970297, 511.427475247525, 506.891643564356, 511.397861386139, 
511.74796039604, 509.787158415842, 507.455584158416, 984.567900990099, 
1040.45316831683, 1284.37189108911, 1503.83781188119, 1558.54437623762, 
1631.01772277228, 1741.25434653465, 1872.44046534653), se = c(15.8427501449439, 
16.5205192226773, 16.6034079506853, 16.3780996947454, 16.0315801572497, 
16.3018319583687, 16.7732924683709, 17.0794011397917, 16.7575928861432, 
17.1653457253679, 17.1758716618221, 17.8626655326283, 19.0808563725021, 
1.84208337262486, 4.48865895211631, 24.8324597734051, 49.6055165744209, 
49.2231582153052, 36.8701085501606, 30.0066129040664, 37.7252068402058
), s_n = c(0.574224096797455, 0.566630684643339, 0.557088519188004, 
0.554383103659479, 0.557272061321729, 0.564387850300072, 0.561159515055948, 
0.550660248318982, 0.545776462597893, 0.550628362710457, 0.551005318617323, 
0.548894099025929, 0.546383664366651, 1.06009635986746, 1.12026871405828, 
1.38289900076007, 1.61920065503164, 1.67810388524744, 1.75613682819789, 
1.8748299558705, 2.01607966234353)), .Names = c("t", "d", "avrg", 
"se", "s_n"), row.names = 105:125, class = "data.frame")
> 
  • Does it work when you define "times" as `times<-unique(df$t)`? – otwtm Apr 26 '20 at 13:53
  • Please provide the input in reproducible form using dput. See the instructions for posting at the top of the [tag:r] tag page. – G. Grothendieck Apr 26 '20 at 14:14
  • Welcome to SO and R and your detailed question. It would help if you put your data into a dataframe format e.g. df <- data.frame(a = c(1, 2, 3), b = c("a", "b", "c")) obviously with your own data! This makes the question reproducible. This link is helpful [reprex] – Peter Apr 26 '20 at 14:15
  • Floating point comparisons are not accurate. Read https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal – Ronak Shah Apr 26 '20 at 14:22
  • The suggestion of otwtm to use `times<-unique(df$t)` works well, thank you very much! But why would this work and the other way not? – Annalisa Bellandi Apr 26 '20 at 14:44
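One possible explanation, sketched under the assumption that some stored t values are not exactly the numbers that get printed: `as.factor()` turns numbers into character labels with about 15 significant digits, so converting the levels back with `as.numeric()` can give a slightly different double than the one stored in the column, while `unique()` returns the stored values unchanged. The vector `t_stored` below is hypothetical.

```r
# Hypothetical column: the second value is one representable step above 4.244
t_stored <- c(4.034, 4.244 + 1e-15)

via_levels <- as.numeric(levels(as.factor(t_stored)))  # round trip through text
via_unique <- unique(t_stored)                         # stored values, untouched

# Printing with more digits shows the difference that the default print hides
print(via_levels, digits = 17)
print(via_unique, digits = 17)

any(t_stored == via_levels[2])  # the round-tripped value no longer matches
any(t_stored == via_unique[2])  # the stored value matches itself
```

This would also be consistent with the `dput()` output above showing a plain 4.244, if that output was itself written with limited precision.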

0 Answers