0

I am trying to find a quick way of finding a range of Y-values across a set of dataframes that I have. Here is what my dataframes look like:

#Example dataframe1

X      Y
0      0
0.2    27    
0.4    31
0.6    34
0.8    39
1.0    43
1.2    44

#Example dataframe2

X      Y
0      0
0.2    12
0.3    14
0.4    18
0.5    21
0.6    22
0.7    24
0.8    29
0.9    34
1.0    38
1.1    40

#Example dataframe3

X      Y
0      0
0.2    19
0.3    24
0.4    29
0.5    34
0.6    38
0.7    39
0.8    47
0.9    51
1.0    52
1.1    56

X1   0.00000000  0.00000000
X2   0.00644777 -3.74942549
X3   0.01250601 -1.64198832
X4   0.01746194 -3.09841231
X5   0.02656210 -3.24436861
X6   0.03360072  1.92721788
X7   0.04399755 -4.38984054
X8   0.05404601 -1.07421099
X9   0.06154399  1.05752545
X10  0.07105423 -2.64811489
X11  0.08315538 -2.10245766
X12  0.08925389 -1.13184408
X13  0.10060635  0.51200462
X14  0.10833009  4.54644990
X15  0.12196157 -3.08377282
X16  0.13730928 -0.27015071
X17  0.14890850 -0.65513113
X18  0.16087072  2.17034586
X19  0.17361501 -0.93398437
X20  0.18310140  0.79111051
X21  0.19810085 -3.87227906
X22  0.21316248  2.13154747
X23  0.22423607  1.36416314
X24  0.23529990  4.99793424
X25  0.24695577 -1.01918599
X26  0.26119599  2.93076793
X27  0.27643532  2.91155218
X28  0.28517213  3.57100964
X29  0.29978970 -0.73012874
X30  0.31870887 -2.78233188
X31  0.33418021 -1.51922907
X32  0.34585044  2.87420871
X33  0.36071229  1.14786764
X34  0.37379062  2.99504234
X35  0.39324498  0.63168705
X36  0.41099992 -0.07084714
X37  0.42437476  1.54668167
X38  0.43643925  2.81198973
X39  0.44687960  4.92643076
X40  0.45864752  4.11311344
X41  0.48055932  4.39500995
X42  0.49634662  7.15432395
X43  0.50779009  8.58246880
X44  0.52861303  2.82337466
X45  0.54971600  2.99739656
X46  0.56138206  6.78068642
X47  0.57472491  4.45406406
X48  0.59202033  2.46356370
X49  0.60975420  3.15161790
X50  0.63057494  2.34243629
X51  0.64653736  2.56960396
X52  0.66088217 11.82388880
X53  0.68168259  6.09572345
X54  0.69649595  5.09381202
X55  0.71363837  4.46075825
X56  0.72988760  8.62040355
X57  0.75010443  6.83203133
X58  0.76776582  9.18985022
X59  0.78613955  7.86350629
X60  0.81099886  6.03907061
X61  0.82979667  9.34560816
X62  0.84431553  9.42604561
X63  0.86824787  7.64843057
X64  0.88447672  6.16947933
X65  0.89909577  9.87215457
X66  0.92162514 10.98250589
X67  0.94854647  5.26614443
X68  0.96631038  8.94057689
X69  0.98120815 14.91703428
X70  1.00357985 11.66179411
X71  1.02769291  8.30042153
X72  1.04733002  7.64827230
X73  1.06585240 10.99394148
X74  1.09083056  7.14363152
X75  1.11592567  5.93705238
X76  1.13627303 10.89649169
X77  1.15866983  8.08089986
X78  1.17941856 10.63699550
X79  1.19748724 12.61902305
X80  1.22012913 11.83730809
X81  1.24142718  9.53528978
X82  1.26937973  9.20448095
X83  1.29283488  6.75640546
X84  1.31011736 15.04334314
X85  1.33134723 12.18910484
X86  1.36014366 11.80276542
X87  1.38395822 14.18290101
X88  1.40615380 13.99689728
X89  1.42895305 12.04541339
X90  1.45046842 10.61135555
X91  1.47504234 11.07833983
X92  1.49920797 12.37669790
X93  1.52142668 14.35112239
X94  1.54559219 16.42659881
X95  1.57354927 13.69037310
X96  1.60000896 13.51107220
X97  1.62449229 13.96191530
X98  1.64506662 16.74995717
X99  1.67100036 14.67649138
X100 1.70185161 10.71288999
X101 1.72964287 14.56576519
X102 1.75235510  8.66232995
X103 1.77455521 18.00492491
X104 1.80183053 18.36635863
X105 1.83121216 12.63300200
X106 1.85619974 15.88991904
X107 1.88188243 13.53745701
X108 1.91002214 18.05483899
X109 1.93333006 18.40951264
X110 1.96133912 15.26804166
X111 1.98944890 15.16029048
X112 2.01225924 19.62616569
X113 2.03804183 19.73062246
X114 2.06614304 15.57687649
X115 2.09558916 19.55369508
X116 2.12400699 17.90633921
X117 2.14846969 21.81952636
X118 2.18011498 16.62377098
X119 2.20532560 18.47341439
X120 2.23031592 25.90875455
X121 2.26106715 16.58422332
X122 2.28796983 23.44278876
X123 2.31954741 19.14855256
X124 2.35274339 18.75923949
X125 2.37794161 20.01523111
X126 2.40643740 24.42732689
X127 2.43963051 21.06897103
X128 2.46948838 16.13112984
X129 2.49733877 24.40617463
X130 2.53035021 20.97045518
X131 2.56229091 18.55134741
X132 2.59265327 20.27110860
X133 2.61934137 21.31728543
X134 2.64592481 21.69982327
X135 2.67754769 28.79262127
X136 2.70779777 20.45346529
X137 2.73824191 24.89121411
X138 2.76880598 29.23390805
X139 2.80077982 23.31624195
X140 2.83057237 21.14259882

Now, let's say that I want to determine the rate of increase at about x = 1.0, relative to the beginning of the period (i.e. 0 at the beginning in each case). We can see clearly here that the answer would be y = 43, y = 38 and y = 52 among the three cases, with 52 being the highest and 38 being the lowest (so, a range of 14). My question is would it be possible to quickly determine the value at around x = 1.0 across the 10 dataframes that I have like this without having to manually check them? The idea is to determine the range of values for y at around x = 1.0 across all dataframes. Note that it's not perfectly x = 1.0 in all dataframes - some could be something like 0.99 or 1.01.

Thank you, and I your feedback would be most appreciated!

Android17
  • 93
  • 1
  • 12

2 Answers2

1

You could rbind all the data frames and then subset and use range where X equals, then diff to calculate your desired value. For instance, in your example:

ll <- rbind(df1, df2, df3)
range(ll[ll$X == 1, "Y"])
# [1] 38 52

rr <- range(ll[ll$X == 1, "Y"], na.rm = TRUE)
diff(rr)
# [1] 14

You could expand this a little bit around the range of 1.0 by creating a min and max value for wiggle room:

minwiggle <- 0.99
maxwiggle <- 1.01

range(ll[ll$X <= maxwiggle & ll$X >= minwiggle, "Y"])
# [1] 38 52

rr <- range(ll[ll$X <= maxwiggle & ll$X >= minwiggle, "Y"], na.rm = TRUE)
diff(rr)
# [1] 14
jpsmith
  • 11,023
  • 5
  • 15
  • 36
  • Thank you so much! And does this work even if the dataframes have different numbers of rows? – Android17 May 10 '23 at 13:35
  • yep! as long as the column names are the same (if they aren't, there is a quick workaround). – jpsmith May 10 '23 at 13:50
  • Many thanks, once again! Yes, all of the column names are the same. One thing that I noticed, though, is that when I run the suggested function above, I receive this: "[1] Inf -Inf Warning messages: 1: In min(x, na.rm = na.rm) : no non-missing arguments to min; returning Inf 2: In max(x, na.rm = na.rm) : no non-missing arguments to max; returning -Inf " Is there a way around this? – Android17 May 10 '23 at 14:09
  • @Android17 I think so - see edit where I added add `na.rm = TRUE` to the `range()` function - it will ignore NA values. – jpsmith May 10 '23 at 14:18
  • Thanks, again! I added the "na.rm = TRUE", but the same warning message appears. Do you know what could be causing that to persist? Maybe because the first values in each dataframe are "0"? – Android17 May 10 '23 at 14:35
  • @Android17 gotcha - it seems like something in the data, so I may not be able to help without the exact data - if you update your question with `dput` for the three datasets, I can investigate – jpsmith May 10 '23 at 14:36
  • No problem! I just added a sample of what one of my actual dataframes looks like in my original post above (140 rows). All 10 dataframes that I have are structured this way and all begin with "0" for the x and y values at the first row. They all have the same column names. I am wondering if the "zeros" could be the issue? – Android17 May 10 '23 at 14:49
  • The error is happening because its not finding any `X` values that satisfy the condition - likely because of a [floating point issue](https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal) - meaning using `==` is a bad idea. In your real data I get the error when I use `ll$X == 1` in the `range` statement, but not with the "wiggle" approach in the second code. Does that approach still throw the error? – jpsmith May 10 '23 at 15:13
  • Yes, the error appears in that second code, as well, unfortunately. What if we re-define "minwiggle" and "maxwiggle" to satisfy the real data that I have above? – Android17 May 10 '23 at 15:30
1

one approach:

target_val = 1 ## X value to match (as close as possible)

list(df_A, df_B, df_C) |> ## list of dataframes
  Map(f = \(the_df) {  
    ## find row index where X is closest to the target_val
    row_index = which.min(abs(the_df$X - target_val))
    the_df$Y[row_index]
  }) |>
  Reduce(f = c) |> ## concatenate values
  fivenum() ## put aggregating function here

output for X == 1, using your example dataframes:

[1] 38.0 40.5 43.0 47.5 52.0
I_O
  • 4,983
  • 2
  • 2
  • 15