
I'm trying to get the "significance letters" in R (i.e., the letters such as a, b, c, d used to show statistical differences) after performing a Dunn test with dunn.test().

Looking around, I've found some ways, but they always involve pipes and advanced commands that I still don't understand. Is there a simpler way to get the significance letters? Here is my data.

This is my second question on this site. In the first one, I was advised to use dput() to share the data, so here it is.

dput(data)
list(chi2 = 91.5474627943893, Z = c(-0.0616007222323512, -1.92269791323906, 
-2.15853959332545, -1.97713443399894, -2.22139747315438, -0.0544365207598832, 
-0.84485480219338, -0.913953572712641, 1.07784311104568, 1.13227963180556, 
-4.22784174426838, -5.37100364515101, -1.79579988095308, -1.72694252355457, 
-3.15917555744354, -3.35993879738757, -4.24276565553434, -0.934815986110472, 
-0.866154524862535, -2.29431291881961, 1.34752695561308, -0.992922138660262, 
-1.08492700584733, 0.929775774578798, 0.984212295338681, -0.148067336466881, 
2.9718835453196, 2.10755374422522, -2.64125998726951, -2.98826360706733, 
-0.718562074030453, -0.66412555327057, -1.79640518507613, 0.88688276329278, 
0.0284846976377152, -1.64833784860925, -5.2955847395214, -6.05321382752595, 
-3.37288682628234, -3.31845030552245, -4.45072993732802, -2.47060198345846, 
-3.31944815281165, -4.30266260086113, -2.65432475225188, -1.75517964537586, 
-1.94794065368998, 0.110111315198094, 0.162922497547301, -0.935550095316183, 
1.84911433052131, 1.02807451068034, -0.791903679326343, 0.807218922207613, 
3.38229217355489, 0.955905304543542, 1.16538509202836, 2.8786032177826, 
2.93303973854248, 1.80076010673692, 5.43697694018618, 4.56563405690133, 
1.9488274432038, 3.59716529181305, 6.25149004406494, 2.68254400742792, 
-2.39302945260445, -2.70163167504741, -0.470331539365387, -0.415895018605504, 
-1.54817465041107, 1.20087231302998, 0.341580960928304, -1.40010731394418, 
0.248230534665066, 2.90255528691695, -0.566399930695234, -3.34893475714799, 
-0.344038811202459, -0.335661078286486, 1.5786591020366, 1.63309562279648, 
0.500815990990921, 3.79266324550981, 2.92599836230062, 0.648883327457803, 
2.29722117606705, 4.95154592831894, 1.42141297292888, -1.299944115746, 
2.04899064140199), P = c(0.475440400889232, 0.0272589995374294, 
0.0154429517664608, 0.0240132182191774, 0.0131620260164579, 0.478293691309955, 
0.1990959562042, 0.180370620850913, 0.140551888366266, 0.128758445243508, 
1.17971822695677e-05, 3.91498131897193e-08, 0.0362631748446777, 
0.0420889932117001, 0.000791080762535689, 0.000389798704002552, 
1.10390975115185e-05, 0.174941569815155, 0.193202713817808, 0.0108862655219185, 
0.0889052889618982, 0.160373950014568, 0.138976992948999, 0.176243595897147, 
0.162505570807551, 0.441144813492228, 0.00147989463301821, 0.0175348041176823, 
0.00412991540147731, 0.001402837245659, 0.236205394176096, 0.253304978252812, 
0.0362150499135892, 0.187571011298663, 0.488637786297009, 0.049641681044054, 
5.93180809475948e-08, 7.09920980276445e-10, 0.00037192259942881, 
0.00045259213790549, 4.27894538720707e-06, 0.00674429261165095, 
0.000450977759269412, 8.43788687309664e-06, 0.00397336528564616, 
0.0396142921990106, 0.0257110323200797, 0.456160547248404, 0.435289728506084, 
0.174752439627153, 0.0322206527643205, 0.151957389805405, 0.214208420728234, 
0.209770182936397, 0.000359418230726314, 0.169560039392602, 0.121931574524466, 
0.00199720271745873, 0.00167830463242721, 0.0358703496057556, 
2.70960742838858e-08, 2.48993416065843e-06, 0.025658017631868, 
0.000160852031977026, 2.03277511233098e-10, 0.00365322690468119, 
0.00835494983206326, 0.00345000763113789, 0.319059083754041, 
0.338743412399024, 0.0607901264427539, 0.114900367843445, 0.366333133634563, 
0.0807405925958639, 0.401978021143645, 0.00185065910684105, 0.285560970930603, 
0.000405614480033071, 0.36540854926137, 0.368563229784805, 0.0572071352056327, 
0.0512244435342478, 0.308250315219942, 7.45201049742319e-05, 
0.00171676372384843, 0.258206896016643, 0.0108030778551853, 3.68131189612568e-07, 
0.0775983686203551, 0.0968100617394144, 0.0202315150282379), 
    P.adjusted = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.00107354358653066, 
    3.56263300026445e-06, 1, 1, 0.0719883493907477, 0.0354716820642322, 
    0.00100455787354818, 1, 1, 0.990650162494585, 1, 1, 1, 1, 
    1, 1, 0.134670411604657, 1, 0.375822301534435, 0.127658189354969, 
    1, 1, 1, 1, 1, 1, 5.39794536623113e-06, 6.46028092051565e-08, 
    0.0338449565480217, 0.0411858845493996, 0.000389384030235843, 
    0.613730627660236, 0.0410389760935165, 0.000767847705451794, 
    0.361576240993801, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.0327070589960946, 
    1, 1, 0.181745447288745, 0.152725721550876, 1, 2.46574275983361e-06, 
    0.000226584008619917, 1, 0.0146375349099094, 1.84982535222119e-08, 
    0.332443648325988, 0.760300434717757, 0.313950694433548, 
    1, 1, 1, 1, 1, 1, 1, 0.168409978722536, 1, 0.0369109176830094, 
    1, 1, 1, 1, 1, 0.0067813295526551, 0.156225498870208, 1, 
    0.983080084821862, 3.34999382547436e-05, 1, 1, 1), comparisons = c("CEM3 - CEM4", 
    "CEM3 - GAB2", "CEM4 - GAB2", "CEM3 - MBE2", "CEM4 - MBE2", 
    "GAB2 - MBE2", "CEM3 - MRT1", "CEM4 - MRT1", "GAB2 - MRT1", 
    "MBE2 - MRT1", "CEM3 - OLT1", "CEM4 - OLT1", "GAB2 - OLT1", 
    "MBE2 - OLT1", "MRT1 - OLT1", "CEM3 - OLT3", "CEM4 - OLT3", 
    "GAB2 - OLT3", "MBE2 - OLT3", "MRT1 - OLT3", "OLT1 - OLT3", 
    "CEM3 - PLO1", "CEM4 - PLO1", "GAB2 - PLO1", "MBE2 - PLO1", 
    "MRT1 - PLO1", "OLT1 - PLO1", "OLT3 - PLO1", "CEM3 - PRA1", 
    "CEM4 - PRA1", "GAB2 - PRA1", "MBE2 - PRA1", "MRT1 - PRA1", 
    "OLT1 - PRA1", "OLT3 - PRA1", "PLO1 - PRA1", "CEM3 - PRA2", 
    "CEM4 - PRA2", "GAB2 - PRA2", "MBE2 - PRA2", "MRT1 - PRA2", 
    "OLT1 - PRA2", "OLT3 - PRA2", "PLO1 - PRA2", "PRA1 - PRA2", 
    "CEM3 - RAV1", "CEM4 - RAV1", "GAB2 - RAV1", "MBE2 - RAV1", 
    "MRT1 - RAV1", "OLT1 - RAV1", "OLT3 - RAV1", "PLO1 - RAV1", 
    "PRA1 - RAV1", "PRA2 - RAV1", "CEM3 - VIL3", "CEM4 - VIL3", 
    "GAB2 - VIL3", "MBE2 - VIL3", "MRT1 - VIL3", "OLT1 - VIL3", 
    "OLT3 - VIL3", "PLO1 - VIL3", "PRA1 - VIL3", "PRA2 - VIL3", 
    "RAV1 - VIL3", "CEM3 - VSO1", "CEM4 - VSO1", "GAB2 - VSO1", 
    "MBE2 - VSO1", "MRT1 - VSO1", "OLT1 - VSO1", "OLT3 - VSO1", 
    "PLO1 - VSO1", "PRA1 - VSO1", "PRA2 - VSO1", "RAV1 - VSO1", 
    "VIL3 - VSO1", "CEM3 - VSO3", "CEM4 - VSO3", "GAB2 - VSO3", 
    "MBE2 - VSO3", "MRT1 - VSO3", "OLT1 - VSO3", "OLT3 - VSO3", 
    "PLO1 - VSO3", "PRA1 - VSO3", "PRA2 - VSO3", "RAV1 - VSO3", 
    "VIL3 - VSO3", "VSO1 - VSO3"))

In case it helps, this is the input and output I get from dunn.test:

library(dunn.test)
data <- dunn.test(test_02$dsDNA, test_02$UTS, method = "bonferroni")
  Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 91.5475, df = 13, p-value = 0


                           Comparison of x by group                            
                                 (Bonferroni)                                  
Col Mean-|
Row Mean |       CEM3       CEM4       GAB2       MBE2       MRT1       OLT1
---------+------------------------------------------------------------------
    CEM4 |  -0.061600
         |     1.0000
         |
    GAB2 |  -1.922697  -2.158539
         |     1.0000     1.0000
         |
    MBE2 |  -1.977134  -2.221397  -0.054436
         |     1.0000     1.0000     1.0000
         |
    MRT1 |  -0.844854  -0.913953   1.077843   1.132279
         |     1.0000     1.0000     1.0000     1.0000
         |
    OLT1 |  -4.227841  -5.371003  -1.795799  -1.726942  -3.159175
         |    0.0011*    0.0000*     1.0000     1.0000     0.0720
         |
    OLT3 |  -3.359938  -4.242765  -0.934815  -0.866154  -2.294312   1.347526
         |     0.0355    0.0010*     1.0000     1.0000     0.9907     1.0000
         |
    PLO1 |  -0.992922  -1.084927   0.929775   0.984212  -0.148067   2.971883
         |     1.0000     1.0000     1.0000     1.0000     1.0000     0.1347
         |
    PRA1 |  -2.641259  -2.988263  -0.718562  -0.664125  -1.796405   0.886882
         |     0.3758     0.1277     1.0000     1.0000     1.0000     1.0000
         |
    PRA2 |  -5.295584  -6.053213  -3.372886  -3.318450  -4.450729  -2.470601
         |    0.0000*    0.0000*     0.0338     0.0412    0.0004*     0.6137
         |
    RAV1 |  -1.755179  -1.947940   0.110111   0.162922  -0.935550   1.849114
         |     1.0000     1.0000     1.0000     1.0000     1.0000     1.0000
         |
    VIL3 |   0.955905   1.165385   2.878603   2.933039   1.800760   5.436976
         |     1.0000     1.0000     0.1817     0.1527     1.0000    0.0000*
         |
    VSO1 |  -2.393029  -2.701631  -0.470331  -0.415895  -1.548174   1.200872
         |     0.7603     0.3140     1.0000     1.0000     1.0000     1.0000
         |
    VSO3 |  -0.344038  -0.335661   1.578659   1.633095   0.500815   3.792663
         |     1.0000     1.0000     1.0000     1.0000     1.0000    0.0068*
Col Mean-|
Row Mean |       OLT3       PLO1       PRA1       PRA2       RAV1       VIL3
---------+------------------------------------------------------------------
    PLO1 |   2.107553
         |     1.0000
         |
    PRA1 |   0.028484  -1.648337
         |     1.0000     1.0000
         |
    PRA2 |  -3.319448  -4.302662  -2.654324
         |     0.0410    0.0008*     0.3616
         |
    RAV1 |   1.028074  -0.791903   0.807218   3.382292
         |     1.0000     1.0000     1.0000     0.0327
         |
    VIL3 |   4.565634   1.948827   3.597165   6.251490   2.682544
         |    0.0002*     1.0000    0.0146*    0.0000*     0.3324
         |
    VSO1 |   0.341580  -1.400107   0.248230   2.902555  -0.566399  -3.348934
         |     1.0000     1.0000     1.0000     0.1684     1.0000     0.0369
         |
    VSO3 |   2.925998   0.648883   2.297221   4.951545   1.421412  -1.299944
         |     0.1562     1.0000     0.9831    0.0000*     1.0000     1.0000
Col Mean-|
Row Mean |       VSO1
---------+-----------
    VSO3 |   2.048990
         |     1.0000

alpha = 0.05
Reject Ho if p <= alpha/2
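A minimal base-R sketch of how the asterisks in the printed table relate to the returned list, with no pipes. It uses a hand-copied subset of the P.adjusted and comparisons components from the dput() above; the names res, alpha and tab are illustrative. With dunn.test()'s default altp = FALSE, a comparison is starred when its adjusted p-value is at most alpha/2.

```r
# Hand-copied subset of the dunn.test() result above (same component names)
res <- list(
  P.adjusted  = c(1, 0.00107354358653066, 0.0354716820642322,
                  3.56263300026445e-06, 0.00100455787354818),
  comparisons = c("CEM3 - CEM4", "CEM3 - OLT1", "CEM3 - OLT3",
                  "CEM4 - OLT1", "CEM4 - OLT3")
)
alpha <- 0.05

# Split each "A - B" label into its two group names
pairs <- do.call(rbind, strsplit(res$comparisons, " - ", fixed = TRUE))
tab <- data.frame(group1 = pairs[, 1], group2 = pairs[, 2],
                  p.adj = res$P.adjusted, stringsAsFactors = FALSE)

# Default dunn.test rule (altp = FALSE): reject H0 if p <= alpha/2,
# which is what the asterisks in the printed table mark
tab$starred <- tab$p.adj <= alpha / 2
print(tab[tab$starred, ])
```

Comparing against alpha instead of alpha/2 would also flag CEM3 - OLT3 (0.0355), which is why the threshold matters once these p-values are turned into letters.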

Does anyone know what to do?

I tried looking for an answer and found that cldList() is often suggested, but always together with pipes and commands I'm not yet familiar with. So I'm asking the experts: is there a simpler way to get the significance letters from the dunn.test output (and add them to a ggplot boxplot) without learning to use pipes today?
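For reference, the letters can also be computed with plain base R and no pipes. The following is a minimal sketch of the insert-and-absorb algorithm (Piepho, 2004) applied to the comparisons and P.adjusted components of the dunn.test() result. It is a hand-rolled stand-in, not the code of rcompanion::cldList() or multcompView::multcompLetters(), so the letter order may differ from theirs. The function name cld_base() is hypothetical, the default threshold = 0.025 assumes alpha = 0.05 with dunn.test()'s default rule (reject H0 if p <= alpha/2), and only 26 letters are available.

```r
# Hypothetical helper: compact letter display from dunn.test() output
cld_base <- function(comparisons, p.adjusted, threshold = 0.025) {
  # "CEM3 - CEM4" -> two-column matrix of group names
  pairs  <- do.call(rbind, strsplit(comparisons, " - ", fixed = TRUE))
  groups <- unique(as.vector(t(pairs)))
  # Rows = groups, columns = letters; TRUE means the group carries that letter
  m <- matrix(TRUE, nrow = length(groups), ncol = 1,
              dimnames = list(groups, NULL))
  sig <- pairs[p.adjusted <= threshold, , drop = FALSE]
  for (k in seq_len(nrow(sig))) {
    a <- sig[k, 1]
    b <- sig[k, 2]
    # Insert: split every letter shared by a significantly different pair
    for (col in which(m[a, ] & m[b, ])) {
      copy <- m[, col]
      m[a, col] <- FALSE  # original column loses group a
      copy[b] <- FALSE    # duplicated column loses group b
      m <- cbind(m, copy)
    }
    # Absorb: drop columns contained in another column that is still kept
    keep <- rep(TRUE, ncol(m))
    for (j in seq_len(ncol(m))) {
      for (i in seq_len(ncol(m))) {
        if (i != j && keep[i] && all(m[, j] <= m[, i])) {
          keep[j] <- FALSE
          break
        }
      }
    }
    m <- m[, keep, drop = FALSE]
  }
  # Give "a" to the column covering the earliest groups, then "b", ...
  key <- apply(m, 2, function(x) paste(ifelse(x, "0", "1"), collapse = ""))
  m <- m[, order(key, method = "radix"), drop = FALSE]
  sapply(groups, function(g) paste(letters[which(m[g, ])], collapse = ""))
}

# Hand-copied subset of the result above (all six pairs of four groups)
cmp  <- c("CEM3 - CEM4", "CEM3 - OLT1", "CEM4 - OLT1",
          "CEM3 - OLT3", "CEM4 - OLT3", "OLT1 - OLT3")
padj <- c(1, 0.00107354358653066, 3.56263300026445e-06,
          0.0354716820642322, 0.00100455787354818, 1)
cld <- cld_base(cmp, padj)
print(cld)  # CEM3 "ab", CEM4 "a", OLT1 "c", OLT3 "bc"
```

With the full object from the question, the call would be cld_base(data$comparisons, data$P.adjusted). Groups that share at least one letter are not significantly different at the chosen threshold.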

Mars_87
  • What exactly are you trying to extract when you say "significance digits", is it the p-value or some other component of the Dunn test result? – r2evans Aug 21 '23 at 13:59
  • Hi, thanks for the reply. I'm trying to extract the significance letters (a, b, c, d or ab, abc, etc.) – Mars_87 Aug 21 '23 at 14:22
  • I'm inferring that since there are no `"a"`, `"b"`, etc. in your console dump or `data`, you really mean `"CEM3 - CEM4"`, etc. Try `data$comparisons`. If you look at `str(data)`, you'll see the `str`ucture of it, hinting at ways to get at certain components. If I'm not correct, please explain which component(s) of the output above you are after (with some literal examples, since I don't see `"abc"` anywhere). – r2evans Aug 21 '23 at 14:27
  • You're perfectly right, I didn't explain correctly what I'm looking for. What I have to do after running a dunn.test is make a ggplot boxplot and add letters to indicate statistical differences between, e.g., CEM3 and OLT1 and so on. With only three or four columns it's manageable, but with nine or more it becomes challenging, and I was looking for a way to do it automatically. – Mars_87 Aug 21 '23 at 14:49
  • Edit: I've only now noticed that the original title of my question was changed from "significance letters" to "significance digits" (?). Sorry @r2evans for the strange names you read, but I assure you I call them letters, not digits; I don't know what happened. Anyway, thanks for your reply – Mars_87 Aug 21 '23 at 14:59
  • @Mr.Polywhirl, your edit has fundamentally changed this question. Please roll it back. – r2evans Aug 21 '23 at 15:01
  • Mars_87, I have no confidence that Mr.Polywhirl will be notified of that comment nor of the issue you take with their edit. I suggest you [edit] your question to undo the "digits" changes. – r2evans Aug 21 '23 at 15:03
  • Regardless ... it sounds like you know that you want to add (e.g.) `"CEM3 - CEM4"` to somewhere on a plot, is that right? I don't know how you're plotting the results nor how you intend it to look. – r2evans Aug 21 '23 at 15:04
  • @r2evans thanks a lot! I thought I was going mad. Now I see that the word "letters" was changed in the question text as well, so I understand what happened. Thanks once more – Mars_87 Aug 21 '23 at 15:08
  • It's OK in any case! I'd be happy to learn a command that gives me the significance letters for the statistical differences, so I can put them on the plot manually, or a "professional" way to add them directly in R to the plot created with ggplot() + geom_boxplot(), which is the only way I've learned to draw boxplots so far – Mars_87 Aug 21 '23 at 15:14
  • @r2evans I was under the assumption that "significance letters" was a translation error. I have only ever heard of the term "significant digits" in the context of science and mathematics. – Mr. Polywhirl Aug 21 '23 at 15:18
  • I have seen questions like that before, so I don't think it's off-the-wall to think that, but OP has stated "digits" is not what they intended. Thanks! – r2evans Aug 21 '23 at 15:30
  • Mars_87, it's still not clear what exactly you intend to add to whatever ggplot image you're trying to make. We have sample test results which is good, and now you're using `data` in some way to make boxplots with ggplot2? How? If you're using the raw data for the boxplots, then I suggest we need a bit more context, including ... code you are using to try to generate boxplots, and what labels you want where on that plot. Thanks! – r2evans Aug 21 '23 at 15:32
  • @r2evans you're extremely patient and nice :-) thanks a lot! It's my second question on the site, and I still don't know how to behave properly and provide the correct information for whoever is kindly reading. Sorry for that. You're perfectly right: "data" is just the name I gave to the result of "data<-dunn.test(test_02$dsDNA, test_02$UTS, method = "bonferroni")" to show it here on the site, but for the boxplot I used what I guess is the simplest way: "ggplot(test_02, aes(group=UTS, x=UTS, y=dsDNA, fill=UTS)) + geom_boxplot(show.legend=FALSE)" – Mars_87 Aug 21 '23 at 15:56
  • where test_02 is the .csv I imported into R, UTS is a factor column, and dsDNA holds the actual measurements, which vary from one UTS type (PLO1, PRA1, etc.) to another. I'm asking whether there is a pipe-free way to get from R the letters that show statistical differences after a dunn.test (and I'd be super happy with just that), or whether there is a way to add with ggplot the letters highlighting statistical differences to the individual boxplots (but I'm starting to realize this requires having the letters first anyway, and it's probably too advanced for my beginner level) – Mars_87 Aug 21 '23 at 15:57
  • Okay, from your ggplot call, I'm going to guess that `UTS` is a string (`"CEM3"`, ...) and `dsDNA` is numeric. This means you have as many boxes in your plot as you have unique `UTS` values. You have 13 or so boxplots, but from your Dunn test I see 91 (`14*13/2`) comparisons being made. How do you envision those 91 numbers or comparisons being shown on a boxplot? This question is still under-defined, and I don't get the feeling that we know what the end result of the question and/or your plot should be. How many rows are in `test_02`? Can you share that data too? – r2evans Aug 21 '23 at 17:14
  • Related (and not yet resolved): https://stackoverflow.com/q/76948847/3358272 – r2evans Aug 21 '23 at 21:39
  • Yes, CEM3 and so on are the UTS, while dsDNA is numeric. And yes, the other post you linked is very similar to what I'm looking for; that guy is just a step ahead of me, I guess. I'll try to explain my primary question as simply as I can: after performing a dunn.test, I'm trying to get the letters that show statistical differences, so I can manually add them (even with another software, no problem) to the boxplot graph. How can I do that? The ggplot issue is just a secondary question, and it's probably too difficult to explain, so let's forget about it and stick to the primary question. – Mars_87 Aug 22 '23 at 10:14
  • To answer your question @r2evans, I'm not trying to show the 91 comparisons on the boxplot (I've probably explained myself badly). I'm just trying to show on the boxplot the letters for the statistical differences that the dunn.test indicates. The end result I'm after is simply a boxplot graph with letters on each box indicating whether its median differs from the others (just like in almost every publication I've seen) – Mars_87 Aug 22 '23 at 10:18
  • There are still too many areas of "unclear" here. You say you want to add the letters to a boxplot, but if you make a boxplot with ggplot2, the categories are already included on the axis; from `?geom_boxplot`, the example `ggplot(mpg, aes(class, hwy)) + geom_boxplot()` has the car classes on the x-axis, so I don't understand what would need to be added to your data. As I said earlier, `data$comparisons` gives you strings of comparisons, and you can use `strsplit`/`trimws` to get each pair, but while you still have 13 numbers for each category, you're asking for one each. *Which one?* – r2evans Aug 22 '23 at 11:25
  • It also sounds like you don't have a complete idea of what the plot would look like, other than the notion of boxplots. I often peruse https://r-graph-gallery.com/ for good ideas on how to present things. Perhaps what you're looking for is an upper-triangle (table) combined with the boxplot, perhaps with the axes aligned with columns. This might be done using a "text grob" combined with boxplot using `patchwork`. – r2evans Aug 22 '23 at 11:26
  • Actually I do have a complete idea of what the plot should look like, since I've published several articles. This time I simply have many boxplots together. Since I really can't think of an easier way to explain what I'm looking for, I'll paste a link with a picture in the next comment. See in the top-left image the letters A-AB-B marking statistical differences? The letters are what I'm after, and there is a way to obtain them with R, probably by using cld or some other commands as in the link you suggested above – Mars_87 Aug 22 '23 at 12:05
  • https://www.researchgate.net/publication/321316513/figure/fig3/AS:565689544962048@1511882305760/Boxplots-in-a-logarithmic-scale-indicating-significant-differences-among-means-comparing.png – Mars_87 Aug 22 '23 at 12:05
  • If you start with *comparisons* and *P.adjusted* in the environment, you can just call `library(rcompanion); cldList(comparison=comparisons, p.value=P.adjusted)`. No need to use pipes or anything. One complication, though, is that with that many treatments, the assigned letters may not be in a logical order. That is, the function doesn't know which treatment is the highest or lowest. It just addresses them as they come. So it's useful to reorder the treatments before running the post-hoc test, or to carefully reorder the results afterwards. – Sal Mangiafico Aug 23 '23 at 14:53
  • Also note the decision rule in the *dunn.test()* function: `Reject Ho if p <= alpha/2`. There is an option in the function to change the decision rule to the more common `Reject Ho if p <= alpha`. Or you can use *FSA::dunnTest()*. – Sal Mangiafico Aug 23 '23 at 14:55
  • @Sal Mangiafico, Sal, you really hit the bull's eye. That is the answer, so thanks a lot! Your second insight is also very important to me. I was thinking about exactly that in the last few days: if I remember correctly, for N comparisons the Bonferroni correction sets alpha = alpha/N, so when I read alpha/2 at the bottom of the output, I always wonder whether it should be different. I'm new to the Dunn test; I know it's a Kruskal-Wallis iterated N times, but I don't know how the command works in R, because the only reference I could find is the description of the "bonferroni" method in ?dunn.test: – Mars_87 Aug 24 '23 at 16:25
  • "bonferroni": the FWER is controlled using Dunn's (1961) Bonferroni adjustment, and adjusted p-values = max(1, pm). Those comparisons rejected with the Bonferroni adjustment at the α level (two-sided test) are starred in the output table, and starred in the list when using the list=TRUE option. @Sal, do you know whether the alpha/N correction is applied in dunn.test with the Bonferroni correction? – Mars_87 Aug 24 '23 at 16:26
  • For the `dunn.test` function it uses the `alpha/2` simply because that's the way the original paper was written. You can use the *altp=TRUE* option to output the more common two-sided *p*-value. – Sal Mangiafico Aug 24 '23 at 18:42
  • By default, the `dunn.test` function makes no adjustment for multiple tests. The `method=` option can be used to adjust *p*-values with Bonferroni, Šidák, Benjamini-Hochberg, and so on. – Sal Mangiafico Aug 24 '23 at 18:45
  • @SalMangiafico, Sal, thanks a lot. I've really appreciated your help. You've been very kind and shown your knowledge even on tricky, detailed questions. I'm Italian, and from your name I guess you're not so distant, or at least your relatives weren't. So your help has an extra flavour. Thanks again, I wish you all the best – Mars_87 Aug 25 '23 at 06:55
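Putting Sal Mangiafico's cldList() suggestion together with the ggplot call from the question, a possible pipe-free sketch for the secondary question looks like this. Because test_02 is not shared, it uses R's built-in InsectSprays data as a stand-in (swap in test_02, dsDNA and UTS). It assumes the dunn.test, rcompanion and ggplot2 packages are installed; the object names dt, cld, ymax, labs and p and the vertical offset of 1.5 are arbitrary choices. The threshold of 0.025 is meant to follow dunn.test()'s default alpha/2 rule for alpha = 0.05.

```r
library(dunn.test)   # dunn.test()
library(rcompanion)  # cldList()
library(ggplot2)

# Built-in example data (stand-in for test_02): insect counts per spray type
dt <- dunn.test(InsectSprays$count, InsectSprays$spray, method = "bonferroni")

# Letters straight from two list components, no pipes; 0.025 follows the
# default "Reject Ho if p <= alpha/2" rule with alpha = 0.05
cld <- cldList(comparison = dt$comparisons, p.value = dt$P.adjusted,
               threshold = 0.025)

# One row per box: the highest observation, used to place its letter
ymax <- aggregate(count ~ spray, data = InsectSprays, FUN = max)
labs <- merge(ymax, cld, by.x = "spray", by.y = "Group")

p <- ggplot(InsectSprays, aes(x = spray, y = count, fill = spray)) +
  geom_boxplot(show.legend = FALSE) +
  geom_text(data = labs, aes(x = spray, y = count + 1.5, label = Letter),
            inherit.aes = FALSE)
print(p)
```

As noted in the comments, cldList() assigns letters in the order the groups come, so reordering the factor levels (for example by median) before running the test can give a more readable sequence of letters.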

0 Answers