I want to display the outliers in every column, identified by the `Catkey` column together with the name of the column they come from. I am using `Boxplot` from the "car" package; if there is a more efficient solution with base `boxplot` (lower case), please let me know.

    AFD2[Boxplot(AFD2$GOL), c("Catkey", "GOL")]
         Catkey GOL
    58  A2SC043 152
    216 KU-1265 153
    510   TU-49 199

I would like to write a loop that displays the outliers for every column in the same way, for example:

         Catkey GOL
    58  A2SC043 152
    216 KU-1265 153
    510   TU-49 199
         Catkey NOL
    25  GF-5466  50
    517 yU-1869 452
    378   KU-11 765

and likewise for the remaining columns.

I have 48 columns in total: the first column is "Catkey" and the rest of the columns are readings (GOL, ABC, EFG, PIL, GHF, etc.).

Please help.

Here is what my data frame looks like:

    > head(AFD2)
      Record  Catkey Sex GOL NOL BNL BBH XCB XFB WFB ZYB AUB ASB BPL NPH
    2      2 019-CRA   M 161 160  95 135 143 116  90 135 128 109  89  72
    3      3 021-CRA   M 174 169 109 142 139 112  87 141 131 101  95  66
    4      4 023-CRA   M 171 168 100 140 136 112  89 135 126 110  99  72
    5      5 024-CRA   F 166 167  94 130 133 100  85 124 121 104  94  63
    6      6 025-CRA   M 166 168 100 140 148 120  92 139 130 109  93  73
    7      7 026-CRA   M 165 165  98 135 146 118  89 136 129 108  93  68
      NLH JUB NLB MAB MAL MDH OBH OBB DKB NDS  WNB SIS ZMB SSS FMB NAS
    2  52 117  29  62  48  28  36  40  20  10 10.1 4.7  99  23  95  15
    3  54 121  29  61  46  30  38  43  16   7  6.2 3.5  96  19  97  13
    4  54 118  26  68  52  28  34  40  18   9  6.9 4.1 100  24  95  12
    5  46 108  23  60  51  25  31  37  23   8  9.0 2.5  92  23  91  14
    6  53 119  26  69  54  26  36  40  22  12  9.7 4.0  97  20  98  14
    7  52 120  30  68  51  30  35  38  22   7  8.8 2.3  98  23  92  14
      EKB DKS IML XML MLS WMH GLS STB FRC FRS FRF PAC PAS PAF OCC OCS
    2  98  10  37  55  12  25   2 116 108  24  54  98  24  56  92  28
    3  98  13  40  59  14  20   4 101 106  25  53 112  27  55  88  21
    4  94   8  29  51  13  25   4 113 111  26  56 114  25  62  94  23
    5  93   9  33  51  11  20   2  93 107  25  49 106  23  60  97  21
    6 100   6  39  56  14  25   5 112 117  20  58 101  25  48  95  20
    7  96   9  32  49   9  23   2 111 113  26  55  97  23  48  94  26
      OCF
    2  58
    3  42
    4  58
    5  39
    6  46
    7  64

    > str(AFD2)
    'data.frame':   526 obs. of  48 variables:
     $ Record: int  2 3 4 5 6 7 8 9 10 11 ...
     $ Catkey: Factor w/ 589 levels "016-CRA","019-CRA",..: 2 3 4 5 6 7 8 9 10 11 ...
     $ Sex   : Factor w/ 6 levels "","F","M","MALE?",..: 3 3 3 2 3 3 3 5 2 3 ...
     $ GOL   : int  161 174 171 166 166 165 171 157 166 183 ...
     $ NOL   : int  160 169 168 167 168 165 169 158 164 179 ...
     $ BNL   : int  95 109 100 94 100 98 99 85 94 99 ...
     $ BBH   : int  135 142 140 130 140 135 138 123 125 139 ...
     $ XCB   : int  143 139 136 133 148 146 134 127 132 141 ...
     $ XFB   : int  116 112 112 100 120 118 109 105 107 118 ...
     $ WFB   : int  90 87 89 85 92 89 93 81 85 95 ...
     $ ZYB   : int  135 141 135 124 139 136 131 104 120 137 ...
     $ AUB   : int  128 131 126 121 130 129 120 103 116 127 ...
     $ ASB   : int  109 101 110 104 109 108 105 96 101 107 ...
     $ BPL   : int  89 95 99 94 93 93 93 75 94 98 ...
     $ NPH   : int  72 66 72 63 73 68 62 54 64 68 ...
     $ NLH   : int  52 54 54 46 53 52 51 42 48 49 ...
     $ JUB   : int  117 121 118 108 119 120 116 91 104 123 ...
     $ NLB   : int  29 29 26 23 26 30 28 21 24 28 ...
     $ MAB   : int  62 61 68 60 69 68 66 48 60 69 ...
     $ MAL   : int  48 46 52 51 54 51 49 37 48 53 ...
     $ MDH   : int  28 30 28 25 26 30 32 15 25 31 ...
     $ OBH   : int  36 38 34 31 36 35 35 32 33 32 ...
     $ OBB   : int  40 43 40 37 40 38 38 34 36 37 ...
     $ DKB   : int  20 16 18 23 22 22 25 15 19 23 ...
     $ NDS   : int  10 7 9 8 12 7 9 6 7 10 ...
     $ WNB   : num  10.1 6.2 6.9 9 9.7 8.8 9.6 5.8 6.9 6.8 ...
     $ SIS   : num  4.7 3.5 4.1 2.5 4 2.3 3 1.7 1.7 1.9 ...
     $ ZMB   : int  99 96 100 92 97 98 97 71 92 98 ...
     $ SSS   : int  23 19 24 23 20 23 23 19 21 23 ...
     $ FMB   : int  95 97 95 91 98 92 95 79 90 99 ...
     $ NAS   : int  15 13 12 14 14 14 17 13 11 14 ...
     $ EKB   : int  98 98 94 93 100 96 98 80 91 98 ...
     $ DKS   : int  10 13 8 9 6 9 10 11 6 7 ...
     $ IML   : int  37 40 29 33 39 32 37 30 31 36 ...
     $ XML   : int  55 59 51 51 56 49 55 48 51 56 ...
     $ MLS   : int  12 14 13 11 14 9 13 9 14 14 ...
     $ WMH   : int  25 20 25 20 25 23 22 19 22 27 ...
     $ GLS   : int  2 4 4 2 5 2 2 1 2 4 ...
     $ STB   : int  116 101 113 93 112 111 108 105 107 111 ...
     $ FRC   : int  108 106 111 107 117 113 109 99 100 116 ...
     $ FRS   : int  24 25 26 25 20 26 28 22 25 28 ...
     $ FRF   : int  54 53 56 49 58 55 47 46 51 49 ...
     $ PAC   : int  98 112 114 106 101 97 115 101 105 104 ...
     $ PAS   : int  24 27 25 23 25 23 26 24 23 21 ...
     $ PAF   : int  56 55 62 60 48 48 57 52 52 47 ...
     $ OCC   : int  92 88 94 97 95 94 90 92 96 112 ...
     $ OCS   : int  28 21 23 21 20 26 27 20 28 40 ...
     $ OCF   : int  58 42 58 39 46 64 50 49 51 71 ...
     - attr(*, "na.action")=Class 'omit'  Named int [1:63] 1 12 18 20 23 24 29 33 35 39 ...
      .. ..- attr(*, "names")= chr [1:63] "1" "12" "18" "20" ...
Osro_db40
  • If it's `Boxplot` with uppercase `B`, where does it come from, what's the package? – Rui Barradas Oct 23 '17 at 13:44
  • and add some reproducible data, e.g. `dput(AFD2)` – Roman Oct 23 '17 at 13:47
  • `lapply(names(AFD2)[-1], function(x) AFD2[Boxplot(AFD2[[x]]), c("Catkey", x)])` – Gregor Thomas Oct 23 '17 at 14:38
  • Can anyone find a good duplicate for "how to write a loop in R"? – Gregor Thomas Oct 23 '17 at 14:40
  • I am using "car" package for Boxplot (upper case). @Gregor the solution you provided throws an error: Error in oldClass(stats) <- cl : adding class "factor" to an invalid object Called from: boxplot.default(y, ylab = ylab, ...) – Osro_db40 Oct 23 '17 at 14:44
  • It works perfectly on the data you shared... maybe you could [share some data reproducibly (see a guide here)](https://stackoverflow.com/q/5963269/903061) and then I can try to debug. I'm also skeptical... I used `Boxplot` with a capital `B`, as in your question. But the error you report uses `boxplot` with a lower case `b`. Did you run the command right? – Gregor Thomas Oct 23 '17 at 14:47
  • Yes, still error. I will edit my question with more information. – Osro_db40 Oct 23 '17 at 14:52
  • Rui Barradas, Jimbou, Gregor: I have added more information now. – Osro_db40 Oct 23 '17 at 15:02
  • Okay, the issue is that you said *"first column is "Catkey" and rest of he columns are readings.. GOL, ABC,..."*, so I used `[-1]` to omit the first column. But really the first column is `Record`, the second is `Catkey`, and the third, `Sex`, also needs to be skipped. Use the same code I posted above but replace `-1` with `-(1:3)`. – Gregor Thomas Oct 23 '17 at 18:56
  • That's because I had removed first and third cols previously, and it didn't work. Now it does without having to remove them. Thank you so much. – Osro_db40 Oct 23 '17 at 19:07
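
The question also asks for a lower-case `boxplot` alternative. A minimal sketch of one possibility, assuming base R's `boxplot.stats()` (the 1.5 × IQR whisker rule that `boxplot` uses) is an acceptable definition of "outlier". The data frame `toy` and the helper `find_outliers` are hypothetical stand-ins for illustration, not part of the original data or of any package:

```r
# Hypothetical stand-in for AFD2: three identifier columns, then readings.
toy <- data.frame(
  Record = 1:12,
  Catkey = paste0("K-", 1:12),
  Sex    = rep(c("M", "F"), 6),
  GOL    = c(161, 174, 171, 166, 166, 165, 171, 157, 166, 168, 170, 250),
  NOL    = c(160, 169, 168, 167, 168, 165, 169, 158, 164, 166, 20, 167),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every reading column, keep the rows whose value
# lies beyond the boxplot whiskers, showing the ID column next to the reading.
# `skip` gives the positions of the non-reading columns (Record, Catkey, Sex).
find_outliers <- function(df, id_col = "Catkey", skip = 1:3) {
  reading_cols <- names(df)[-skip]
  res <- lapply(reading_cols, function(col) {
    out_vals <- boxplot.stats(df[[col]])$out   # values outside the whiskers
    df[df[[col]] %in% out_vals, c(id_col, col)]
  })
  names(res) <- reading_cols                   # one data frame per column
  res
}

outliers <- find_outliers(toy)
print(outliers)
```

On the real data this would be called as `find_outliers(AFD2)`. The result is a named list with one small data frame per reading column, which may not match the points labelled by `car::Boxplot` exactly. With "car" loaded, the one-liner from the comments, using `-(1:3)` in place of `-1`, does the same job.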
