
Edit: Clarity

When I append a new column to an existing data.frame, the column names come out wrong. In summary.myData, the last two columns are both labelled "Measure", but they should be named "plus" and "minus", respectively.

This is related to another question of mine, in which I asked how to correctly reference a column in a Tk/R GUI I am working on.

Parent Question

myData:

   Group Subgroup  Measure
1      A        1 0.234213
2      A        1 0.046248
3      A        1 0.391376
4      A        2 0.911849
5      A        2 0.729955
6      A        2 0.991110
7      A        2 0.378422
8      A        3 0.898037
9      A        3 0.258884
10     A        3       NA
11     A        3 0.057631
12     A        3 0.745202
13     A        3 0.121376
14     B        1 0.385198
15     B        1 0.484399
16     B        1 0.115034
17     B        1 0.073629
18     B        1 0.456150
19     B        2 0.336108
20     B        2 0.845458
21     B        2 0.267494
22     B        3 0.536123
23     B        3 1.331731
24     B        3 0.505114
25     B        3 0.843348
26     B        3 0.827932
27     B        3 0.813351
28     C        1 0.095587
29     C        1 0.158822
30     C        1 0.392376
31     C        1 0.284625
32     C        2 0.898819
33     C        2 0.743428
34     C        2 0.298989
35     C        2 0.423961
36     C        3 0.868351
37     C        3 0.181547
38     C        3 1.146131
39     C        3 0.234941

Append script:

  summary.myData<-summarySE(myData, measurevar=paste(tx.choice1), groupvars=paste(tx.choice2),conf.interval=0.95,na.rm=TRUE,.drop=FALSE)
  summary.myData$plus<-summary.myData[3]-summary.myData[6]
  summary.myData$minus<-summary.myData[3]+summary.myData[6]

Result:

  Group  N   Measure        sd         se        ci   Measure   Measure
1     A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
2     B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
3     C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862
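A minimal, self-contained reproduction can help isolate the problem. The sketch below assumes a hand-built stand-in for summary.myData (called `s` here, with values copied from the result above), so it runs without summarySE() or the Tk variables tx.choice1 and tx.choice2.

```r
# Hand-built stand-in for summary.myData (values copied from the result
# above), so this runs without summarySE() or the Tk variables.
s <- data.frame(
  Group   = factor(c("A", "B", "C")),
  N       = c(12, 14, 12),
  Measure = c(0.4803586, 0.5586478, 0.4772981),
  sd      = c(0.3539277, 0.3412835, 0.3465511),
  se      = c(0.10217014, 0.09121184, 0.10004069),
  ci      = c(0.2248750, 0.1970512, 0.2201881)
)

# Same assignments as the append script: s[3] and s[6] are one-column
# data.frames, so each difference/sum is also a one-column data.frame
# whose single column keeps the name of the left operand, "Measure".
s$plus  <- s[3] - s[6]
s$minus <- s[3] + s[6]

class(s$plus)   # "data.frame" rather than "numeric"
names(s$plus)   # "Measure"
str(s)          # plus and minus are listed as nested 'data.frame' columns
```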
David Arenburg
Bluebird

1 Answer


The problem is that you've assigned data.frames, rather than atomic vectors, to $plus and $minus. So when printing, R shows the column name inside each embedded data.frame ('Measure' in both cases) rather than the name of the outer list component ('plus' and 'minus').

str(summary.myData);
## 'data.frame': 3 obs. of  8 variables:
##  $ Group  : Factor w/ 3 levels "A","B","C": 1 2 3
##  $ N      : num  12 14 12
##  $ Measure: num  0.48 0.559 0.477
##  $ sd     : num  0.354 0.341 0.347
##  $ se     : num  0.1022 0.0912 0.1
##  $ ci     : num  0.225 0.197 0.22
##  $ plus   :'data.frame':  3 obs. of  1 variable:
##   ..$ Measure: num  0.255 0.362 0.257
##  $ minus  :'data.frame':  3 obs. of  1 variable:
##   ..$ Measure: num  0.705 0.756 0.697
summary.myData;
##   Group  N   Measure        sd         se        ci   Measure   Measure
## 1     A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
## 2     B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
## 3     C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862

Replace the assignments with:

summary.myData$plus <- summary.myData[,3]-summary.myData[,6];
summary.myData$minus <- summary.myData[,3]+summary.myData[,6];

Then you get:

str(summary.myData);
## 'data.frame': 3 obs. of  8 variables:
##  $ Group  : Factor w/ 3 levels "A","B","C": 1 2 3
##  $ N      : num  12 14 12
##  $ Measure: num  0.48 0.559 0.477
##  $ sd     : num  0.354 0.341 0.347
##  $ se     : num  0.1022 0.0912 0.1
##  $ ci     : num  0.225 0.197 0.22
##  $ plus   : num  0.255 0.362 0.257
##  $ minus  : num  0.705 0.756 0.697
summary.myData;
##   Group  N   Measure        sd         se        ci      plus     minus
## 1     A 12 0.4803586 0.3539277 0.10217014 0.2248750 0.2554836 0.7052335
## 2     B 14 0.5586478 0.3412835 0.09121184 0.1970512 0.3615966 0.7556990
## 3     C 12 0.4772981 0.3465511 0.10004069 0.2201881 0.2571100 0.6974862
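Besides [,3], the [[ operator also returns a single column as an atomic vector, and it accepts a column name held in a variable. That may help in the Tk GUI, where the column is chosen at run time, and it avoids relying on positions 3 and 6. In the sketch below, `s` is a hand-built stand-in for summary.myData and `measure_col` is a hypothetical stand-in for tx.choice1.

```r
# Hand-built stand-in for summary.myData, keeping only the columns needed.
s <- data.frame(
  Group   = factor(c("A", "B", "C")),
  Measure = c(0.4803586, 0.5586478, 0.4772981),
  ci      = c(0.2248750, 0.1970512, 0.2201881)
)

# Hypothetical: in the GUI this string would come from tx.choice1.
measure_col <- "Measure"

# [[ ]] extracts one column as an atomic vector (by name or by position),
# so the new columns are plain numeric vectors named plus and minus.
s$plus  <- s[[measure_col]] - s[["ci"]]
s$minus <- s[[measure_col]] + s[["ci"]]

str(s)
```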

The key here is the different indexing style. When you use 1D indexing (summary.myData[3]), you're treating the data.frame as a list (which it is internally), so the index operation returns the selected list components, still classed as a data.frame. When you use 2D indexing (summary.myData[,3]), you index the rows and columns separately, which lets you extract a 2D "subtable" of the data.frame. However, when you select only one column, the default behavior (drop=T) is to return that column as an atomic vector rather than as a one-column data.frame. You can override this with drop=F.

summary.myData[3];
##     Measure
## 1 0.4803586
## 2 0.5586478
## 3 0.4772981
summary.myData[,3];
## [1] 0.4803586 0.5586478 0.4772981
summary.myData[,3,drop=F];
##     Measure
## 1 0.4803586
## 2 0.5586478
## 3 0.4772981
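A quick way to check what each indexing style returns is to ask for class(). The sketch uses a small hypothetical data.frame `s`; [[ is included as another common way to pull one column out as a vector.

```r
s <- data.frame(
  Measure = c(0.4803586, 0.5586478, 0.4772981),
  ci      = c(0.2248750, 0.1970512, 0.2201881)
)

class(s[1])                  # "data.frame": list-style (1D) indexing
class(s[[1]])                # "numeric":    list-style extraction of one element
class(s[, 1])                # "numeric":    2D indexing, single column dropped to a vector
class(s[, 1, drop = FALSE])  # "data.frame": drop = FALSE keeps the data.frame
class(s[, 1:2])              # "data.frame": with two columns there is nothing to drop
```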
bgoldst
  • What is the "," in `summary.myData[,3]`? – Bluebird Jul 28 '15 at 19:39
  • Read the revised answer. I must admit I followed you until I hit the 1D, 2D and 3D indexing part, then I lost you. I will certainly come back in a little while (approx. 6 months); I bet I will understand it then. – Bluebird Jul 28 '15 at 20:02
  • Note that the `drop=F` argument in my last example is *not* a dimensional subscript; rather, it is just a third argument to the `[` function. It is only possible to 1D- or 2D-index a data.frame. – bgoldst Jul 28 '15 at 20:11
  • Everything in R is a function, including bracket-indexing, dollar-dereferencing, and even braced blocks and parenthesization. See my answer at http://stackoverflow.com/questions/30562009/what-is-class-in-r/30563201#30563201 for more advanced discussion of the R language, with a focus on parse trees and S3 classes (although it may be TMI for the purposes of this question). – bgoldst Jul 28 '15 at 20:11
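To make the point about `[` being a function concrete, and to show what the comma in summary.myData[,3] does, the sketch below calls `[` directly. It uses a small hypothetical data.frame `s`.

```r
s <- data.frame(
  Measure = c(0.4803586, 0.5586478, 0.4772981),
  ci      = c(0.2248750, 0.1970512, 0.2201881)
)

# s[, 1] is shorthand for calling the `[` function. The empty slot before
# the comma is simply a missing row argument, which means "all rows".
by_syntax   <- s[, 1]
by_function <- `[`(s, , 1)
identical(by_syntax, by_function)          # TRUE

# drop is an ordinary named argument of `[`, not a third index.
is.data.frame(`[`(s, , 1, drop = FALSE))   # TRUE

# [[ is a function too.
identical(`[[`(s, "Measure"), s$Measure)   # TRUE
```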