
I have a data frame called "femalet3" whose first column holds string values. All the other columns are numeric, so I tried to compute the row means with the code below:

femalet3$mean.f<-data.frame(mean.f=femalet3[,1], mean.f=rowMeans(femalet3[,-1]))

The idea for this came from: Calculate row means on subset of columns

When I run this line, I get this output:

Significance GSM1311846 GSM1311847 mean.f.mean.f mean.f.mean.f.1
Vsig         88.35497   83.16820          VSig        85.40076

The problem is that "mean.f" appears more than once, and the values from "Significance" are copied into a mean.f column. Running colnames(femalet3) gives:

"Significance" "GSM1311840" "GSM1311841" "GSM1311842" "GSM1311843"        "GSM1311844" "GSM1311845"  "GSM1311846"  "GSM1311847"  "mean.f"    

There is apparently only one "mean.f" despite the output earlier. I don't think I am using the line of code taken from the other Q&A correctly and it may be causing this error in formatting. The desired output is:

          Significance GSM1311846 GSM1311847 mean.f
          Vsig         88.35497   83.16820    85.40076
StupidWolf
Gil Ong
  • Welcome to the SO! We could help you better if you provided an example that produces the error (MVCE). If that is not feasible, is your file loaded into R as a `data.frame` or a `matrix`? Try `str(femalet3)` to check that each column is loaded as you would expect. – nya Nov 25 '19 at 15:31
  • Hello, thank you for the response. The file is a data.frame, and str(femalet3) says that there are two variables in mean.f, which is not what I want: $ mean.f : Factor w/ 3 levels "Sig","VSig","VVSig": 2 1 2 1 1 1 1 1 1 1 ... $ mean.f.1: num 85.4 77.2 87.3 84.7 84.3 ... – Gil Ong Nov 25 '19 at 15:47
  • Hey Gil, ok you should not force a data.frame into a column of a data.frame.. First do, results = data.frame(mean.f=femalet3[,1], mean.f=rowMeans(femalet3[,-1])). head(results). Is this what you want? – StupidWolf Nov 25 '19 at 16:05
  • @StupidWolf I tried the command but it is still splitting mean.f into two variables, which are "mean.f" and "mean.f.1". My goal is to have only the mean value for mean.f . – Gil Ong Nov 25 '19 at 22:16
  • Sorry I made a typo, so it should be results=data.frame(var=femalet3[,1], mean.f=rowMeans(femalet3[,-1])) – StupidWolf Nov 25 '19 at 22:19
  • The first column, called var, identifies each row; the 2nd column gives you the rowMeans on the subset of columns – StupidWolf Nov 25 '19 at 22:20
  • @StupidWolf it seems to be working, let me run the whole thing! – Gil Ong Nov 25 '19 at 22:22
  • @StupidWolf it's working properly now, I suppose it was because I was calling the old and new columns the same name? Thank you very much for your help! – Gil Ong Nov 25 '19 at 22:32
  • No not really, you were trying to force a data.frame into a column of a data frame. Ok below I write a solution and explain why you encountered a problem. – StupidWolf Nov 25 '19 at 22:36

1 Answer

The problem comes from this line of code:

femalet3$mean.f<-data.frame(mean.f=femalet3[,1], mean.f=rowMeans(femalet3[,-1]))

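femalet3 is already a data.frame. When you assign another data frame to one of its columns, R stores that whole data frame as a single nested column. That is why the printed header shows mean.f.mean.f and mean.f.mean.f.1 while colnames() reports only one mean.f.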
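As a side note on where the ".1" suffix comes from: data.frame() makes duplicated argument names unique by default. The small sketch below uses made-up values (not the real femalet3 data) to show that behaviour and how check.names = FALSE changes it.

```r
# Two arguments share the name "mean.f"; data.frame() makes the names unique
dup <- data.frame(mean.f = c("a", "b"), mean.f = c(1, 2))
names(dup)   # "mean.f" "mean.f.1"

# With check.names = FALSE the duplicated names are kept as given
raw <- data.frame(mean.f = c("a", "b"), mean.f = c(1, 2), check.names = FALSE)
names(raw)   # "mean.f" "mean.f"
```

So the duplicate name explains the mean.f.1 label, but it is not what causes the nesting.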

I simulate your dataset below to show where the error occurs:

femalet3 <- data.frame(Significance = letters[1:10],matrix(rnorm(80),ncol=8))
colnames(femalet3)[-1] = c("GSM1311840","GSM1311841","GSM1311842",
"GSM1311843","GSM1311844","GSM1311845","GSM1311846","GSM1311847")
femalet3$mean.f<-data.frame(mean.f=femalet3[,1], mean.f=rowMeans(femalet3[,-1]))

head(femalet3)
  Significance  GSM1311840 GSM1311841  GSM1311842  GSM1311843 GSM1311844
1            a -0.09282641  0.0753268 -0.04400652  0.02442526  0.3065423
2            b  1.14718259  0.6062297 -0.08556210  0.15121682  1.6412273
3            c -1.45645947 -1.6808505 -1.93452662 -0.06121562  1.9080640
4            d  0.03955011  1.5496713 -0.27779819 -0.69083631  0.8331726
5            e -0.61881124  1.2798835 -0.55046474 -0.61394703  2.3530607
6            f  1.77918616  0.5156059  0.37311045  1.77081855 -0.8689152
  GSM1311845 GSM1311846 GSM1311847 mean.f.mean.f mean.f.mean.f.1
1  1.1210784  0.6891616  0.7314997             a       0.3514002
2  1.8341236  3.0722572  0.9026674             b       1.1586678
3 -0.5721591  2.8964295 -2.0082267             c      -0.3636181
4  1.1212192  0.2129126  0.9595494             d       0.4684301
5 -0.6253303  1.0512457 -1.2166623             e       0.1323718
6  0.4963209 -0.5864916  0.4429023             f       0.4903172

This embeds a data.frame within the mean.f column in your data.frame:

ncol(femalet3)
10

head(femalet3$mean.f)
   mean.f   mean.f.1
1       a  0.3514002
2       b  1.1586678
3       c -0.3636181
4       d  0.4684301
5       e  0.1323718
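
If you want to check for this kind of nested column yourself, one possible approach is to test each column with is.data.frame(). The sketch below uses a small toy data frame; the names df, GSM1 and GSM2 are placeholders, not your real data.

```r
# Toy data standing in for femalet3 (hypothetical values)
df <- data.frame(Significance = c("Sig", "VSig", "VVSig"),
                 GSM1 = c(1, 2, 3),
                 GSM2 = c(3, 4, 5))

# Reproduce the problem: assign a whole data frame to a single column
df$mean.f <- data.frame(mean.f = df[, 1], mean.f = rowMeans(df[, -1]))

ncol(df)                  # 4: the nested data frame counts as one column
is.data.frame(df$mean.f)  # TRUE
names(df$mean.f)          # "mean.f" "mean.f.1"

# Flag every column that is itself a data frame
nested <- vapply(df, is.data.frame, logical(1))
names(df)[nested]         # "mean.f"
```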

First, remove the nested column:

femalet3$mean.f <- NULL

To avoid the problem, assign the vector returned by rowMeans() directly to the new column:

femalet3$mean.f<-rowMeans(femalet3[,-1])
head(femalet3)
  Significance  GSM1311840 GSM1311841  GSM1311842  GSM1311843 GSM1311844
1            a -0.09282641  0.0753268 -0.04400652  0.02442526  0.3065423
2            b  1.14718259  0.6062297 -0.08556210  0.15121682  1.6412273
3            c -1.45645947 -1.6808505 -1.93452662 -0.06121562  1.9080640
4            d  0.03955011  1.5496713 -0.27779819 -0.69083631  0.8331726
5            e -0.61881124  1.2798835 -0.55046474 -0.61394703  2.3530607
6            f  1.77918616  0.5156059  0.37311045  1.77081855 -0.8689152
  GSM1311845 GSM1311846 GSM1311847     mean.f
1  1.1210784  0.6891616  0.7314997  0.3514002
2  1.8341236  3.0722572  0.9026674  1.1586678
3 -0.5721591  2.8964295 -2.0082267 -0.3636181
4  1.1212192  0.2129126  0.9595494  0.4684301
5 -0.6253303  1.0512457 -1.2166623  0.1323718
6  0.4963209 -0.5864916  0.4429023  0.4903172
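
One thing to watch: femalet3[,-1] selects everything except the first column, so running the assignment a second time would include the existing mean.f column in the average. A possible way around this, assuming all sample columns start with "GSM" as they do in your colnames() output, is to select them by name. The data frame below is a toy stand-in with placeholder names.

```r
# Toy data standing in for femalet3 (hypothetical values)
df <- data.frame(Significance = c("Sig", "VSig", "VVSig"),
                 GSM1 = c(1, 2, 3),
                 GSM2 = c(3, 4, 5))

# Pick the sample columns by name prefix instead of by position
gsm_cols <- startsWith(names(df), "GSM")
df$mean.f <- rowMeans(df[, gsm_cols, drop = FALSE])

# Running the same two lines again gives the same result,
# because mean.f does not match the "GSM" prefix
gsm_cols <- startsWith(names(df), "GSM")
df$mean.f <- rowMeans(df[, gsm_cols, drop = FALSE])
df$mean.f
```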
StupidWolf