
I have a dataset in R that I am trying to subset into a second data frame.

I'm not really sure it's relevant, but just in case, the data is something along the lines of this:

V1 V2 V3 V4 V5 V6
ab 10 98 0.9 0.1 abc
cd 11 99 0.8 0.05 cde

So I was trying to subset it by doing the following:

df_new = data.frame(data$V2, data$V5, data$V6)

This has actually worked in the past so I didn't think anything of using it here, but for some reason, the output of this was:

data.V2 data.V5 data.V6
10      0.1     abc
11      0.05    cde

So, for some reason the function was adding the name of the original data frame to the column names when I was subsetting it. I checked the documentation and couldn't see an option for preventing this (I just want to keep the original names). So I'm not really sure what exactly was going wrong here.

Sabor117
  • Just do something like this: `df_new = data.frame(V2 = data$V2, V5 = data$V5, V6 = data$V6)` – Mike Oct 25 '18 at 16:30

1 Answer


When you pass, e.g., data$V2, you are passing a plain vector, which doesn't carry a name of its own:

data$V2
# [1] 10 11

So, this kind of behaviour is expected. The best option would probably be

data[, c("V2", "V5", "V6")]
#   V2   V5  V6
# 1 10 0.10 abc
# 2 11 0.05 cde

or, if you want to stick with data.frame,

with(data, data.frame(V2, V5, V6))
#   V2   V5  V6
# 1 10 0.10 abc
# 2 11 0.05 cde

A longer form that lets you assign any names would be

data.frame(A = data$V2, B = data$V5, C = data$V6)
#    A    B   C
# 1 10 0.10 abc
# 2 11 0.05 cde

or

with(data, data.frame(A = V2, B = V5, C = V6))
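To compare the variants side by side, here is a small sketch. It rebuilds the two example rows by hand, which is an assumption: the question does not show how `data` was actually created. The object names `prefixed`, `by_index`, `by_with` and `renamed` are made up for illustration.

```r
# Rebuild the example rows from the question (hypothetical construction).
data <- data.frame(
  V1 = c("ab", "cd"),
  V2 = c(10, 11),
  V3 = c(98, 99),
  V4 = c(0.9, 0.8),
  V5 = c(0.1, 0.05),
  V6 = c("abc", "cde"),
  stringsAsFactors = FALSE
)

# Unnamed data$... arguments: column names are derived from the expressions.
prefixed <- data.frame(data$V2, data$V5, data$V6)

# Indexing by column name keeps the original names.
by_index <- data[, c("V2", "V5", "V6")]

# with() evaluates V2, V5, V6 inside data, so the expressions are bare names.
by_with <- with(data, data.frame(V2, V5, V6))

# Explicitly named arguments can use any names.
renamed <- data.frame(A = data$V2, B = data$V5, C = data$V6)

names(prefixed)  # "data.V2" "data.V5" "data.V6"
names(by_index)  # "V2" "V5" "V6"
names(by_with)   # "V2" "V5" "V6"
names(renamed)   # "A" "B" "C"
```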
Julius Vainora
  • So this solved the issue I was having, but I wanted to go back and clarify something in relation to your initial point: the columns did actually have names; I was just using V1:V6 for ease of understanding. The columns in question were named something like `rsid`, `ref` and `alt`. Does that change the initial point of your answer, or are those still technically "not names"? – Sabor117 Oct 25 '18 at 16:41
  • It doesn't change my answer as `V1`, `V2`, ... are valid column names already. What I meant is that the object `data$V2` doesn't have a name (as shown in the output), which makes sense. Why should it be `V2`? Actually it makes perfect sense to take `data.V2` because `data$V2` is not just `V2`; it is the `V2` column from the data frame `data`. On the other hand, if you try `data.frame(data[, "V2", drop = FALSE], data[, "V5", drop = FALSE])`, the result is different: `data[, "V2", drop = FALSE]` is a named *column* and hence gives the desired result. – Julius Vainora Oct 25 '18 at 16:49
  • @Sabor117 I thought maybe the confusion was more that when calling `data.frame()`, if the arguments passed aren't explicitly named (as they are in, e.g., `data.frame(x = 1:3, y = 1:3)`), R will attempt to use the object name itself as the column name, modified to fit its rules for what is syntactically valid. – joran Oct 25 '18 at 16:52
  • I think I see! I suppose the misunderstanding here stems from the fact that I thought `data$V2` meant that R would think you are calling the column called `V2` in `data`. So I assumed that passing `data$V2` to somewhere else means that you were passing the "column" data. Okay, I think that's completely cleared this up. Thanks all! – Sabor117 Oct 25 '18 at 16:55
  • @Sabor117 This may also be relevant reading: [Extracting specific columns from a data frame](https://stackoverflow.com/questions/10085806/extracting-specific-columns-from-a-data-frame) – Henrik Oct 25 '18 at 17:07
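
The naming rule discussed in these comments can be observed directly. The following is only a sketch: `make.names()` is the base R function that turns an arbitrary string into a syntactically valid name, and `check.names = FALSE` asks `data.frame()` to skip that clean-up. The two-column `data` is a made-up stand-in for the real data.

```r
# Toy stand-in for the real data (hypothetical values).
data <- data.frame(V2 = c(10, 11), V5 = c(0.1, 0.05))

# "$" is not allowed in a syntactic name, so it is replaced with ".".
make.names("data$V2")
# [1] "data.V2"

# With check.names = FALSE the text of each argument expression is kept as-is.
names(data.frame(data$V2, data$V5, check.names = FALSE))
# [1] "data$V2" "data$V5"

# A one-column data frame carries its own column name, so nothing is derived.
names(data.frame(data[, "V2", drop = FALSE], data[, "V5", drop = FALSE]))
# [1] "V2" "V5"
```

Column names such as `data$V2` need backticks whenever they are referenced later, so `check.names = FALSE` is shown here to illustrate the mechanism rather than as a recommendation.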