Grouping with numeric variables

Question

I hava a dataframe like this:

name, value
stockA,Google
stockA,Yahoo
stockB,NA
stockC,Google

I would like to convert the values of rows of the second column to columns and keep the first one and in other have a numeric value to 0 and 1 if not exist or exist the value. Here an example of the expected output:

name,Google,Yahoo
stockA,1,1
stockB,0,0
stockC,1,0

I tried this:

library(reshape2)
df2 <- dcast(melt(df, 1:2, na.rm = TRUE), df + name ~ value, length)

and the error it gives me is this:

Using value as value column: use value.var to override.
Error in `[.data.frame`(x, i) : undefined columns selected

Any idea for the error?

An example in which the previous code works. Data (df):

name,nam2,value
stockA,sth1,Yahoo
stockA,sth2,NA
stockB,sth3,Google

and this works:

df2 <- dcast(melt(df, 1:2, na.rm = TRUE), name + nam2 ~ value, length)

Possible duplicate of [Reshape three column data frame to matrix ("long" to "wide" format)](http://stackoverflow.com/questions/9617348/reshape-three-column-data-frame-to-matrix-long-to-wide-format) `df <- data.frame(name = c("stockA", "stockA", "stockB", "stockC"), value = c("Google", "Yahoo", NA, "Google")); xtabs(~name+value, data=df)` — jogo, Mar 21 '17 at 20:53
@jogo it works but the first column has an NA with zero and one and the column name doesn't exist — Keri, Mar 21 '17 at 20:58
Please use `dput()` to present your data or give a definition of your dataframe, i.e. edit your question: http://stackoverflow.com/posts/42937389/edit With the dataframe the I used in my comment I got the desired result. — jogo, Mar 21 '17 at 21:03

score 1 · Answer 1 · answered Mar 21 '17 at 20:33

You can do that with spread from the tidyr package.

df <- data.frame(name = c("stockA", "stockA", "stockB", "stockC"),
                 value = c("Google", "Yahoo", NA, "Google"))
df$row <- 1
df %>% 
  spread(value, row, fill = 0) %>% 
  select(-`<NA>`)

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

The OP has asked to get an explanation for the error caused by

dcast(melt(df, 1:2, na.rm = TRUE), df + name ~ value, length)

(I'm quite astonished that no one so far has tried to improve the OP's reshape2 approach to return exactly the expected answer).

There are several issues with OP's code:

df appears in the dcast() formula.
The second parameter to melt() is 1:2 which means that all columns are used as id.vars. It should read 1.
But the most crucial point is that the data.frame df already is in long format and doesn't need to be reshaped.

So, df can be used directly in dcast():

library(reshape2)
dcast(df[!is.na(df$value), ], name ~ value, length, drop = FALSE)
#    name Google Yahoo
#1 stockA      1     1
#2 stockB      0     0
#3 stockC      1     0

In order to avoid a third NA column appearing in the result, the NA rows have to be filtered out of df before reshaping. On the other hand, drop = FALSE is required to ensure stockB is included in the result.

Data

df <- data.frame(name = c("stockA", "stockA", "stockB", "stockC"), 
                 value = c("Google", "Yahoo", NA, "Google"))
df
#    name  value
#1 stockA Google
#2 stockA  Yahoo
#3 stockB   <NA>
#4 stockC Google

score 0 · Answer 3 · answered Mar 21 '17 at 20:23

0

Try df2 <- dcast(melt(df, 1:2, na.rm = TRUE), name ~ value, length)

Just remove df + from the equation.

Though this will give you an extra column for NA values, which makes me think the na.rm argument isn't working properly in your formulation.

answered Mar 21 '17 at 20:23

Patrick Williams

694
8
22

thank you for your answer. I tried your suggestion but I receive Using value as value column: use value.var to override.` – Keri Mar 21 '17 at 20:32
My data look like this: `df <- data.frame(name=c('stockA','stockA','stockB','stockC'),value=c('Google','Yahoo',NA,'Google'))' but I'm not sure what format yours are because you have the values separated by commas. – Patrick Williams Mar 21 '17 at 20:33
It is from my csv. It is a dataframe – Keri Mar 21 '17 at 20:37
length it is not a column – Keri Mar 21 '17 at 20:37
I just upvoted the other answer. It does what you need, but you must first install tidyr – Patrick Williams Mar 21 '17 at 20:43
1

@Patrick Williams There are some more issues with OP's code besides the misplaced `df +`. Another point is that both columns (`1:2`) are used as `id.vars` which perhaps is causing the issue with `no.rm` not being honored. Finally, `melt()` isn't required at all because `df` already is in long format. – Uwe Mar 22 '17 at 20:43

score 0 · Answer 4 · answered Mar 21 '17 at 21:19

You can do it also with base R:

df <- read.table(header=TRUE, sep=',', text=
'name, value
stockA,Google
stockA,Yahoo
stockB,NA
stockC,Google')
xtabs(~., data=df)
#        value
#name     Google Yahoo
#  stockA      1     1
#  stockB      0     0
#  stockC      1     0

Grouping with numeric variables

4 Answers4

Data