Using dplyr to gather specific dummy variables

Question

This question is an extension of "Using dplyr to gather dummy variables".

The question: how can I gather only some of the columns instead of the whole data set? In this example, I want to gather all the columns except "sedan". My real data set has 250 columns, so it would be great if I could include or exclude columns by name.

Data set

head(type)
x    convertible coupe hatchback sedan wagon
1           0     0         0     1     0
2           0     1         0     0     0
3           1     0         0     0     0
4           1     0         0     0     0
5           1     0         0     0     0
6           1     0         0     0     0

Output

TypeOfCar
1     x
2     coupe 
3     convertible
4     convertible
5     convertible
6     convertible

Is a possibility, but it is not desired, since it will cost more memory again :-). There should be a way to do this on a Tidy way, right? — R overflow, Oct 26 '18 at 09:56
I tried: test <- df %>% gather(new_column, Count, 2:4) But with wrong results... — R overflow, Oct 26 '18 at 10:05

score 2 · Answer 1 · answered Oct 26 '18 at 10:39

I'm not sure I understand you correctly, but this should do what you want:

df %>% select(-sedan) %>%  gather(Key, Value)

And if you have too many variables to list by name, you can use the select helpers:

select(-contains(""))
select(-starts_with(""))
select(-ends_with(""))
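
As a minimal sketch of how one of these helpers combines with gather, the snippet below rebuilds the toy data from the question and drops every column whose name starts with "sed" before stacking the rest. The prefix "sed" and the object name `long` are only illustrative choices, and note that `gather()` comes from tidyr, so that package is loaded as well.

```r
library(dplyr)
library(tidyr)  # gather() lives in tidyr, not dplyr

# Toy data with the same values as in the question.
dat <- data.frame(
  convertible = c(0, 0, 1, 1, 1, 1),
  coupe       = c(0, 1, 0, 0, 0, 0),
  hatchback   = c(0, 0, 0, 0, 0, 0),
  sedan       = c(1, 0, 0, 0, 0, 0),
  wagon       = c(0, 0, 0, 0, 0, 0)
)

# Drop the columns matching the (illustrative) prefix, then stack the rest
# into a key column and a value column.
long <- dat %>%
  select(-starts_with("sed")) %>%
  gather(Key, Value)
```

The result has one row per remaining cell (4 columns times 6 rows), which is why it is longer than the original data frame.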

Hope it helps.

Carlos Vecina Tebar

Thanks! But when I run it on the example above, the result is slightly different (a data frame with 2 columns). I tried to adapt your code to the example above, but that didn't give the desired result (since the rows were shuffled). What I tried: xj <- dat %>% select(-sedan) %>% gather(Key, Value) – R overflow Oct 29 '18 at 14:56

score 1 · Answer 2 · answered Oct 26 '18 at 08:52

You can use -sedan in gather:

dat %>% gather(TypeOfCar, Count, -sedan) %>% filter(Count >= 1) %>% select(TypeOfCar)
#      TypeOfCar
# 1 convertible
# 2 convertible
# 3 convertible
# 4 convertible
# 5       coupe

Data:

tt <- "convertible coupe hatchback sedan wagon
1           0     0         0     1     0
2           0     1         0     0     0
3           1     0         0     0     0
4           1     0         0     0     0
5           1     0         0     0     0
6           1     0         0     0     0"

dat <- read.table(text = tt, header = TRUE)
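
The comments under this answer ask for one label per original row, in the original row order. One possible way to get that, sketched here under the assumption that each row has at most one 1 among the gathered columns, is to carry a row identifier through the gather and join the matches back onto the full list of rows. The names `row_id` and `result` are made up for this illustration, and rows without a match come out as NA rather than the "x" shown in the question.

```r
library(dplyr)
library(tidyr)

# Toy data with the same values as in the question.
dat <- data.frame(
  convertible = c(0, 0, 1, 1, 1, 1),
  coupe       = c(0, 1, 0, 0, 0, 0),
  hatchback   = c(0, 0, 0, 0, 0, 0),
  sedan       = c(1, 0, 0, 0, 0, 0),
  wagon       = c(0, 0, 0, 0, 0, 0)
)

result <- dat %>%
  mutate(row_id = row_number()) %>%              # remember the original row
  gather(TypeOfCar, Count, -sedan, -row_id) %>%  # stack everything except sedan
  filter(Count == 1) %>%                         # keep the column that was set
  select(row_id, TypeOfCar) %>%
  # Bring back rows that had no 1 in the gathered columns (they become NA).
  right_join(data.frame(row_id = seq_len(nrow(dat))), by = "row_id") %>%
  arrange(row_id)                                # restore the original order
```

Here `result$TypeOfCar` lines up with the rows of `dat`, so it can be attached back as a new column if needed.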

RLave

Thanks! But how can I use it to select for example multiple columns (let's say only gathering for (convertible,coupe and hatchback)? (so selecting instead of removing 1 variable) – R overflow Oct 26 '18 at 08:53
`gather(TypeOfCar, Count, convertible,coupe,hatchback)` works for you? – RLave Oct 26 '18 at 08:55
You could use the syntax `start_column:end_column` to select multiple consecutive columns, or use `contains("X")` to match a string – RLave Oct 26 '18 at 08:56
I totally misused the function. Thanks for the clarity, really helped me out! – R overflow Oct 26 '18 at 08:59
I see, after testing, that only the first column name is shown. Should I transform my data into a specific format? It is now numeric. – R overflow Oct 26 '18 at 09:08
remove the `%>% filter(Count >= 1) %>% select(TypeOfCar)` part, that was just to get a shorter example. – RLave Oct 26 '18 at 09:15
still same error: I use now: test <- df %>% gather(df, Count, 4:27) #need to use column 4 until 27 – R overflow Oct 26 '18 at 09:20
You don't need to rewrite `df` inside the call if you use the `%>%`, instead write a variable name like `TypeOfCar` this would be the key to the `gather` function – RLave Oct 26 '18 at 09:22
test <- df %>% gather(key, Count, 4:27) results in two new columns: key, which now contains all the types of cars, but not on the right rows, and Count, which is binary (33051 zeros and 1437 ones). But the gathering part is still going wrong (the column names do not appear on the rows where a 1 occurred) – R overflow Oct 26 '18 at 09:27
I see the problem, I think. You should probably use `%>% select(key, count)` after the `gather` – RLave Oct 26 '18 at 09:36
Thanks, but that will lead to a data frame with many more rows, containing key and Count (I assume that Count is with a capital C?). Code used: test <- df %>% gather(key, Count, 4:27) %>% select(key,Count) – R overflow Oct 26 '18 at 09:41
Thanks! But the answer in your example is not correct, right? It should be: 1 = x 2 = coupe 3 = convertible 4 = convertible 5 = convertible 6 = convertible – R overflow Oct 29 '18 at 14:41

score 0 · Answer 3 · answered Oct 29 '18 at 15:17

Fixed it with a combination of @RLave and @Carlos Vecina

right_columns <- all_data %>% select(starts_with("hour"))

all_data$all_hour <- data.frame(
  new_column = names(right_columns)[as.matrix(right_columns) %*% seq_along(right_columns)],
  stringsAsFactors = FALSE
)
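
To make the matrix trick above easier to follow, here is a small sketch of the same idea on the car data from the question, using base R only. Multiplying each 0/1 row by the positions 1..k gives the position of the column that holds the 1, provided each row has at most one 1 among the selected columns (an assumption of this sketch). A row with no 1 yields position 0, which R drops silently when indexing, so the sketch turns those into NA to keep one label per row. The names `cols`, `idx` and `TypeOfCar` are illustrative.

```r
# Toy data with the same values as in the question.
dat <- data.frame(
  convertible = c(0, 0, 1, 1, 1, 1),
  coupe       = c(0, 1, 0, 0, 0, 0),
  hatchback   = c(0, 0, 0, 0, 0, 0),
  sedan       = c(1, 0, 0, 0, 0, 0),
  wagon       = c(0, 0, 0, 0, 0, 0)
)

# The dummy columns to collapse (everything except sedan).
cols <- dat[, c("convertible", "coupe", "hatchback", "wagon")]

# Row-wise position of the column holding the 1; 0 if the row has none.
idx <- as.vector(as.matrix(cols) %*% seq_along(cols))

# Position 0 would be dropped by R's indexing and shorten the result,
# so mark those rows as missing instead.
idx[idx == 0] <- NA

# One label per original row, in the original order.
dat$TypeOfCar <- names(cols)[idx]
```

Because nothing is reshaped, the row order is preserved and no longer intermediate data frame is created.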
