
I have the following query:

library(dplyr)
FinalQueryDplyr <- PostsWithFavorite %>%
  inner_join(Users, by = c("OwnerUserId" = "Id"), keep = FALSE) %>%
  select(DisplayName, Age, Location, FavoriteTotal, MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
  select(-c(OwnerUserId)) %>%
  arrange(desc(FavoriteTotal))

As you can see, I use OwnerUserId as the join column between the two data frames.

I want the resulting data frame to contain only the other columns, without OwnerUserId.

Even though I 'deselect' the OwnerUserId column twice in this query:

  • once by leaving it out of the first select() call
  • once by explicitly dropping it with select(-c(OwnerUserId))

it is still present in the result, which has these columns: OwnerUserId DisplayName Age Location FavoriteTotal MostFavoriteQuestion MostFavoriteQuestionLikes

How can I drop the column that was used as the join key in dplyr?
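A minimal reproducible sketch of the behavior, using small hypothetical tibbles in place of the real PostsWithFavorite and Users. The key assumption (not stated in the question) is that PostsWithFavorite is still grouped by OwnerUserId from an earlier step. On a grouped data frame, select() always keeps the grouping variables, so OwnerUserId comes back even though it is not listed.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's PostsWithFavorite;
# assumed to still be grouped by OwnerUserId from an earlier step
PostsWithFavorite <- tibble(
  OwnerUserId = c(1L, 2L),
  FavoriteTotal = c(10L, 25L),
  MostFavoriteQuestion = c("Q-a", "Q-b"),
  MostFavoriteQuestionLikes = c(7L, 20L)
) %>%
  group_by(OwnerUserId)

# Hypothetical toy data standing in for the question's Users
Users <- tibble(
  Id = c(1L, 2L),
  DisplayName = c("alice", "bob"),
  Age = c(30L, 41L),
  Location = c("Oslo", "Lima")
)

# The join keeps the grouping of the left-hand table, so select()
# re-adds OwnerUserId and emits a message about a missing grouping variable
result <- PostsWithFavorite %>%
  inner_join(Users, by = c("OwnerUserId" = "Id")) %>%
  select(DisplayName, Age, Location, FavoriteTotal,
         MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
  arrange(desc(FavoriteTotal))

names(result)
```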

  • Without being able to work with your data, it's pretty hard to know what's going on. A [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would be helpful. Can't do any more than guess, but is the data frame grouped by that column? – camille Apr 18 '20 at 17:28
  • Have you tried `ungroup`ing? That's the most probable reason I can think of. If a variable is a grouping variable, dplyr will not (de-)select it. – stefan Apr 18 '20 at 18:05
  • Thank you. That was exactly it. ungroup() was the answer. – Alicja Barankiewicz Apr 18 '20 at 18:08
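Following stefan's suggestion, a minimal sketch of the fix: call ungroup() after the join and before select(). As before, the tibbles are hypothetical stand-ins for the real data, assumed to be grouped by OwnerUserId.

```r
library(dplyr)

# Hypothetical toy data; PostsWithFavorite is assumed to be grouped by OwnerUserId
PostsWithFavorite <- tibble(
  OwnerUserId = c(1L, 2L),
  FavoriteTotal = c(10L, 25L),
  MostFavoriteQuestion = c("Q-a", "Q-b"),
  MostFavoriteQuestionLikes = c(7L, 20L)
) %>%
  group_by(OwnerUserId)

Users <- tibble(
  Id = c(1L, 2L),
  DisplayName = c("alice", "bob"),
  Age = c(30L, 41L),
  Location = c("Oslo", "Lima")
)

FinalQueryDplyr <- PostsWithFavorite %>%
  inner_join(Users, by = c("OwnerUserId" = "Id")) %>%
  ungroup() %>%  # remove grouping so select() no longer forces OwnerUserId back in
  select(DisplayName, Age, Location, FavoriteTotal,
         MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
  arrange(desc(FavoriteTotal))

FinalQueryDplyr
```

With ungroup() placed before select(), the single select() is enough; the extra select(-c(OwnerUserId)) step is no longer needed.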

1 Answer


One option is to remove the grouping attribute by converting to a plain data.frame before deselecting the column:

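library(dplyr)
PostsWithFavorite %>%
   inner_join(Users, by = c("OwnerUserId" = "Id"), keep = FALSE) %>%
   # OwnerUserId survives this select() because it is a grouping variable
   select(DisplayName, Age, Location, FavoriteTotal, 
          MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
   as.data.frame() %>%        # drops the grouping attribute
   select(-OwnerUserId) %>%   # now the column can actually be removed
   arrange(desc(FavoriteTotal))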
akrun