Adding a column to a dataframe based on labels from a different dataframe

Question

I'm quite new to R and I hope somebody can help me with this problem. I am working with two data frames. The first one:

> print(averagetable)
   Group.1     Moving   Feeding  Standing classification
1 cluster1 0.04978355 0.1470238 0.7795848 Moving/Feeding
2 cluster2 0.08214286 0.3216518 0.5642857 Feeding/Moving
3 cluster3 0.03750000 0.1462121 0.7922980       Standing

A head() sample of the second one:

> head(tableresults)
  ACTIVITY_X ACTIVITY_Y ACTIVITY_Z Vigilance Head-up Grazing Browsing Moving Grooming Resting
1         19         21         28         1       0       0        0      0        0       0
2         20         14         24         1       0       0        0      0        0       0
3         34         35         49         1       0       0        0      0        0       0
4         18          5         19         1       0       0        0      0        0       0
5         23         27         35         1       0       0        0      0        0       0
6         33         20         39         1       0       0        0      0        0       0
  Fleeing Total     Event winning_cluster
1       0     1 Vigilance        cluster3
2       0    80 Vigilance        cluster3
3       0    80 Vigilance        cluster3
4       0    80 Vigilance        cluster1
5       0    80 Vigilance        cluster3
6       0    80 Vigilance        cluster3

I would like to add a column tableresults$classification that contains, for each row, the averagetable$classification label of the cluster named in tableresults$winning_cluster.

The cluster names are in averagetable$Group.1 and their labels are in averagetable$classification. The remaining columns in both data frames don't matter for the final output.

A head() sample of the final output would be:

> head(tableresults)
  ACTIVITY_X ACTIVITY_Y ACTIVITY_Z Vigilance Head-up Grazing Browsing Moving Grooming Resting
1         19         21         28         1       0       0        0      0        0       0
2         20         14         24         1       0       0        0      0        0       0
3         34         35         49         1       0       0        0      0        0       0
4         18          5         19         1       0       0        0      0        0       0
5         23         27         35         1       0       0        0      0        0       0
6         33         20         39         1       0       0        0      0        0       0
  Fleeing Total     Event winning_cluster classification
1       0     1 Vigilance        cluster3       Standing
2       0    80 Vigilance        cluster3       Standing
3       0    80 Vigilance        cluster3       Standing
4       0    80 Vigilance        cluster1 Moving/Feeding
5       0    80 Vigilance        cluster3       Standing
6       0    80 Vigilance        cluster3       Standing

This is a bit confusing to me, so any input is appreciated!

score 1 · Accepted Answer · answered Mar 25 '19 at 14:57

library(dplyr)

tableresults %>%
  inner_join(averagetable %>% select(Group.1, classification),
             by = c("winning_cluster" = "Group.1"))
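A small, self-contained check of what this join does. It assumes toy data frames holding only the columns the join needs (values copied from the question's printout); `cluster4` is a hypothetical cluster with no label, added only to show the edge case. inner_join() keeps just the rows whose winning_cluster has a match in Group.1:

```r
library(dplyr)

# Toy lookup table: only the two columns of averagetable the join uses
averagetable <- data.frame(
  Group.1        = c("cluster1", "cluster2", "cluster3"),
  classification = c("Moving/Feeding", "Feeding/Moving", "Standing"),
  stringsAsFactors = FALSE
)

# Toy results table; "cluster4" is hypothetical and has no label
tableresults <- data.frame(
  Event           = c("Vigilance", "Vigilance", "Vigilance"),
  winning_cluster = c("cluster3", "cluster1", "cluster4"),
  stringsAsFactors = FALSE
)

joined <- tableresults %>%
  inner_join(averagetable %>% select(Group.1, classification),
             by = c("winning_cluster" = "Group.1"))

print(joined)  # 2 rows: the cluster4 row has no match and is dropped
```

If every cluster in tableresults is guaranteed to have a label this makes no difference; otherwise left_join() keeps all rows.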

answered Mar 25 '19 at 14:57

Wil

score 0 · Answer 2 · answered Mar 25 '19 at 15:02

With dplyr you can do a join:

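library(dplyr)

df_final <- tableresults %>%
  # does the join; both tables have a Moving column, so suffix = c("", ".avg")
  # keeps tableresults' Moving as is and renames averagetable's to Moving.avg
  left_join(averagetable, by = c("winning_cluster" = "Group.1"),
            suffix = c("", ".avg")) %>%
  select(-Moving.avg, -Feeding, -Standing)  # drop the averagetable columns that aren't needed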

If a winning_cluster value in tableresults has no matching Group.1 in averagetable, left_join() still keeps that row and fills classification with NA.
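A minimal sketch of that behaviour, assuming toy versions of both tables (values copied from the question; `cluster4` is a hypothetical unlabelled cluster). Moving is kept in both tables to reproduce the name clash, and suffix = c("", ".avg") is used so tableresults keeps its own Moving column name:

```r
library(dplyr)

# Toy lookup table; Moving is kept to reproduce the name clash with tableresults
averagetable <- data.frame(
  Group.1        = c("cluster1", "cluster2", "cluster3"),
  Moving         = c(0.04978355, 0.08214286, 0.03750000),
  classification = c("Moving/Feeding", "Feeding/Moving", "Standing"),
  stringsAsFactors = FALSE
)

# Toy results table; "cluster4" is hypothetical and has no label
tableresults <- data.frame(
  Moving          = c(0, 0, 0),
  winning_cluster = c("cluster3", "cluster1", "cluster4"),
  stringsAsFactors = FALSE
)

df_final <- tableresults %>%
  left_join(averagetable, by = c("winning_cluster" = "Group.1"),
            suffix = c("", ".avg")) %>%  # averagetable's Moving becomes Moving.avg
  select(-Moving.avg)                     # drop the averagetable value

print(df_final)  # 3 rows; classification is NA for cluster4
```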

score 0 · Answer 3 · answered Mar 25 '19 at 15:04

A base R approach:

merge(tableresults, averagetable[, c("Group.1", "classification")],
      by.x = "winning_cluster", by.y = "Group.1", all.x = TRUE)
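merge() with all.x = TRUE also keeps unmatched rows, but with its default sort = TRUE the result is sorted by the key, and the key column comes first. The sketch below assumes toy data (with a hypothetical `cluster4`) to show this, and adds a match()-based lookup as a base R alternative that leaves the row and column order of tableresults untouched:

```r
# Toy versions of the two tables; "cluster4" is hypothetical and has no label
averagetable <- data.frame(
  Group.1        = c("cluster1", "cluster2", "cluster3"),
  classification = c("Moving/Feeding", "Feeding/Moving", "Standing"),
  stringsAsFactors = FALSE
)
tableresults <- data.frame(
  Event           = c("Vigilance", "Vigilance", "Vigilance"),
  winning_cluster = c("cluster3", "cluster1", "cluster4"),
  stringsAsFactors = FALSE
)

# merge(): unmatched rows are kept with NA, but the result is sorted by
# winning_cluster and the key column is moved to the front
merged <- merge(tableresults, averagetable[, c("Group.1", "classification")],
                by.x = "winning_cluster", by.y = "Group.1", all.x = TRUE)
print(merged)

# match(): position of each winning_cluster in averagetable$Group.1 (NA if absent),
# used to index the labels, so tableresults keeps its own row and column order
tableresults$classification <-
  averagetable$classification[match(tableresults$winning_cluster, averagetable$Group.1)]
print(tableresults)
```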

answered Mar 25 '19 at 15:04

emilliman5

5,816
3
27
37

Adding a column to a dataframe based on labels from a different dataframe

3 Answers3