Replacing row elements in a dataframe based on values from another dataframe

Question

I'm fairly new to R so I hope somebody can help me. An output table in one of my scripts is the averagetable below showing different proportions of the event Standing in three different clusters:

> print(averagetable)
   Group.1  Standing
1 cluster1  0.5642857
2 cluster2  0.7795848
3 cluster3  0.7922980

Note that R can assign different cluster names (cluster1, cluster2 or cluster3) to the values on averagetable$Standing each time I'm running the scrip. Another output can be:

> print(averagetable)
   Group.1 Standing
1 cluster1 0.7795848
2 cluster2 0.5642857
3 cluster3 0.7922980

On the other hand, my script produces the tableresults dataframe. Please find a head() sample below:

> head(tableresults)
  ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1         19         21         28        cluster3
2         20         14         24        cluster3
3         34         35         49        cluster3
4         18          5         19        cluster2
5         23         27         35        cluster3
6         33         20         39        cluster3

My question is fairly simple. I would like to transform the data in tableresults changing the string in the column winning_cluster based on three rules:

1) Write Standing in tableresults$wining_cluster replacing it by the cluster name having the highest Standing value in averagetable.

2) Write Moving/Feeding in tableresults$wining_cluster replacing it by the cluster name having the second highest Standing value in averagetable.

3) Write Feeding/Moving in tableresults$wining_cluster replacing it by the cluster name having the third highest Standing value in averagetable.

In other words, this is the output desired:

> head(tableresults_output)
  ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1         19         21         28        Standing
2         20         14         24        Standing
3         34         35         49        Standing
4         18          5         19        Moving/Feeding
5         23         27         35        Standing
6         33         20         39        Standing

Note that it is very important to have a value-based, hierarchical component that will assign conditions 1) 2) or 3) depending on averagetable values. This is not solved by using:

averagetable$classification <- factor(x = as.character(sort(averagetable$Standing)),
                labels = c('Feeding/Moving', 'Moving/Feeding','Standing'))

With this command Standing will be always linked to cluster1, Moving/Feeding to cluster2 and Feeding/Moving to cluster3 and that is not necessarily true when averagetable is regenerated.

Anyways, any help is appreciated and I hope my question was interesting enough for the forum.

Neither `gsub` nor `tail` are meant for something like this. Do the `ACTIVITY_*` fields actually mean anything in the context of this question? When you say *"highest value"*, to which value are you referring? You have three values within `averagetable` per cluster, and three values per row in `tableresults`, I have no idea which you mean. — r2evans, Mar 25 '19 at 12:54
This might be better solved by merging these `data.frame` s although I've still failed to understand the question. — NelsonGon, Mar 25 '19 at 12:54
@NelsonGon I've reformulated the question on my original post. Hope things are much more clear now. — juancda4, Mar 25 '19 at 13:00
You still haven't explained what `ACTIVITY_*` has to do with anything. And you used *"highest Standing value in averagetable"* twice in your logic. — r2evans, Mar 25 '19 at 13:10
Hello. I've fixed the "highest Standing value in averagetable" issue in the original post. `ACTIVITY_*` values are unimportant. Thanks for pointing that out. — juancda4, Mar 25 '19 at 13:12

score 0 · Answer 1 · answered Mar 25 '19 at 13:00

0

Just create your classification in the first data.frame then merge with your tableresults

averagetable$classification <- factor(x = as.character(sort(averagetable$Standing)),
                            labels = c('Feeding/Moving', 'Moving/Feeding', 'Standing'))

        Group.1     Moving   Feeding  Standing classification
1: cluster1 0.08214286 0.3216518 0.5642857 Feeding/Moving
2: cluster2 0.04978355 0.1470238 0.7795848 Moving/Feeding
3: cluster3 0.03750000 0.1462121 0.7922980       Standing

merge(tableresults, 
      averagetable[, c('Group.1', 'classification')],
      by.x = 'winning_cluster', by.y = 'Group.1', all.x = T)

   winning_cluster ACTIVITY_X ACTIVITY_Y ACTIVITY_Z classification
1:        cluster2         18          5         19 Moving/Feeding
2:        cluster3         19         21         28       Standing
3:        cluster3         20         14         24       Standing
4:        cluster3         34         35         49       Standing
5:        cluster3         23         27         35       Standing
6:        cluster3         33         20         39       Standing

answered Mar 25 '19 at 13:00

Felipe Alvarenga

2,572
1
17
36

Thank you for your suggestion. However, I need to have a some sort of hierarchical component based on the values on `averagetable`. I've several different dataframes I need to transform. Please refer back to my original post as I've edited some features to make it more understandable. – juancda4 Mar 25 '19 at 13:02
the `sort` inside the factor command does the hierarchy. Once you merge both, the hierarchy is based on the averagetable, given that the `classification` is created there – Felipe Alvarenga Mar 25 '19 at 13:05
I've clarified that aspect on my original post again. `averagetable` is an output table coming from a previous command. That means that R might assign a different `cluster#` names to the `Standing` values every time I run my script. Therefore, I need a command line that will look at the values in the column `Standing` and provide a `classification` acording to the 1) 2) and 3) rules on my original post – juancda4 Mar 25 '19 at 13:33
That is exactly what my command is doing. It is ordering the values by the `Standing` column. Unless your averagetable is not cluster specific, or has more than three clusters, this command is what you are asking. If not, change the question. – Felipe Alvarenga Mar 25 '19 at 13:54
What do you mean by the averagetable being not cluster specific? Like I mentionned earlier, your code works very well for one eventual `averagetable` output, but not when I regenerate `averagetable` and the values of `Standing` might be classified to a different cluster in `averagetable$Group1`. I'm also getting an undesired output when using your `merge()` function as the column `winning_cluster` becomes "all `cluster1`": – juancda4 Mar 25 '19 at 14:59
`> resi<-merge(tableresults,averagetable[, c('Group.1', 'classification')],by.x = 'winning_cluster', by.y = 'Group.1', all.x = T) > head(resi) winning_cluster ACTIVITY_X ACTIVITY_Y ACTIVITY_Z classification 1 cluster1 4 8 9 Moving/Feeding 2 cluster1 0 0 0 Moving/Feeding 3 cluster1 16 3 16 Moving/Feeding 4 cluster1 0 0 0 Moving/Feeding 5 cluster1 11 19 22 Moving/Feeding ` – juancda4 Mar 25 '19 at 15:00
your have to re-create the `classification` column everytime your `averagetable` changes. In my code, everytime `Standing` changes, and you rerun the command that creates the `classification` column, it changes accordingly – Felipe Alvarenga Mar 25 '19 at 15:02

r2evans · Accepted Answer · 2019-03-25T13:56:06.107

0

Here's a stab:


tableresults <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
  ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1         19         21         28        cluster3
2         20         14         24        cluster3
3         34         35         49        cluster3
4         18          5         19        cluster2
5         23         27         35        cluster3
6         33         20         39        cluster3")

averagetable <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
   Group.1  Standing
1 cluster1  0.5642857
2 cluster2  0.7795848
3 cluster3  0.7922980")

averagetable$x <- c("Standing", "Moving/Feeding", "Feeding/Moving")[ rank(-averagetable$Standing) ]
merge(tableresults, averagetable[,c(1,3)], by.x="winning_cluster", by.y="Group.1")
#   winning_cluster ACTIVITY_X ACTIVITY_Y ACTIVITY_Z              x
# 1        cluster2         18          5         19 Moving/Feeding
# 2        cluster3         19         21         28       Standing
# 3        cluster3         20         14         24       Standing
# 4        cluster3         34         35         49       Standing
# 5        cluster3         23         27         35       Standing
# 6        cluster3         33         20         39       Standing

edited Mar 25 '19 at 13:56

answered Mar 25 '19 at 13:12

r2evans

141,215
6
77
149

I've clarified that aspect on my original post again. `averagetable` is an output table coming from a previous command. That means that R might assign a different `cluster#` names to the `Standing` values every time I run my script. Therefore, I need a command line that will look at the values in the column `Standing` and provide a `classification` acording to the 1) 2) and 3) rules on my original post – juancda4 Mar 25 '19 at 13:37
1

If you regenerate `averagetable`, just recalculate `$x`, is it not that straightforward? – r2evans Mar 25 '19 at 13:39
With the following regenerated `averagetable`, see below the output of your code:`> print(averagetable) Group.1 Moving Feeding Standing 1 cluster1 0.04978355 0.1470238 0.7795848 2 cluster2 0.08214286 0.3216518 0.5642857 3 cluster3 0.03750000 0.1462121 0.7922980` – juancda4 Mar 25 '19 at 13:49
`> averagetable$x <- c("Standing", "Moving/Feeding", "Feeding/Moving")[ order(averagetable$Standing) ] > averagetable Group.1 Moving Feeding Standing x 1 cluster1 0.04978355 0.1470238 0.7795848 Moving/Feeding 2 cluster2 0.08214286 0.3216518 0.5642857 Standing 3 cluster3 0.03750000 0.1462121 0.7922980 Feeding/Moving`. In this case, `Standing` was assigned to the lowest value of `averagetable$Standing`, `Feeding/Moving` to the highest etc. Please let me know if you have any ideas on how to rescript this issue every time I'm regenerating `averagetable`. – juancda4 Mar 25 '19 at 13:49
Oops, replace `order(...)` with `rank(-averagetable$Standing)`. – r2evans Mar 25 '19 at 13:53
1

Now it worked. The following command was needed: `averagetable$x <- c("Feeding/Moving", "Moving/Feeding", "Standing")[ order(averagetable$Standing) ] averagetable` instead of `averagetable$x <- c("Standing", "Moving/Feeding", "Feeding/Moving")[ order(averagetable$Standing) ] averagetable` Thank you for your help. – juancda4 Mar 25 '19 at 13:54

Replacing row elements in a dataframe based on values from another dataframe

2 Answers2

Linked