Merge/replace row name with colmn value from another data frame

Question

I have two data frames

>cat a1.txt 
"501" 5.7916 6.9861 
"502" 24.9444 18.45
"503" 4 4.7222 5.5 
"505" 5 5.2777 5.3

>cat a2.txt
501 "alex"
502 "brian"
503 "romeo"
504 "tango"
505 "zee"

I want to be able to replace the first column in a1.txt, with corresponding value from a2.txt(lookup)

I want something like-

alex 5.7916 6.9861 
brian 24.9444 18.45
romeo 4 4.7222 5.5 
zee 5 5.2777 5.3

I tried-

a1t <- read.table('a1.txt')
a2t <- read.table('a2.txt')

a1t

   V1      V2      V3
1 501  5.7916  6.9861
2 502 24.9444 18.4500
3 503  4.0000  4.7222
4 505  5.0000  5.2777
> a2t
  V1    V2
1 501  alex
2 502 brian
3 503 romeo
4 504 tango
5 505   zee
> merge(x=a1t, y=a2t,by='V1', all.x=TRUE)
   V1    V2.x      V3  V2.y
 1 501  5.7916  6.9861  alex
 2 502 24.9444 18.4500 brian
 3 503  4.0000  4.7222 romeo
 4 505  5.0000  5.2777   zee

But this does not replace the 1st column. It adds one extra column. How can I get the above mentioned desired format?

What if my a1.txt is unbalanced? i.e. the number of columns in it are not consistent in all rows?

You can use `match` or convert it to` factor` and replace the `levels`. — user1436187, Jan 22 '14 at 02:55
for reference: Here is a [question that has many ways to do this in R](http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right). These include the following functions/packages: `merge(), plyr, data.table, sqldf` — marbel, Jan 22 '14 at 03:06

Carlos Cinelli · Accepted Answer · 2014-01-23T12:52:34.717

2

You can just select what you want:

#you are getting all lines and columns 4, 2 and 3    
merge(x=a1t, y=a2t,by='V1', all.x=TRUE)[,c(4,2,3)] 

#this will give the data.frame you wanted, that is:

    V2.y    V2.x      V3
1  alex  5.7916  6.9861
2 brian 24.9444 18.4500
3 romeo  4.0000  4.7222
4   zee  5.0000  5.2777

Or if you invert the merge, you can just exclude the first column:

merge(x=a2t, y=a1t,by='V1', all.y=TRUE)[,-c(1)]

##This will give:
 V2.x    V2.y      V3
1  alex  5.7916  6.9861
2 brian 24.9444 18.4500
3 romeo  4.0000  4.7222
4   zee  5.0000  5.2777

You ask:

What if my a1.txt is unbalanced? i.e. the number of columns in it are not consistent in all rows?

I am not sure what you mean, but if you mean that you do not have some observations of some variables from some people, just add NA.

edited Jan 23 '14 at 12:52

answered Jan 22 '14 at 02:57

Carlos Cinelli

11,354
9
43
66

I cannot do this- [,c(4,2,3)] Since the list of columns varies and it is a long list. The data is something like - Alex's columns contain the price of his 10 most expensive toys. Brian's top 5 most expensive toys' prices. I hope this also clarifies the unbalanced part. – blue01 Jan 23 '14 at 01:20
An easy way would be to invert and then just exclude the first column, `merge(x=a2t, y=a1t,by='V1', all.y=TRUE)[,-c(1)]`. – Carlos Cinelli Jan 23 '14 at 12:50
And if you want to keep all elements (the unbalanced case) you should use all=TRUE, that will give you NA's where you do not have a match. – Carlos Cinelli Jan 23 '14 at 12:55

Merge/replace row name with colmn value from another data frame

1 Answers1