1

I have two data frames

>cat a1.txt 
"501" 5.7916 6.9861 
"502" 24.9444 18.45
"503" 4 4.7222 5.5 
"505" 5 5.2777 5.3

>cat a2.txt
501 "alex"
502 "brian"
503 "romeo"
504 "tango"
505 "zee"

I want to be able to replace the first column in a1.txt, with corresponding value from a2.txt(lookup)

I want something like-

alex 5.7916 6.9861 
brian 24.9444 18.45
romeo 4 4.7222 5.5 
zee 5 5.2777 5.3

I tried-

a1t <- read.table('a1.txt')
a2t <- read.table('a2.txt')

a1t

   V1      V2      V3
1 501  5.7916  6.9861
2 502 24.9444 18.4500
3 503  4.0000  4.7222
4 505  5.0000  5.2777
> a2t
  V1    V2
1 501  alex
2 502 brian
3 503 romeo
4 504 tango
5 505   zee
> merge(x=a1t, y=a2t,by='V1', all.x=TRUE)
   V1    V2.x      V3  V2.y
 1 501  5.7916  6.9861  alex
 2 502 24.9444 18.4500 brian
 3 503  4.0000  4.7222 romeo
 4 505  5.0000  5.2777   zee

But this does not replace the 1st column. It adds one extra column. How can I get the above mentioned desired format?

What if my a1.txt is unbalanced? i.e. the number of columns in it are not consistent in all rows?

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
blue01
  • 2,035
  • 2
  • 23
  • 38
  • You can use `match` or convert it to` factor` and replace the `levels`. – user1436187 Jan 22 '14 at 02:55
  • for reference: Here is a [question that has many ways to do this in R](http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right). These include the following functions/packages: `merge(), plyr, data.table, sqldf` – marbel Jan 22 '14 at 03:06

1 Answers1

2

You can just select what you want:

#you are getting all lines and columns 4, 2 and 3    
merge(x=a1t, y=a2t,by='V1', all.x=TRUE)[,c(4,2,3)] 

#this will give the data.frame you wanted, that is:

    V2.y    V2.x      V3
1  alex  5.7916  6.9861
2 brian 24.9444 18.4500
3 romeo  4.0000  4.7222
4   zee  5.0000  5.2777

Or if you invert the merge, you can just exclude the first column:

merge(x=a2t, y=a1t,by='V1', all.y=TRUE)[,-c(1)]

##This will give:
 V2.x    V2.y      V3
1  alex  5.7916  6.9861
2 brian 24.9444 18.4500
3 romeo  4.0000  4.7222
4   zee  5.0000  5.2777

You ask:

What if my a1.txt is unbalanced? i.e. the number of columns in it are not consistent in all rows?

I am not sure what you mean, but if you mean that you do not have some observations of some variables from some people, just add NA.

Carlos Cinelli
  • 11,354
  • 9
  • 43
  • 66
  • I cannot do this- [,c(4,2,3)] Since the list of columns varies and it is a long list. The data is something like - Alex's columns contain the price of his 10 most expensive toys. Brian's top 5 most expensive toys' prices. I hope this also clarifies the unbalanced part. – blue01 Jan 23 '14 at 01:20
  • An easy way would be to invert and then just exclude the first column, `merge(x=a2t, y=a1t,by='V1', all.y=TRUE)[,-c(1)]`. – Carlos Cinelli Jan 23 '14 at 12:50
  • And if you want to keep all elements (the unbalanced case) you should use all=TRUE, that will give you NA's where you do not have a match. – Carlos Cinelli Jan 23 '14 at 12:55