0

I am working with two different data sets, and I'd like to move data from one to the other. I'm thinking of it this way: One contains the results, paired with the correct factor (HTm), and I want to spread those out over another frame. Here is the first frame:

    head(five)
    Week      Game.ID      VTm VPts HTm HPts HDifferential VDifferential
  1 1  NFL_20050908_OAK@NE OAK   20  NE   30      10           -10
  2 1 NFL_20050911_ARI@NYG ARI   19 NYG   42            23           -23
  3 1 NFL_20050911_CHI@WAS CHI    7 WAS    9             2            -2
  4 1 NFL_20050911_CIN@CLE CIN   27 CLE   13           -14            14
  5 1  NFL_20050911_DAL@SD DAL   28  SD   24            -4             4
  6 1 NFL_20050911_DEN@MIA DEN   10 MIA   34            24           -24
    VTm.f HTm.f average
  1 OAK    NE 19.4375
  2 ARI   NYG 19.4375
  3 CHI   WAS 19.4375
  4 CIN   CLE 19.4375
  5 DAL    SD 19.4375
  6 DEN   MIA 19.4375

    > tail(five)
        Week              Game.ID VTm VPts HTm HPts HDifferential VDiff
    262   19 NFL_20060114_WAS@SEA WAS   10 SEA   20            10    -10
    263   19 NFL_20060115_CAR@CHI CAR   29 CHI   21            -8      8
    264   19 NFL_20060115_PIT@IND PIT   21 IND   18            -3      3
    265   20 NFL_20060122_CAR@SEA CAR   14 SEA   34            20    -20
    266   20 NFL_20060122_PIT@DEN PIT   34 DEN   17           -17     17
    267   21 NFL_20060205_SEA@PIT SEA   10 PIT   21            11    -11
            VTm.f HTm.f average
      262   WAS   SEA       0
      263   CAR   CHI       0
      264   PIT   IND       0
      265   CAR   SEA       0
      266   PIT   DEN       0
      267   SEA   PIT       0

and here is the other (aggregated means from the first frame).

    head(fiveINFO)
    HTm     HPts     VPts  average
   1 ARI 19.87500 19.00000 19.43750
   2 ATL 24.75000 19.12500 21.93750
   3 BAL 19.37500 13.75000 16.56250
   4 BUF 16.50000 17.37500 16.93750
   5 CAR 25.12500 23.27273 24.19886
   6 CHI 18.77778 14.00000 16.38889

    tail(fiveINFO)
    VTm  HPts   VPts average
 27 SEA 21.00 25.000 23.0000
 28 SF 30.75 12.625 21.6875
 29 STL 28.00 22.000 25.0000
 30 TB 15.75 15.375 15.5625
 31 TEN 28.00 14.750 21.3750
 32 WAS 20.60 18.800 19.7000

For reference, this data is looking at NFL scores. I want to take the averages in fiveINFO, frame two, and move them to the corresponding team in the first frame. five is 266 rows long, while fiveINFO is 32 rows — fiveINFO contains each HTm only once, while five contains each one 8-10 times, depending on the number of home games each team plays. I found several answers that seemed similar, but with much smaller data sets. I don't want to merge the two; I want the averages data from the second frame to be spread across the appropriate HTm values in the first frame.

I'm imagining I'll need to use some kind of for loop for this, but everything I'm doing is striking out. Help?

daniel_1
  • 11
  • 3
  • 1
    I am struggling to understand why `merge()` doesn't do exactly what you want. What you seem to say you want is precisely what that function will do. – joran May 11 '12 at 01:58
  • So something like this is NOT what you want? I've recreated your data: `five <- data.frame(HTm=c("OAK","OAK","ARI","ARI"));fiveINFO <- data.frame(HTm=c("OAK","ARI"),averages=1:2);merge(five,fiveINFO,by="HTm");` – thelatemail May 11 '12 at 01:59
  • when I used merge to try to get the data to spread for the correct HTm, all of them showed up as "ARI" and took ARI's average of 19.4375. in retrospect I probably just wrote the code incorrectly, as follows: merge(five, fiveINFO[, c("HTm", "average")]) – daniel_1 May 11 '12 at 02:08
  • in the end, I want to keep everything the same on five, but swap out the default value for "average" for the correct value, as determined by fiveINFO. – daniel_1 May 11 '12 at 02:09
  • If you provided a reproducible example I'm certain I could solve this problem with merge. – joran May 11 '12 at 02:12
  • 3
    You need to explicitly specify `by="HTm"` in the arguments of `merge`. – Vincent Zoonekynd May 11 '12 at 02:13
  • where would I specify by="HTm"? @joran, I can post more data, if that's what you're saying. – daniel_1 May 11 '12 at 02:19
  • I edited the post to include tails. Perhaps that will help? I'm not clear on what I should be doing, but I appreciate all the help! – daniel_1 May 11 '12 at 02:32
  • Your additions are not helpful. Read [this](http://stackoverflow.com/q/5963269/324364) question and then update your answer with a _reproducible_ example. – joran May 11 '12 at 03:11

1 Answers1

1

total<-merge(five, fiveINFO, by="HTm")

where total is the data frame that has the merged columns from five and fiveINFO based on htm column. The value of htm that do not match in five and fiveINFO will not be filled. But, if you want that filled with NA, you can do so explicitly ( use this option in merge function: all=TRUE, all.x or all.y = TRUE).

You can also remove extra columns that you don't want after merging.

total=subset(total,select= -c(HPts,VPts)) #for removing columns HPts, VPts from the merged data-frame

Subs
  • 529
  • 2
  • 9
  • Thanks for the help! Quick question: when I tried that code, I got the following error: Error in fix.by(by.y, y) : 'by' must specify valid column(s) Any idea how I could fix that? They are definitely valid columns... – daniel_1 May 11 '12 at 02:38
  • make sure that the column name is "HTm" in both dataframes - five and fiveINFO. R is case sensitive. or if you don't have any common column name in both the datasets then use (..., by.x="columnname in dataframe five", by.y="columnname in data-frame fiveINFO") – Subs May 11 '12 at 02:45
  • Is "HTm" and "VTm" two seperate columns in the 2nd dataframe fiveINFO? Your head and tail information doesn't match with column name for the 2nd dataframe. Make sure to use correct column name while merging. (if both dataframes contain same columnname use `by` or else use `by.x` and `by.y` for merging) – Subs May 11 '12 at 03:26