
I have two data frames that look like this.

First one:

  head(df_2015_2016)
        Date    HomeTeam       AwayTeam B365H B365D B365A  BWH  BWD  BWA  IWH IWD  IWA
  1 08/08/15 Bournemouth    Aston Villa  2.00   3.6  4.00 2.00 3.30 3.70 2.10 3.3 3.30
  2 08/08/15     Chelsea        Swansea  1.36   5.0 11.00 1.40 4.75 9.00 1.33 4.8 8.30
  3 08/08/15     Everton        Watford  1.70   3.9  5.50 1.70 3.50 5.00 1.70 3.6 4.70
  4 08/08/15   Leicester     Sunderland  1.95   3.5  4.33 2.00 3.30 3.75 2.00 3.3 3.60
  5 08/08/15  Man United      Tottenham  1.65   4.0  6.00 1.65 4.00 5.50 1.65 3.6 5.10
  6 08/08/15     Norwich Crystal Palace  2.55   3.3  3.00 2.60 3.20 2.70 2.40 3.2 2.85

And the second one:

 > head(df_matches)
   row_names  ID scoresway_id               club      club_bet      city
 1         1 242          214               Gent          Gent      Gent
 2         2 248          215         Anderlecht    Anderlecht Bruxelles
 3         3 243          217      Cercle Brugge Cercle Brugge    Brugge
 4         4 310          218 Sporting Charleroi     Charleroi Charleroi
 5         5 249          219        Club Brugge   Club Brugge    Brugge
 6         6 234          222          Beerschot          #N/B   Antwerp

Now I would like to merge them. The data frame that I want to merge into has 5062 rows:

 nrow(df_2015_2016)
 [1] 5062

However, when I merge them:

 df <- merge(df_2015_2016, df_matches, by.x = "HomeTeam", by.y = "club_bet", all.x = TRUE)

The end result has 5733 rows:

nrow(df)
[1] 5733

The output that I want has exactly 5062 rows, with the matched values or NA in case there is no match.

Any feedback on what goes wrong here?

lmo
Frits Verstraten
  • [One-to-many](https://en.wikipedia.org/wiki/One-to-many_(data_model)) relationship maybe? – zx8754 Apr 18 '16 at 11:56
  • [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – zx8754 Apr 18 '16 at 11:58
  • You can compare `length(unique(df_matches$club_bet))` with `nrow(df_matches)` to check whether the key column has duplicate values. Duplicates generate extra rows when you merge. A left join will produce your 'expected' output only when the right-side table has no duplicate keys. – Gopala Apr 18 '16 at 12:12
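
The comments above can be illustrated with a minimal sketch on made-up toy data. The frames `odds` and `clubs` and all their values are hypothetical stand-ins for `df_2015_2016` and `df_matches`; whether the real `club_bet` column has duplicates (for example repeated `#N/B` placeholders) is an assumption that the first check would confirm or rule out.

```r
# Toy stand-ins for df_2015_2016 and df_matches (values are invented).
odds <- data.frame(
  HomeTeam = c("Chelsea", "Everton", "Chelsea", "Norwich"),
  B365H    = c(1.36, 1.70, 1.50, 2.55),
  stringsAsFactors = FALSE
)
clubs <- data.frame(
  club_bet = c("Chelsea", "Chelsea", "Everton", "#N/B", "#N/B"),
  ID       = c(101, 102, 103, 104, 105),
  stringsAsFactors = FALSE
)

# Key values that occur more than once in the lookup table.
dup_keys <- unique(clubs$club_bet[duplicated(clubs$club_bet)])

# A left join repeats each left row once per matching right row,
# so duplicated keys on the right inflate the row count.
merged_raw <- merge(odds, clubs,
                    by.x = "HomeTeam", by.y = "club_bet", all.x = TRUE)

# Keep only the first row per key (an arbitrary choice for this sketch),
# then merge again: one output row per row of `odds`.
clubs_unique <- clubs[!duplicated(clubs$club_bet), ]
merged_ok <- merge(odds, clubs_unique,
                   by.x = "HomeTeam", by.y = "club_bet", all.x = TRUE)

nrow(odds)        # 4
nrow(merged_raw)  # 6: each of the two Chelsea rows matched two lookup rows
nrow(merged_ok)   # 4: unmatched "Norwich" is kept with NA for ID
```

Which duplicate to keep depends on the data, so dropping all but the first is only one possible choice. Note also that `merge()` sorts the result by the key column by default, so the row order differs from the original data frame.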
