1

I'm trying to merge two data frames on three columns where 2 columns are exact and one has a variance of +/- 5%. I'd like to merge on 2 factors + 1 numeric.

Here's the code I've used so far:

`matched <- merge(x = plans, y = sales, by = c("Subdivision","Builder","SquareFootage"), all.y = TRUE)`

And I've tried to match with JRip on these variables. What I want is to match the plan_name field from plans to the individual records on the sales data frame, with an exact match on subdivision and builder, and a match on square_footage within maybe 10%.

Here's a sample of the plans dataframe:

+-----------------------+---------------------+-----------------+-----------------------------------+---------------+
| Subdivision           | Builder             | Plan_name       | Plan_label                        | SquareFootage |
+-----------------------+---------------------+-----------------+-----------------------------------+---------------+
| Barton Woods          | Anglia Homes        | 60-Bristol      | Anglia Homes|60-Bristol           | 2821          |
+-----------------------+---------------------+-----------------+-----------------------------------+---------------+
| Imperial Oaks/Meadows | Ashton Woods        | Tyler           | Ashton Woods|Tyler                | 2701          |
+-----------------------+---------------------+-----------------+-----------------------------------+---------------+
| Harper's Preserve     | Brighton Homes      | 65-Stewart Alt. | Brighton Homes|65-Stewart Alt.    | 2811          |
+-----------------------+---------------------+-----------------+-----------------------------------+---------------+
| Harmony               | CastleRock Builders | 65-Santa Rosa   | CastleRock Builders|65-Santa Rosa | 3304          |
+-----------------------+---------------------+-----------------+-----------------------------------+---------------+
| Stewart's Forest      | Century Communities | 80-Marsala      | Century Communities|80-Marsala    | 3741          |
+-----------------------+---------------------+-----------------+-----------------------------------+---------------+

Here's a sample of the sales table:

+-----------------------+--------------------------+---------------+---------------+
| Subdivision           | Builder                  | SquareFootage | Parcel#       |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Ashton Woods Homes       | 2570          | 0334-00-00117 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Ashton Woods Homes       | 3355          | 7167-02-05200 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Ashton Woods Homes       | 3611          | 7167-02-03100 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Ashton Woods Homes       | 2652          | 7167-02-00900 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2570          | 0334-00-00117 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2975          | 7167-00-02600 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2755          | 7167-00-02700 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2205          | 7167-00-03800 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2570          | 0334-00-00117 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2960          | 7167-00-02900 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2973          | 7167-00-03900 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2707          | 0334-00-00117 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2707          | 0334-00-00117 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2570          | 0334-00-00117 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2646          | 7167-00-01300 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2061          | 7167-00-00500 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2672          | 7167-00-03700 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2498          | 7167-00-00900 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2227          | 7167-00-02300 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2578          | 7167-00-02000 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2455          | 7167-00-03000 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Beazer Homes             | 2530          | 7167-00-02200 |
+-----------------------+--------------------------+---------------+---------------+
| Stewarts Forest       | Century Communities, Inc | 3816          | 9024-08-02700 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | J Patrick Homes          | 4364          | 7167-03-03000 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Plantation Homes         | 2307          | 7167-00-04200 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Plantation Homes         | 3505          | 7167-02-06100 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Plantation Homes         | 2122          | 7167-00-00300 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Plantation Homes         | 3417          | 7167-02-02800 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Plantation Homes         | 1992          | 7167-00-02400 |
+-----------------------+--------------------------+---------------+---------------+
| Imperial Oaks/Meadows | Plantation Homes         | 2241          | 7167-00-03500 |
+-----------------------+--------------------------+---------------+---------------+
scdavis50
  • 111
  • 1
  • 5
  • How does the `Plan_name` column map to anything in the `sales` table? – Tim Biegeleisen Mar 04 '16 at 05:48
  • 1
    Welcome to SO! Can you post your data in a more copy and pastable format? Ideally, post the results of `dput(sales)` and `dput(plans)`, which other people can load directly, keeping all types and structure. – alistaire Mar 04 '16 at 05:48
  • ...and for numeric matches with a tolerance, check out `all.equal()`. – alistaire Mar 04 '16 at 05:49
  • @alistaire I will check out 'all.equal().' The actual files have about 2,500 records each so I didn't paste them. Here's a link: [plans](https://www.dropbox.com/s/ivxfnlbdc2mphbx/plans.csv?dl=0) and [sales](https://www.dropbox.com/s/b7aywvfpmmoodh5/sales.csv?dl=0) – scdavis50 Mar 04 '16 at 16:35
  • @tim-biegeleisen The plan name doesn't map to anything on the sales table. Ideally I'd like to match plan_label to the Parcel#. – scdavis50 Mar 04 '16 at 16:36
  • Dropbox links aren't really encouraged, either. We don't need all the data, we just need enough for a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). In this case, `dput(head(sales, 20))` would be plenty. You should also show what your desired output looks like. See the link above for lots of good advice on how to structure a question effectively. – alistaire Mar 04 '16 at 19:14
  • Thanks for your feedback. I'm going back to the drawing board after I get a little more educated on machine learning. – scdavis50 Mar 21 '16 at 00:51

0 Answers0