
I have struggled to take one data set (the means of certain variables) and use it with another data set to say whether each row is greater or less than the mean for the same Code as in the original data set. I have attempted to use mutate in dplyr, but to no avail.

For example, using the first data set as my key, I want to add a new column to the second data set that tells me, for the given "code", whether the money that person made was greater or less than the key value from the first data set (a binary variable). I would also like to find the difference when it is greater.

      Code  nart    money
    1     001  7180 317.8101
    2     002   151 381.1876
    3     003   147 485.6854
    4     008   632 393.6852
    5     00X   105 405.1730
    6     030     1 200.0000
    ...

      Code     ID      money
    1     001  John    317.8101
    2     030  James   381.1876
    3     003  Scott   485.6854
    4     002  Matthew 393.6852
    5     00X  Mark    405.1730
    6     00X  Josh    200.0000
    ...
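
For a reproducible example, the visible rows of the two tables above can be rebuilt as R data frames. Below is a minimal sketch of the merge approach suggested in the comments, assuming the hypothetical names `key` and `people`, and assuming `Code` is stored as character so that values like "001" and "00X" match exactly. The columns `above_mean` and `diff_if_above` are illustrative names, not part of the original data.

```r
# Key table (first table above): one reference value per Code; Code kept as character
key <- data.frame(
  Code  = c("001", "002", "003", "008", "00X", "030"),
  nart  = c(7180, 151, 147, 632, 105, 1),
  money = c(317.8101, 381.1876, 485.6854, 393.6852, 405.1730, 200.0000),
  stringsAsFactors = FALSE
)

# Individual rows (second table above) to compare against the key
people <- data.frame(
  Code  = c("001", "030", "003", "002", "00X", "00X"),
  ID    = c("John", "James", "Scott", "Matthew", "Mark", "Josh"),
  money = c(317.8101, 381.1876, 485.6854, 393.6852, 405.1730, 200.0000),
  stringsAsFactors = FALSE
)

# Join on Code; the key's money column is renamed to money.key
res <- merge(people, key[, c("Code", "money")], by = "Code",
             suffixes = c("", ".key"))

# Binary flag: 1 if the person's money is greater than the key value, else 0
res$above_mean <- as.integer(res$money > res$money.key)

# Difference only where the person is above the key value; NA otherwise
res$diff_if_above <- ifelse(res$above_mean == 1, res$money - res$money.key, NA)

res
```

With dplyr, the same idea would be a `left_join(people, key, by = "Code")` followed by `mutate()` to create the two new columns.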

Any help would be much appreciated. Thank you.

a.powell
  • If there are duplicate 'Code' values in one dataset (the 2nd one), how do you want to handle them? – akrun Jun 03 '16 at 13:12
  • Merge the data frames on the Code column, then take the difference between the money.x and money.y columns... see the comment above regarding any merging issues. – zx8754 Jun 03 '16 at 13:15
  • @zx8754 Thank you. Will merge() still work if there are multiple rows with the same code? – a.powell Jun 03 '16 at 13:18
  • Short answer is yes, you should read more about [joins](https://en.wikipedia.org/wiki/Join_(SQL)) – zx8754 Jun 03 '16 at 13:20
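
To illustrate the answer to the question about repeated codes, here is a minimal sketch with small hypothetical tables. It shows that `merge()` copies the single key row onto every row sharing that code (a many-to-one join), and that `all.x = TRUE` keeps rows whose code has no match in the key, with the key value set to NA. The table names, the person "Alex" and the code "999" are made up for illustration.

```r
# Hypothetical key: one reference value per code
key <- data.frame(Code = c("00X", "030"), money = c(405.1730, 200),
                  stringsAsFactors = FALSE)

# Hypothetical rows: code "00X" appears twice, "999" has no key entry
people <- data.frame(Code  = c("00X", "00X", "030", "999"),
                     ID    = c("Mark", "Josh", "James", "Alex"),
                     money = c(405.1730, 200, 381.1876, 150),
                     stringsAsFactors = FALSE)

# Default merge is an inner join: both "00X" rows are kept, "999" is dropped
inner <- merge(people, key, by = "Code", suffixes = c("", ".key"))

# all.x = TRUE gives a left join: "999" is kept with money.key = NA
left <- merge(people, key, by = "Code", all.x = TRUE, suffixes = c("", ".key"))

inner
left
```

The comparison against `money.key` can then be done row by row, however many people share a code.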
