In R, I need to create a new dataframe (DF3) and map some data into it from two existing dataframes (DF1 and DF2); in the code below they are named D3, D1 and D2. Some mapped fields will be net-new, some already exist under the same name, and some exist under a different name. The basic framework is this:

D1 = data.frame(
  "FieldA" = c("apple","banana","grapes","pear","orange"),
  "FieldB" = c(1,2,3,4,5),
  "FieldC" = c(5,4,2,3,1),
  "FieldD" = c(9,8,7,6,5))

D2 = data.frame(
  "FieldA" = c("bread","cereal","milk","oatmeal","smoothie"),
  "FieldB" = c(1,2,3,4,5),
  "FieldC" = c(5,4,2,3,1),
  "FieldX" = c(9,8,7,6,5),
  "FieldY" = c(3,4,5,6,7))

D3 = D1[,c("FieldA","FieldB")]

Using the above I am able to map DF1 fields to DF3. But I can't figure out how to bring over DF2 rows whilst mapping the three DF2 fields I need:

  • DF2 "FieldA" mapped to existing DF3 "FieldA"
  • DF2 "FieldX" mapped as a new column in DF3
  • DF2 "FieldY" to existing column DF3 "FieldB"

DF3 should end up with 10 rows in total and the columns "FieldA", "FieldB" and "FieldX".

EricW
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code, and a clear explanation of what you're trying to do and what hasn't worked. – camille Jan 23 '20 at 18:47
  • Thanks. I included the code as is thus far. – EricW Jan 23 '20 at 19:11

1 Answer

I'm not sure if this is the result you are looking for, but this line of code copies the desired fields from D1 and D2 into a new D3. The values are copied into the new dataframe by concatenating (c(...)) the desired values from the original dataframes.

D3 = data.frame(
  "FieldA" = c(as.character(D1$FieldA), as.character(D2$FieldA)),
  "FieldB" = c(D1$FieldB, D2$FieldY),
  "FieldX" = c(rep(NA, nrow(D1)), D2$FieldX))

Output:
> D3
     FieldA FieldB FieldX
1     apple      1     NA
2    banana      2     NA
3    grapes      3     NA
4      pear      4     NA
5    orange      5     NA
6     bread      3      9
7    cereal      4      8
8      milk      5      7
9   oatmeal      6      6
10 smoothie      7      5

as.character is used for FieldA because, in R versions before 4.0.0, data.frame() converts strings to factors by default. Concatenating those factors directly would leave D3 holding the integer level codes instead of the names. There are many different ways one could deal with this.
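
One such way, sketched here rather than taken from the code above, is to build the data frames with `stringsAsFactors = FALSE` so that FieldA is a character column from the start. The two small data frames are trimmed stand-ins for D1 and D2, and `combined`, `f` and `codes` are names made up for this illustration.

```r
# Trimmed stand-ins for D1 and D2. stringsAsFactors = FALSE is spelled out so
# the columns stay character on R < 4.0.0 as well as on newer versions.
D1 <- data.frame(FieldA = c("apple", "banana"), FieldB = c(1, 2),
                 stringsAsFactors = FALSE)
D2 <- data.frame(FieldA = c("bread", "cereal"), FieldY = c(3, 4),
                 stringsAsFactors = FALSE)

# With character columns, c() joins the strings and as.character() is not needed.
combined <- c(D1$FieldA, D2$FieldA)
print(combined)

# For contrast: converting a factor to integer exposes its level codes,
# not its labels (levels are sorted, so "apple" is 1 and "pear" is 2).
f <- factor(c("pear", "apple"))
codes <- as.integer(f)
print(codes)
```

On an existing data frame whose column is already a factor, wrapping it in `as.character()` as the answer does remains the simplest fix.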

I assume FieldX here is what you meant in your question. Since D1 doesn't have that field, it receives NA values in D3.
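
The same result can probably also be reached with `rbind()`. The following is a sketch of that alternative, not the method used above: each source is first reshaped to the three target columns, with D1 padded by an explicit NA column and D2's FieldY renamed to FieldB, and the two pieces are then stacked. `part1` and `part2` are hypothetical helper names.

```r
D1 <- data.frame(
  FieldA = c("apple", "banana", "grapes", "pear", "orange"),
  FieldB = c(1, 2, 3, 4, 5),
  FieldC = c(5, 4, 2, 3, 1),
  FieldD = c(9, 8, 7, 6, 5),
  stringsAsFactors = FALSE)

D2 <- data.frame(
  FieldA = c("bread", "cereal", "milk", "oatmeal", "smoothie"),
  FieldB = c(1, 2, 3, 4, 5),
  FieldC = c(5, 4, 2, 3, 1),
  FieldX = c(9, 8, 7, 6, 5),
  FieldY = c(3, 4, 5, 6, 7),
  stringsAsFactors = FALSE)

# D1 rows: keep FieldA and FieldB, and fill the column D1 lacks with NA.
part1 <- data.frame(FieldA = D1$FieldA, FieldB = D1$FieldB, FieldX = NA_real_,
                    stringsAsFactors = FALSE)

# D2 rows: FieldY is mapped onto FieldB; FieldA and FieldX keep their names.
part2 <- data.frame(FieldA = D2$FieldA, FieldB = D2$FieldY, FieldX = D2$FieldX,
                    stringsAsFactors = FALSE)

# rbind() matches data frame columns by name, so both parts use the same names.
D3 <- rbind(part1, part2)
print(D3)
```

Building each part separately keeps the column mapping for each source visible in one place, which may be easier to maintain if more fields are added later.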

Evan
  • Yes, you are interpreting the problem correctly. I tried your solution, but it doesn't seem to work. Something is wrong with the last part, `"FieldX"=c(rep(, nrow(D1)), D2$FieldX)`. If I remove it, the first parts of the line act correctly and produce the intended results, but with it in, R throws an "Argument is missing" error. – EricW Jan 23 '20 at 19:44
  • Sorry, I was missing the `NA` value in the `rep` command. It should work now. – Evan Jan 23 '20 at 20:01
  • That did it. Yahtzee! – EricW Jan 23 '20 at 20:04