How do I apply a "patch" to a data.frame?

Question

For lack of a better word, how do I apply a "patch" to a R data.frame? Suppose I have a master database with firm and outlet columns and an ownership shares variable that is 1 or 0 in this example, but could be any percentage.

// master
     firm outlet shares.pre
1    five      1          0
2     one      1          1
3     red      1          0
4  yellow      1          0
5    five      2          0
6     one      2          0
// many more

I want to let firm "one" sell outlet "1" to firm "red", which transaction I have in another data.frame

// delta
  firm outlet shares.delta
1  one      1         -1
2  red      1          1

What is the most efficient way in R to apply this "patch" or transaction to my master database? The end result should look like this:

// preferably master, NOT a copy
     firm outlet shares.post
1    five      1          0
2     one      1          0  <--- was 1
3     red      1          1  <--- was 0
4  yellow      1          0
5    five      2          0
6     one      2          0
// many more

I am not particular about keeping the suffixes pre, post or delta. If they were all named shares that would be fine too, I simply want to "add" these data frames.

UPDATE: my current approach is this

update <- (master$firm %in% delta$firm) & (master$outlet %in% delta$outlet)
master[update,]$shares <- master[update,]$shares + delta$shares

Yes, I'm aware it does a vector scan to creat the Boolean update vector, and that the subsetting is also not very efficient. But the thing I don't like about it most is that I have to write out the matching columns.

@AriB.Friedman preferably both, but code length is more important (the database is about 100 lines long, it's a firm-level database, not a high-frequency stock market trading thing) — TemplateRex, Jan 30 '13 at 16:18

Arun · Accepted Answer · 2013-01-31T10:37:56.743

2

Another way using data.table. Assuming you've loaded both your data in df1 and df2 data.frames,

require(data.table)
dt1 <- data.table(df1)
dt2 <- data.table(df2)
setkey(dt1, firm, outlet)
setkey(dt2, firm, outlet)
dt1 <- dt2[dt1]
dt1[is.na(dt1)] <- 0
dt1[, shares.post := shares.delta + shares.pre]

#      firm outlet shares.delta shares.pre shares.post
# 1:   five      1            0          0           0
# 2:   five      2            0          0           0
# 3:    one      1           -1          1           0
# 4:    one      2            0          0           0
# 5:    red      1            1          0           1
# 6: yellow      1            0          0           0

edited Jan 31 '13 at 10:37

answered Jan 30 '13 at 16:52

Arun

116,683
26
284
387

1

+1 tnx! Is there any way I can avoid creating an `out` copy of my original data frame? I would prefer to directly update `dt1`. – TemplateRex Jan 31 '13 at 10:22
I like the `data.table` syntax, but I guess the assignment will scan over all entries of the master database? Is this more efficient than my current approach (see updated question)? – TemplateRex Jan 31 '13 at 10:40
Nope. it's by reference. Read about `?':='` from the documentation. It reads `Fast add, remove and modify subsets of columns, by reference.` – Arun Jan 31 '13 at 10:42
Yes, this will be much much faster than your updated solution. Try benchmarking with `rbenchmark`. [**Here's a post**](http://stackoverflow.com/questions/14139586/how-to-optimize-subsetting-from-a-large-dataset/14140191#14140191) where you can get the idea of using `rbenchmark`. – Arun Jan 31 '13 at 10:44

score 1 · Answer 2 · edited May 23 '17 at 11:50

1

I'd give a more precise answer if you had provided a reproducible example, but here's one way:

Call your first data.frame dat and your second chg

Then you could merge the two:

dat <- merge(dat,chg)

And just subtract:

dat$shares <- with(dat, shares.pre + shares.delta )

edited May 23 '17 at 11:50

Community

1
1

answered Jan 30 '13 at 16:42

Ari B. Friedman

71,271
35
175
235

+1 tnx! but this does not apply the changes to my orginal `dat`, how can I do that? – TemplateRex Jan 31 '13 at 10:23

How do I apply a "patch" to a data.frame?

2 Answers2