
I am trying to fill a new column in a data frame of 700k records. A for loop is too slow, so I want to use the apply function instead. I am not familiar with it; my attempt is below, but it doesn't work. Please help.

myfunc <- function(a,b,c,d) {if (a=="xyz" & b==11) {c=d}}
dataf[,'target'] <- apply(dataf, 1, function(dataf) myfunc(dataf[,'col1'],dataf[,'col2'],dataf[,'target'],dataf[,'col3']))

Adding more description -

What I have:

a   b   c   d
x   2       p
x   2       p
x   2       p
xyz 11      p
xyz 11      p
xyz 2       p
y   2       p
y   2       p
y   2       p

What I want to achieve:

a   b   c   d
x   2       p
x   2       p
x   2       p
xyz 11  p   p
xyz 11  p   p
xyz 2       p
y   2       p
y   2       p
y   2       p
dsauce
  • Please tell us exactly what you want by providing a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. A small, simplified toy example that resembles your large problem is enough. It seems that what you are doing doesn't need a function at all and can be done with simple subsetting, something like `library(data.table); setDT(dataf)[col1 == "xyz" & col2 == 11, target := col3]`, which would make everything a lot faster. See this intro to data.table: http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf – grrgrrbla Jun 19 '15 at 09:09
  • I added an example. Also, I used your suggestion with setDT and it was super fast. However, when I did `target := paste("x", col3, sep = "")` it updated both target and col3. What is the right way to do this? – dsauce Jun 19 '15 at 09:23

1 Answer


Given your question, I am guessing you want this:

library(data.table)
setDT(dataf)[a == "xyz" & b == 11, c := d]

output:

     a  b d  c
1:   x  2 p NA
2:   x  2 p NA
3:   x  2 p NA
4: xyz 11 p  p
5: xyz 11 p  p
6: xyz  2 p NA
7:   y  2 p NA
8:   y  2 p NA
9:   y  2 p NA

I highly suggest reading the data.table tutorial. The package is very fast and can be used for a lot of different things. The data.table site has even more articles; I would read them all, as they will help you a lot.
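If you would rather stay in base R, the same idea (vectorised subsetting instead of a loop or `apply`) can be sketched as below. This is only a minimal sketch: the toy data frame is my reconstruction of the example in the question, and I am assuming the empty column `c` is stored as `NA`; `rows` is a helper name introduced here for illustration.

```r
# Toy data mirroring the example in the question.
# Assumption: the empty column c is a character column filled with NA.
dataf <- data.frame(
  a = c("x", "x", "x", "xyz", "xyz", "xyz", "y", "y", "y"),
  b = c(2, 2, 2, 11, 11, 2, 2, 2, 2),
  c = NA_character_,
  d = "p",
  stringsAsFactors = FALSE
)

# Vectorised condition: one TRUE/FALSE per row, no loop or apply() needed.
rows <- dataf$a == "xyz" & dataf$b == 11

# Copy d into c only for the rows where the condition holds.
dataf$c[rows] <- dataf$d[rows]
```

The comparison and the assignment each work on whole columns at once, which is why this avoids the row-by-row cost of a for loop. On 700k rows I would still expect the data.table version to be the more convenient choice, since it updates the column by reference.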

grrgrrbla
  • Thanks! I tried this on this example, and on another one where it doesn't seem to work exactly as intended. I did `c := paste("x", d, sep = "")` and it updated both c and d with new values. What is the right way to do this? Also, I am impressed by the speed of data.table; I will learn it later today. – dsauce Jun 19 '15 at 09:54
  • Hard to tell without seeing some sample data. I guess you can answer your own question after reading more about data.table. What are you trying to do there? If you want to set c equal to "x", then just type `c := "x"`. Also, if this answers your question, accept the answer by clicking the check mark and upvote it. – grrgrrbla Jun 19 '15 at 09:58
  • This does answer my question in terms of what I asked. However, I was wondering why the same doesn't happen when I slightly tweak it. Instead of `c := d`, I did `c := paste("x", d, sep = "")`, i.e. I am trying to do c = d and also prepend an "x" when the conditions are true. When I do this, it also updates the column d. – dsauce Jun 19 '15 at 10:28
  • Hmm, this would be a good new question :), I have no idea to be honest. – grrgrrbla Jun 19 '15 at 10:38
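For what it is worth, the `paste` variant discussed in the comments can be checked on a small table. The sketch below uses a made-up four-row `dt` (not the asker's real data) and `paste0("x", d)`, which is equivalent to `paste("x", d, sep = "")`. On this toy input only `c` is assigned and `d` is merely read, so whatever changed `d` in the real data is presumably something not shown in the thread.

```r
library(data.table)

# Hypothetical toy table; c starts as NA, d holds the value to copy.
dt <- data.table(
  a = c("x", "xyz", "xyz", "y"),
  b = c(2, 11, 11, 2),
  c = NA_character_,
  d = "p"
)

# Update c by reference for the matching rows only.
# d appears on the right-hand side, so it is read but never assigned.
dt[a == "xyz" & b == 11, c := paste0("x", d)]

# Sanity check: d should be unchanged after the update.
stopifnot(all(dt$d == "p"))
print(dt)
```

If `d` does change in your own session, comparing the table before and after the call on a small reproducible sample would be a good starting point for a follow-up question.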