I am new to R and have a question about searching for a row in a data.frame.

I have a data frame with these columns:

                           msg no       tmp   sensor      lat      lon      alt
1 8d4008b858c381ff633cca3d1b59  0 277102796 13020203  0.00000 0.000000     0.00
2 8d4008b858c37575032db3f2f30e  1 136520046 13020203 51.03620 5.892563 11574.78
3 8d40690958af7480e6c539db2d28  2 902340359 13020203  0.00000 0.000000     0.00
4 8d4008b858c37574612e52e5843d  3 185870171 13020203 51.03243 5.904694 11574.78
5 8d4008b858c375764f2c6ea82b0e  4 615986062 13020203 51.04392 5.867767 11574.78
6 8d4008b858c375749f2e15a34831  5 665795000 13020203 51.03387 5.900040 11574.78
7 8d4008b858c37207a9349cd60077  6 576273468 13020203 51.04486 5.864621 11574.78
8 8d40690958af847ff0c66f60ea8e  7 742755281 13020203  0.00000 0.000000     0.00

The data frame is huge (1.5 million values). I need to check whether there is a row with a particular msg, e.g. whether there is a row with msg = 8d4008b858c37207a9349cd60077 (here, row 7). If so, return its no value (here, 6). If there is no such row, I should be notified.

How can I do this efficiently for a large data frame?

Thanks in advance

Warlock

1 Answer

Try

library(data.table)  # v1.9.5+
setDT(df1)[msg %chin% '8d4008b858c37207a9349cd60077', no]
#[1] 6

Or

setDT(df1, key='msg')[.('8d4008b858c37207a9349cd60077'), no]
#[1] 6

If the value we are checking for is not in the 'msg' column, the keyed lookup returns NA

setDT(df1, key='msg')[.('xyz'), no]
#[1] NA

To test for that case, wrap the lookup in is.na

is.na(setDT(df1, key='msg')[.('xyz'), no])
#[1] TRUE
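
The question also asks to be notified when the msg is missing. A minimal sketch of that, assuming a character msg column and made-up sample data; `lookup_no` is a hypothetical helper name, not part of data.table. It uses `nomatch = 0L` so a missing key yields a zero-length result, which is easy to test for.

```r
library(data.table)

# Made-up sample data, shaped like the question's data frame
df1 <- data.frame(
  msg = c("8d4008b858c381ff633cca3d1b59",
          "8d4008b858c37207a9349cd60077",
          "8d40690958af847ff0c66f60ea8e"),
  no  = c(0L, 6L, 7L),
  stringsAsFactors = FALSE  # keep msg as character, not factor
)

setDT(df1)        # convert to data.table by reference
setkey(df1, msg)  # sort by msg so lookups use binary search

# lookup_no() is a hypothetical helper for illustration
lookup_no <- function(dt, target) {
  res <- dt[.(target), no, nomatch = 0L]  # unmatched keys are dropped
  if (length(res) == 0L) {
    message("No row with msg = ", target)
    return(NA_integer_)
  }
  res
}

lookup_no(df1, "8d4008b858c37207a9349cd60077")  # 6
lookup_no(df1, "xyz")                           # NA, after printing a message
```

Setting the key once and reusing the keyed table avoids re-sorting 1.5 million values on every lookup.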
akrun
  • @Warlock Sorry, I didn't understand your comment. According to the example, you have a column named 'msg'. If we are checking against a value, say 'xyz' not in the 'msg' column, what should be the return value? – akrun Jun 13 '15 at 15:25
  • @Warlock Please check my update. Not sure if that is what you meant – akrun Jun 13 '15 at 15:28
  • why Error in setDT(sensor1, key = "msg") : unused argument (key = "msg") ? – Warlock Jun 13 '15 at 15:30
  • @Warlock Can you try with `setkey(setDT(sensor), msg)`? I am using the devel version of `data.table`. So, things might be a bit different for your version. Instructions to install the devel version are [here](https://github.com/Rdatatable/data.table/wiki/Installation) – akrun Jun 13 '15 at 15:31
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/80476/discussion-between-warlock-and-akrun). – Warlock Jun 13 '15 at 15:33
  • When I use the setDT function inside %dopar%, I get the error "could not find function "setDT"". Any idea? – Warlock Jun 14 '15 at 11:19
  • @Warlock I am not sure about the problem. Is it from `library(parallel)` – akrun Jun 14 '15 at 11:22
  • Can you try with `as.data.table(df1)` instead of `setDT` – akrun Jun 14 '15 at 11:27
  • Do you mean I should change it to as.data.table(df1)? What does as.data.table(df1) actually do? – Warlock Jun 14 '15 at 11:38
  • @Warlock It is because of your error `could not find function "setDT"`. I suggested the equivalent code `as.data.table(sensor)` to convert the 'data.frame' to a 'data.table'. I don't have much experience with parallel. Could you post this as a new question, as I am not sure how you got the error? – akrun Jun 14 '15 at 11:41
  • http://stackoverflow.com/questions/20704235/function-not-found-in-r-doparallel-foreach-error-in-task-1-failed-cou I am using the subset function and it is working fine. – Warlock Jun 14 '15 at 12:40
  • @Warlock Thanks for the update. Glad to know it works. – akrun Jun 14 '15 at 13:05
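
The comments end with the asker falling back to `subset`. A minimal base R sketch of that kind of lookup follows. It assumes made-up sample data and a character msg column; `find_no` is a hypothetical helper name. It needs no extra packages, so nothing has to be loaded on parallel workers. The `match` variant is an alternative that is not from the thread: it returns only the first matching position, or NA if there is none.

```r
# Made-up sample data, shaped like the question's data frame
sensor <- data.frame(
  msg = c("8d4008b858c381ff633cca3d1b59",
          "8d4008b858c37207a9349cd60077",
          "8d40690958af847ff0c66f60ea8e"),
  no  = c(0L, 6L, 7L),
  stringsAsFactors = FALSE
)

# find_no() is a hypothetical helper for illustration
find_no <- function(df, target) {
  idx <- match(target, df$msg)  # position of the first match, or NA
  if (is.na(idx)) {
    message("No row with msg = ", target)
    return(NA_integer_)
  }
  df$no[idx]
}

find_no(sensor, "8d4008b858c37207a9349cd60077")  # 6
find_no(sensor, "xyz")                           # NA, after printing a message

# The subset() form mentioned in the comments returns a data frame of matches
hits <- subset(sensor, msg == "8d4008b858c37207a9349cd60077", select = no)
nrow(hits) > 0  # TRUE when at least one row matched
```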