I have a list of over 1,000,000 numbers and a lookup table that maps ranges of numbers to categories. For example, 0-200 is category A and 201-650 is category B (the ranges are not of equal length).

I need to simply iterate over the list of 1,000,000 numbers and get a list of the 1,000,000 corresponding categories.

EDIT:

For example, the first few elements of my list are 100, 125.5, 807.5, 345.2, and it should return something like 1, 1, 8, 4 as categories. The mapping logic is implemented in a function, `categoryLookup(cd)`, and I'm using the following command to get the categories:

cats <- sapply(list.cd, categoryLookup)

However, while this seems to work quickly on lists of up to 10,000 elements, it takes a long time on the whole list.

What is the fastest way to do the same? Is there any form of indexing that can help speed up the process?

wrahool
  • Perhaps have a look at `?cut` and its arguments `breaks` and `labels`? – Henrik Oct 28 '14 at 07:36
  • To get more specific answers, please be more specific in your question, i.e. post a minimal reproducible example: include a _minimal_ version of your "list" and "lookup table", the desired result, and show the code you have tried. – Henrik Oct 28 '14 at 07:48
  • Are your numbers **integers** only? – Spacedman Oct 28 '14 at 08:33
  • Oh, and are your boundaries integer only? In your example, you give non-overlapping boundaries (ie what happens to 200.5? A, or B?) – Spacedman Oct 28 '14 at 08:44
  • Please provide a **minimal, self contained example**. Check these links for general ideas, and how to do it in R: [**here**](http://stackoverflow.com/help/mcve), [**here**](http://www.sscce.org/), [**here**](http://adv-r.had.co.nz/Reproducibility.html), and [**here**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). – Henrik Oct 28 '14 at 08:50
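
Henrik's `cut` suggestion can be sketched roughly as follows. The break points and labels are made up for illustration (the question only gives 0-200 as A and 201-650 as B), and the sketch assumes each range is closed on the right, so a value such as 200.5 falls into B.

```r
# Hypothetical lookup table: sorted interval boundaries and one label per interval.
breaks <- c(0, 200, 650, 1000)   # illustrative boundaries, not from the question
labels <- c("A", "B", "C")

list.cd <- c(100, 125.5, 807.5, 345.2, 0, 200, 200.5)

# cut() is vectorised: a single call classifies the whole vector.
# Intervals are (0,200], (200,650], (650,1000]; include.lowest = TRUE
# additionally places the value 0 in the first interval.
cats <- cut(list.cd, breaks = breaks, labels = labels, include.lowest = TRUE)

as.character(cats)
```

Because the whole vector is handled in one call, this avoids invoking an R function once per element, which is what `sapply(list.cd, categoryLookup)` does.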

1 Answer

The numbers:

numbers <- sample(1:1000000)

groups:

groups <- sort(rep(letters, 40000))

lookup:

categories <- groups[numbers]

EDIT:

If you don't yet have the vector of "groups" you can create it first.

Assume you have a data frame with range info:

# ranges must start at 1 so that each number can be used directly as an index
ranges <- data.frame(group=c("A","B","C"),
                     start=c(1,300001,600001),
                     end=c(300000,600000,1000000))

ranges
  group  start   end
1     A      1 3e+05
2     B 300001 6e+05
3     C 600001 1e+06

# if the ranges are sorted, contiguous, and don't overlap:
groups <- rep(ranges$group, (ranges$end-ranges$start)+1)

Then continue as before

categories <- groups[numbers]

EDIT: as @jbaums said, you have to add 1 to `(ranges$end-ranges$start)` in this case (already edited in the example above). Also, in this case your starting coordinate should be 1 and not 0.
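
The off-by-one is easy to check on a toy table. The sketch below uses made-up ranges (1-3, 4-5, 6-10) and assumes that the ranges are sorted, contiguous, and start at 1.

```r
# Toy ranges, invented for illustration only.
ranges <- data.frame(group = c("A", "B", "C"),
                     start = c(1, 4, 6),
                     end   = c(3, 5, 10),
                     stringsAsFactors = FALSE)

# A range start..end covers end - start + 1 integers, hence the +1.
groups <- rep(ranges$group, ranges$end - ranges$start + 1)

length(groups) == max(ranges$end)   # one label per integer from 1 to 10
groups[c(3, 4, 5, 6, 10)]           # values on either side of each boundary
```

Without the `+ 1`, every range would come out one element short and each later boundary would drift further out of place.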

Karolis Koncevičius
  • However, your solution assumes that the `numbers` are integers, whereas it's possible that `numbers` is an unsorted vector of numbers that the OP needs to classify into the groups based on the ranges in which they fall. I guess it's up to the OP to provide more detail about what they have and what they expect. – jbaums Oct 28 '14 at 08:12
  • :) You are right again. In that case something like `groups[ceiling(numbers)]` should work. If the starting/ending points themselves can be non-integers then I will have to think about another solution. But I will wait for some kind of response from OP first. – Karolis Koncevičius Oct 28 '14 at 08:17
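
For non-integer numbers or non-integer boundaries, one option that avoids building a per-integer vector is base R's `findInterval`, which does a binary search over sorted break points. This is a sketch of a different technique, not the answerer's method. The `starts` values are invented, and the sketch assumes the ranges are sorted and contiguous, each running from its start up to (but not including) the next start.

```r
# Hypothetical range starts (sorted); each range runs up to the next start.
starts <- c(0, 200.5, 650.25)
labels <- c("A", "B", "C")

set.seed(1)
numbers <- runif(1e6, min = 0, max = 1000)   # a million non-integer values

# findInterval() returns, for each number, the index of the last start <= number.
idx <- findInterval(numbers, starts)
categories <- labels[idx]

table(categories)
```

Values below the first start would get index 0 and be silently dropped by `labels[idx]`, so it is worth checking that `length(categories)` matches `length(numbers)` on real data.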