0

I have a 5-level factor that looks like the following:

> tmp

[1] NA                                                                   
[2] 1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46                       
[3] NA                                                                   
[4] NA                                                                   
[5] 5,9,16,24,35,36,42                                                   
[6] 4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50
[7] 8,39                                                                 
5 Levels: 1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46 ...

I want to access the items within each level except NA. So I use the levels() function, which gives me:

> levels(tmp)
[1] "1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46"                       
[2] "4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50"
[3] "5,9,16,24,35,36,42"                                                   
[4] "8,39"                                                                 
[5] "NA"      

Then I would like to access the elements in each level, and store them as numbers. However, for example,

> as.numeric(cat(levels(tmp)[3]))
5,9,16,24,35,36,42numeric(0)

Can you help me remove the commas between the numbers and the numeric(0) at the very end? I would like a numeric vector 5, 9, 16, 24, 35, 36, 42 that I can use as row indices into a data frame. Thanks!

Jilber Urbina
user2498497
  • 1
    This is a mess. How did your data get organized this way in the first place?? – Señor O Jun 24 '14 at 15:53
  • This data comes from a tree object in R. Each level lists the elements contained in one node of the tree. Basically, tmp comes from `tmp = mytree$frame$yval`, which is a factor. I want to convert the elements of each level into a numeric vector so that I can look up the corresponding names using the key-value pairs. – user2498497 Jun 24 '14 at 15:59
  • 2
    I don't think that the `mytree$frame$yval` was *originally* a factor. Why don't you use `dput` to put in the original tree object, and tell us what you want to accomplish? – AndrewMacDonald Jun 24 '14 at 16:01
  • I just checked the class of mytree$frame$yval. It is indeed a factor. It records which data points are included in each leaf of the tree, and that is the information I want to extract. It turns out that levels(mytree$frame$yval) tells me which data points are in each leaf, but the points are stored as character strings, so I need to convert them to numeric. That is why I am asking the question above. Thanks. :) – user2498497 Jun 24 '14 at 16:30

3 Answers

3

You need to use a combination of unlist, strsplit and unique.

First, recreate your data:

dat <- read.table(text="
NA                                                                   
1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46                       
NA                                                                   
NA                                                                   
5,9,16,24,35,36,42                                                   
4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50
8,39", stringsAsFactors = TRUE)$V1  # since R 4.0.0 strings are not read as factors by default

Next, find all the unique levels, after using strsplit:

sort(unique(unlist(
  sapply(levels(dat), function(x)unlist(strsplit(x, split=",")))
  )))

 [1] "1"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "2"  "20" "21" "22" "23" "24" "25" "26"
[20] "27" "28" "29" "3"  "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "4"  "40" "41" "42" "43"
[39] "44" "45" "46" "47" "48" "49" "5"  "50" "6"  "7"  "8"  "9" 
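A side note on the output above: the values come back as character strings, so sort() orders them lexicographically ("10" lands before "2"). Here is a minimal sketch of converting to numeric before sorting. It assumes `lv` is a hand-typed stand-in for levels(dat); that name and `all_idx` are mine, not part of the answer.

```r
# Hypothetical stand-in for levels(dat); the strings are copied from the question.
lv <- c("1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46",
        "4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50",
        "5,9,16,24,35,36,42",
        "8,39")

# Split every level on commas, flatten, convert to numeric, then sort numerically.
all_idx <- sort(unique(as.numeric(unlist(strsplit(lv, ",", fixed = TRUE)))))
all_idx
```

This still merges all groups into one vector, so it only helps when the grouping does not matter.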
Andrie
  • Thank you very much, but this is not what I wanted. First, I prefer not to recreate the data. Second, I don't want to merge all the indices. I still need to preserve the grouping of the indices. Ideally, if the function could output the following, that would be great: [1] 1 2 3 6 11 12 13 18 20 21 22 26 29 33 40 43 46 [2] 4 7 10 14 15 17 19 23 25 27 28 30 31 32 34 37 38 41 44 45 47 48 49 50 [3] 5 9 16 24 35 36 42 [4] 8 39 Thanks. – user2498497 Jun 24 '14 at 16:11
  • 1
    Andrie recreated your data because you did not provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). You shouldn't need to do this step in your own code. You might consider editing your question to include what you want your output to look like, since it is clearer in your comment here than in the original question. – Kara Woo Jun 24 '14 at 16:57
2

Does this do what you want?

levels_split <- strsplit(levels(tmp), ",")
lapply(levels_split, as.numeric)
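For context, the attempt in the question fails because cat() only prints its argument and returns NULL, and as.numeric(NULL) is numeric(0). The sketch below shows that, and then a variant of the two lines above that drops the "NA" level first. It assumes `lv` is a hand-typed stand-in for levels(tmp), and `df` is a made-up data frame used only to show the indexing; neither name comes from the question.

```r
# Hypothetical stand-in for levels(tmp); the strings are copied from the question.
lv <- c("1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46",
        "4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50",
        "5,9,16,24,35,36,42",
        "8,39",
        "NA")

# Why the original attempt fails: cat() prints the string and returns NULL.
printed <- cat(lv[3], "\n")
as.numeric(printed)          # numeric(0), because the input is NULL

# Drop the literal "NA" level, then split each remaining level on commas.
keep <- lv[lv != "NA"]
idx  <- lapply(strsplit(keep, ",", fixed = TRUE), as.numeric)

# One numeric vector per level, grouping preserved.
idx[[3]]

# Made-up data frame, only to show the vectors being used as row indices.
df <- data.frame(id = 1:50, label = paste0("obs", 1:50))
df[idx[[3]], ]
```

Keeping the result as a list means each leaf's indices stay together, which is what the comment under the first answer asks for.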
Kara Woo
0

Using Andrie's dat

 val <- scan(text = levels(dat), sep = ",")
 #Read 50 items

 # a drop in value marks the start of the next level's numbers
 split(val, cumsum(c(TRUE, diff(val) < 0)))
 #$`1`
 #[1]  1  2  3  6 11 12 13 18 20 21 22 26 29 33 40 43 46

 #$`2`
 #[1]  4  7 10 14 15 17 19 23 25 27 28 30 31 32 34 37 38 41 44 45 47 48 49 50

 #$`3`
 #[1]  5  9 16 24 35 36 42

 #$`4`
 #[1]  8 39
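One caveat, as far as I can tell: the cumsum(diff(val) < 0) trick finds group boundaries by looking for a drop in value, so it relies on every level starting with a smaller number than the previous level ended on. That holds for this data, but not in general. Below is a sketch of an alternative that tags each value with the position of the level it came from. `lv` is a hand-typed stand-in for levels(dat), and `parts`, `grp`, `by_level`, `lv2`, `val2` and `merged` are names I made up for illustration.

```r
# Hypothetical stand-in for levels(dat); the strings are copied from the question.
lv <- c("1,2,3,6,11,12,13,18,20,21,22,26,29,33,40,43,46",
        "4,7,10,14,15,17,19,23,25,27,28,30,31,32,34,37,38,41,44,45,47,48,49,50",
        "5,9,16,24,35,36,42",
        "8,39")

# Split each level, then label every value with the index of its level.
parts <- strsplit(lv, ",", fixed = TRUE)
val   <- as.numeric(unlist(parts))
grp   <- rep(seq_along(parts), lengths(parts))
by_level <- split(val, grp)
by_level[["3"]]

# Made-up input where the drop-in-value heuristic merges two levels:
lv2    <- c("1,2", "3,4")
val2   <- as.numeric(unlist(strsplit(lv2, ",", fixed = TRUE)))
merged <- split(val2, cumsum(c(TRUE, diff(val2) < 0)))
length(merged)   # a single group, although lv2 has two levels
```

For the data in the question both approaches give the same four groups.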
akrun