
I have the data as follows:

    V1   V2
1 10001 1003
2 10002 1005
3 10002 1007
4 10003 1001
5 10003 1005
...

This is edge-list data.

The ids in V1 are very sparse: only a few of the numbers in [1..10001] are occupied.

For example, max(V1) may be 20000 while range(V1) is [10000, 20000].

I want to compress the index, i.e. relabel the distinct ids as consecutive integers.

Here's what I've done:

sorted <- sort(data$V1, index.return = TRUE)

However, duplicated node ids get different positions in the returned index. I also need the inverse of the returned index permutation (sorted$ix).
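A minimal sketch of both pieces, assuming the ids sit in a plain numeric vector `x` (a hypothetical example vector, not the real data). `match()` against the sorted distinct ids gives duplicated ids the same compact index, and the inverse of an ordering permutation such as `order(x)` can be built by assigning `seq_along()` into it.

```r
# Hypothetical example ids: sparse, unsorted, with duplicates
x <- c(10003, 10001, 10002, 10003, 10002)

# Compact index: position of each id among the sorted distinct ids,
# so equal ids always share one index
ids <- sort(unique(x))
compact <- match(x, ids)
print(compact)   # 3 1 2 3 2

# Ordering permutation and its inverse: inv[i] is the position of x[i]
# in sorted order, so x[ix][inv] reproduces x
ix <- order(x)
inv <- integer(length(ix))
inv[ix] <- seq_along(ix)
print(inv)       # 4 1 2 5 3

# ids maps a compact index back to the original id
stopifnot(all(ids[compact] == x))
```

The same inverse is also given by `order(ix)`; the assignment form just avoids a second sort.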

I'm new to R. How should I do this?

SolessChong

2 Answers


You may be able to save some memory by converting the index columns to factors.

For example:

> d <- data.frame(x = rep(c(1000, 2000), 10000), y=rep(c(100, 150), 10000)) 
> object.size(d)
320448 bytes
> d1 <- data.frame(x=as.factor(d$x), y=as.factor(d$y))
> object.size(d1)
160992 bytes
Gao Hao
    From your own answer, it seems you need a compact index. My reputation is too low to comment, so I'm adding it here: append the line "levels(d1$x) <- seq_along(levels(d1$x))" to the code I posted above. Note that the levels must be taken from the factor column d1$x, not the numeric d$x. That relabels the levels as 1, 2, 3, … and gives you a compact index. – Gao Hao Jul 26 '13 at 02:17
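A short sketch of why the factor approach yields a compact index, using a small hypothetical data frame: a factor stores its values as integer codes 1..nlevels with the levels in sorted order, so `as.integer()` on the factor is already the compact index, and relabeling the levels with `seq_along()` makes the labels match those codes.

```r
# Hypothetical sparse ids, with a duplicate
d <- data.frame(x = c(20000, 10000, 15000, 10000))
d1 <- data.frame(x = as.factor(d$x))

# Factor codes are already a compact, order-preserving index
codes <- as.integer(d1$x)
print(codes)          # 3 1 2 1

# Original ids, recoverable from the levels before relabeling
orig_ids <- as.numeric(levels(d1$x))

# Relabel the levels themselves as 1..nlevels
levels(d1$x) <- seq_along(levels(d1$x))
print(levels(d1$x))   # "1" "2" "3"

stopifnot(all(orig_ids[codes] == d$x))
```

Keeping `orig_ids` before relabeling is what lets you map a compact index back to the original id afterwards.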

I'm new to R, so the code may be ugly; feel free to improve it.

The main idea is to take the unique values and build a lookup table from them.

# index compression: map each distinct id onto 1..n, preserving order
V1_uniq = unique(data[,1])
V3_uniq = unique(data[,3])

user_n = length(V1_uniq)
ast_n = length(V3_uniq)

# lookup table indexed by the raw id (ids must be positive integers):
# LUT1[id] = rank of id among the sorted distinct ids
rst = sort(V1_uniq)
LUT1 = integer(0)
LUT1[rst] = seq_along(rst)

usr_comp = LUT1[data[,1]]

# same for column 3
rst = sort(V3_uniq)
LUT3 = integer(0)
LUT3[rst] = seq_along(rst)

ast_comp = LUT3[data[,3]]
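A vectorised sketch of the same lookup-table idea, using `match()` instead of a table indexed by the raw id; the helper name `compress_ids` and the toy edge list (the rows from the question) are hypothetical. `match()` does not allocate a vector of length max(id), and the sorted distinct ids double as the inverse map from compact index back to original id.

```r
# Hypothetical helper: compress one column of ids to 1..n, preserving order
compress_ids <- function(ids) {
  lut <- sort(unique(ids))        # lut[k] is the original id with compact index k
  list(index = match(ids, lut),   # compact index for every row
       lut = lut)                 # inverse map: compact index -> original id
}

# Toy edge list with the rows shown in the question
data <- data.frame(V1 = c(10001, 10002, 10002, 10003, 10003),
                   V2 = c(1003, 1005, 1007, 1001, 1005))

v1 <- compress_ids(data$V1)
v2 <- compress_ids(data$V2)
print(v1$index)   # 1 2 2 3 3
print(v2$index)   # 2 3 4 1 3
```

`v1$lut[v1$index]` gives back the original V1 column, which covers the inverse-index requirement from the question.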
SolessChong