
I'm trying to generate an adjacency matrix for working with networks in R.

I followed the steps from this Stack Overflow post:

How to create an adjacency matrix from raw data which is non-numeric in nature

I tried all three approaches from that post, but none of them worked.

My data is structured this way:

UserID | TaskID
505050 | elx-1010
505051 | elz-1211
505052 | elx-1911
505053 | elz-1414
505054 | elf-1014
505055 | fze-1415
505056 | elx-1210

I have 50,000 rows of this data. My questions are:

  1. Is the dataset too big to be converted to a matrix?
  2. Does the string column (TaskID) need to be an integer?
  3. I tried with both unique and non-unique values. Does this affect the results?
  4. I have 8 GB of RAM. When I run the command to build the matrix, the notebook uses all the memory for a long time and only returns a result after a few minutes.

I'm trying to build weighted networks. Something seems wrong, because I will end up with a non-square matrix.
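
For illustration, here is a minimal base R sketch of one possible route from this kind of two-column data to a weighted network. The toy rows and the names `edges`, `incidence` and `user_adj` are made up for the example, and linking users through the tasks they share is only an assumed reading of the goal. The sketch suggests that TaskID can stay a string, that repeated rows turn into weights, and that a non-square users-by-tasks matrix is an expected intermediate step rather than an error.

```r
# Toy data in the same shape as the question (values are illustrative only).
edges <- data.frame(
  UserID = c(505050L, 505051L, 505052L, 505050L, 505051L, 505050L),
  TaskID = c("elx-1010", "elz-1211", "elx-1010",
             "elx-1010", "elx-1010", "elz-1211"),
  stringsAsFactors = FALSE
)

# table() treats both columns as categorical labels, so TaskID does not
# need to be converted to an integer. Repeated (UserID, TaskID) rows are
# counted, and those counts act as weights.
incidence <- unclass(table(edges$UserID, edges$TaskID))

# The users-by-tasks matrix is non-square whenever the number of users
# differs from the number of tasks. Multiplying it by its own transpose
# gives a square users-by-users matrix: entry [i, j] sums, over all tasks,
# the product of the two users' counts for that task.
user_adj <- tcrossprod(incidence)
diag(user_adj) <- 0  # drop self-loops

print(dim(incidence))  # users x tasks
print(user_adj)
```

Removing duplicate rows before this step would cap every count at 1, so working with unique or non-unique values does change the weights.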

Angelo Canepa
  • I'm not very experienced with working with large datasets in R, but I suspect that you will not be able to do this in an uncomplicated fashion unless the 50,000 rows in your dataframe contain a lot of duplicates. If you have 50,000 unique values, your adjacency matrix will be 50,000 by 50,000, meaning R will need to store 2.5 billion matrix entries. I suspect the dataset is too big. – gfgm Mar 30 '16 at 23:43
  • Take a look at [this answer](http://stackoverflow.com/a/14883999/3519000), which uses `igraph`. Honestly, I haven't used `reshape` that often, but I can assure you that `igraph` can handle 50000 rows perfectly well... Although it is also true that 8GB of memory is not that much. Maybe give it a try. – lrnzcig Mar 31 '16 at 13:27
  • This adjacency matrix might be quite sparse, so it may benefit from sparse matrix handling by package `Matrix`. – andrechalom Apr 01 '16 at 01:52
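
The memory concern and the sparse-matrix suggestion in the comments can be sketched as follows. This is a hedged example rather than a tested solution for the real data: the toy rows and the names `edges`, `user`, `task`, `incidence` and `user_adj` are assumptions, and it relies on `sparseMatrix()` from the `Matrix` package summing the values of repeated (i, j) pairs, which is its documented default.

```r
library(Matrix)

# Dense storage estimate: a 50,000 x 50,000 matrix of doubles at 8 bytes
# per entry needs about 18.6 GiB, well beyond 8 GB of RAM.
n <- 50000
dense_gib <- n^2 * 8 / 1024^3
print(dense_gib)

# Toy edge list in the same shape as the question (illustrative values).
edges <- data.frame(
  UserID = c(505050L, 505051L, 505052L, 505050L, 505051L, 505050L),
  TaskID = c("elx-1010", "elz-1211", "elx-1010",
             "elx-1010", "elx-1010", "elz-1211"),
  stringsAsFactors = FALSE
)

# Factors map each distinct label to an integer index, so the string
# TaskID column never has to be recoded by hand.
user <- factor(edges$UserID)
task <- factor(edges$TaskID)

# Only non-zero cells are stored. Repeated (user, task) pairs have their
# x values summed, so duplicate rows become weights.
incidence <- sparseMatrix(
  i = as.integer(user),
  j = as.integer(task),
  x = rep(1, nrow(edges)),
  dims = c(nlevels(user), nlevels(task)),
  dimnames = list(levels(user), levels(task))
)

# Square users-by-users projection, still stored sparsely.
user_adj <- tcrossprod(incidence)
print(user_adj)
```

Whether the projection stays small enough for 8 GB depends on how many users share tasks, so it seems worth trying on a subset of the rows first.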
