I understand the main idea behind this error: my data is too big. I have 512218 records with 3 variables, and I'm trying to convert the data frame to tabular format so I can get an adjacency matrix. Right now I'm using xtabs and getting this error:
n <- xtabs(USER_LINK ~ screenName + screen_name_mention, df)
I tried using sapply(df, table) (as mentioned in a related question), but it didn't work. Is there an alternative way to convert a data frame to tabular format without getting this error?
Head of the data:
screenName screen_name_mention USER_LINK
1 g_fandos ecolandlab 1
2 andrewmbrass PLOSBiology 1
3 andrewmbrass PLOSBiology 1
4 welloldstem dbcurren 1
5 PaulJDavison BehavEcol 1
6 cbjones1943 BiolJLinnSoc 1
str(df)
'data.frame': 512218 obs. of 3 variables:
$ screenName : Factor w/ 150233 levels "","#$%","#cuttingeeg",..: 50920 8866 8866 145600 106833 23847 23847 98575 98575 61282 ...
$ screen_name_mention: Factor w/ 150233 levels "","#$%","#cuttingeeg",..: 41276 110025 110025 33531 15579 17454 61209 112371 38473 110091 ...
$ USER_LINK : int 1 1 1 1 1 1 1 1 1 1 ...
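For scale, here is a rough back-of-the-envelope estimate of what a dense table over these factor levels would need. It is only a sketch: the 4 bytes per cell figure assumes integer storage (doubles would need 8), and the variable names are just for illustration.

```r
# Both factors have 150233 levels (see str(df) above), so a dense
# cross-tabulation needs one cell per pair of levels.
n_levels <- 150233
cells <- n_levels^2                 # 22569954289 cells

# Assumption: 4 bytes per cell (integer storage).
bytes <- cells * 4
gib <- bytes / 1024^3               # roughly 84 GiB

# The cell count is also far beyond the largest value an R integer can hold.
cells > .Machine$integer.max
```

So only 512218 of those cells can be non-zero at most, while the dense table has to allocate all of them.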
Example:
User_name  M_User  Total
user 1     user 2  7
user 1     user 3  19
user 1     user 7  5
user 3     user 2  1
user 2     user 7  1
End result:
User_name  user 1  user 2  user 3  user 7
user 1     0       7       19      5
user 2     0       0       0       1
user 3     0       1       0       0
user 7     0       0       0       0
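The small example above can be reproduced with the sketch below. The data frame name `small` and the shared level vector `users` are made up for illustration; the column names are the ones from the example table. The last step is an untested idea rather than a confirmed fix: `xtabs()` has a `sparse` argument that returns a sparse matrix from the Matrix package, which might avoid allocating every cell, but whether it copes with the full data is an assumption.

```r
# Example data; both factors share the same levels so the table is square.
users <- c("user 1", "user 2", "user 3", "user 7")
small <- data.frame(
  User_name = factor(c("user 1", "user 1", "user 1", "user 3", "user 2"),
                     levels = users),
  M_User    = factor(c("user 2", "user 3", "user 7", "user 2", "user 7"),
                     levels = users),
  Total     = c(7, 19, 5, 1, 1)
)

# Dense cross-tabulation: rows are User_name, columns are M_User,
# cells are the summed Total for each pair.
adj <- xtabs(Total ~ User_name + M_User, data = small)
print(adj)

# Possible variant (assumption, not verified on the large data):
# ask xtabs() for a sparse result if the Matrix package is available.
if (requireNamespace("Matrix", quietly = TRUE)) {
  adj_sparse <- xtabs(Total ~ User_name + M_User, data = small, sparse = TRUE)
  print(adj_sparse)
}
```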
My code works fine for a small dataset like this (it even creates a 5000x5000 matrix), but not for the large dataset.