I have a large CSV file with three columns of Reddit data: a subreddit name, a second subreddit name, and the number of unique commenters who posted in both subreddits within the past month.
The CSV file contains each subreddit relationship in both directions. For instance, both of the following lines exist in the file:
```
Roadcam,Nootropics,39
Nootropics,Roadcam,39
```
In total, the CSV file has 35,778,434 lines.
I want to import the CSV file into R and store it as a sparse matrix for analysis. This is my current attempt:
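```r
library(Matrix)  # provides sparseMatrix()

# Columns 1 and 2 hold subreddit names; column 3 holds the commenter count
subreddit.overlaps <- read.csv("subreddit_overlaps_2017_01.csv")
subreddit.overlaps.matrix <- sparseMatrix(i = as.numeric(subreddit.overlaps[, 1]),
                                          j = as.numeric(subreddit.overlaps[, 2]),
                                          x = subreddit.overlaps[, 3])
```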
However, the dimensions of the resulting sparse matrix are not what I expected: it has only 4561 rows and 68825 columns. Because every relationship appears in both directions, I expected a square matrix. Why is the resulting sparse matrix not square?
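For comparison, here is a minimal sketch of the construction I am aiming for, run on a small hypothetical data frame with the same shape as the CSV. It assumes that both name columns should be mapped onto one shared, sorted set of subreddit names (`all.subs`, an illustrative name) so that row k and column k always refer to the same subreddit. The toy rows and counts other than the Roadcam/Nootropics pair are made up for illustration.

```r
library(Matrix)  # provides sparseMatrix()

# Hypothetical toy data in the same shape as the CSV (each pair in both directions)
overlaps <- data.frame(
  from  = c("Roadcam", "Nootropics", "Roadcam", "AskReddit"),
  to    = c("Nootropics", "Roadcam", "AskReddit", "Roadcam"),
  count = c(39, 39, 12, 12),
  stringsAsFactors = FALSE
)

# One shared index of names, so row and column indices agree
all.subs <- sort(unique(c(overlaps$from, overlaps$to)))

overlap.matrix <- sparseMatrix(
  i = match(overlaps$from, all.subs),   # row index = position of name in all.subs
  j = match(overlaps$to, all.subs),     # column index uses the same lookup
  x = overlaps$count,
  dims = c(length(all.subs), length(all.subs)),
  dimnames = list(all.subs, all.subs)
)

dim(overlap.matrix)
```

On this toy input the result is a 3 x 3 symmetric matrix. For the real file, the names would need to be read as character strings, for example with `read.csv(..., stringsAsFactors = FALSE)` (the default since R 4.0.0), plus `header = FALSE` if the file has no header row, which the sample lines suggest but do not confirm.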