I have a .csv file containing 22,388 rows of comma-separated numbers. For each row separately, I want to find all possible pairs of its numbers and list them one pair per line, so that I can make a visual representation of them as clusters.

An example of two rows from my file would be
"2, 13"
"2, 8, 6"

When I use the str() function, R says the file contains factors. I guess the values need to be integers, but I need the rows to stay separate, so I've wrapped each row in quotation marks.

I want possible combinations of pairs for each row like this.

2, 13
2, 8
2, 6
8, 6

I've already gotten an answer from @flodel saying

Sample input - replace textConnection(...) with your csv filename.

csv <- textConnection("2,13
2,8,6")

This reads the input into a list of values:

input.lines  <- readLines(csv)
input.values <- strsplit(input.lines, ',')

This creates a nested list of pairs:

pairs <- lapply(input.values, combn, 2, simplify = FALSE)

This puts everything in a nice matrix of integers:

pairs.mat <- matrix(as.integer(unlist(pairs)), ncol = 2, byrow = TRUE)
pairs.mat

But I need the function to run through each row in my .csv file separately, so I think I need to wrap it in a for loop. I just can't get my head around it.

Thanks in advance.

  • You should supply a small reproducible example. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Roman Luštrik May 09 '13 at 13:45
  • In a way they do provide an example of what the data looks like when they say `An example of two rows from my file would be`, but it's not runnable in its current state. – Tyler Rinker May 09 '13 at 14:26

1 Answer

Not sure exactly what you're after but maybe something like this:

dat <- readLines(n=2) #read in your data
2, 13 
2, 8, 6

## split each string on "," and then remove white space
## and put into a list with lapply

dat2 <- lapply(dat, function(x) {   
    as.numeric(gsub("\\s+", "", unlist(strsplit(x, ","))))
})


## find all pairwise combinations using outer (faster 
## than expand.grid, and we can take just the upper triangle)

dat3 <- lapply(dat2, function(x) {
    y <- outer(x, x, paste)
    c(y[upper.tri(y)])
})

## then split on the spaces and convert back to numeric
## stored as a list

lapply(strsplit(unlist(dat3), " "), as.numeric)

## > lapply(strsplit(unlist(dat3), " "), as.numeric)
## [[1]]
## [1]  2 13
## 
## [[2]]
## [1] 2 8
## 
## [[3]]
## [1] 2 6
## 
## [[4]]
## [1] 8 6
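
The per-row pairing can also be done with combn, which the question already uses. The following is only a minimal sketch under a few assumptions: the input lines look like the two sample rows, the helper name row_pairs and the column names row, a and b are made up for illustration, and rows with fewer than two values are simply skipped. Note that lapply and Map already handle each row on its own, so no explicit for loop is needed.

```r
# Sample input in the question's format. For real data, replace this with
# readLines("your_file.csv"), where the file name is hypothetical.
lines <- c("2, 13", "2, 8, 6")

# Drop any literal quotes, split on commas, trim spaces, convert to integers.
values <- lapply(strsplit(gsub('"', "", lines), ","),
                 function(x) as.integer(trimws(x)))

# Hypothetical helper: all pairs for one row, tagged with the row number.
# Rows with fewer than two values produce no pairs (and would make combn fail).
row_pairs <- function(x, row) {
  if (length(x) < 2) return(NULL)
  cbind(row, t(combn(x, 2)))
}

# Map calls row_pairs once per row; rbind stacks the per-row matrices.
pairs <- do.call(rbind, Map(row_pairs, values, seq_along(values)))
colnames(pairs) <- c("row", "a", "b")
pairs
```

Keeping the row number next to each pair makes it possible to trace a pair back to the line it came from. Stacking plain matrices rather than pasting and re-splitting strings may also help with run time on a file of this size, though that is not something measured here.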
Tyler Rinker
  • Thanks so much! This seems to work. The only problem now is that I need to get it into Gephi for the visualization, but R is taking forever and hasn't produced my table yet. – Matias Bruhn May 09 '13 at 15:42
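
For the Gephi step mentioned in the last comment, one possible route is to write the pairs out as an edge list. This is a rough sketch rather than a tested workflow: it assumes the pairs are already in a two-column integer matrix like pairs.mat from the question, that Gephi's spreadsheet import is used with columns named Source and Target, and the output file name edges.csv is hypothetical.

```r
# Pairs from the question's two sample rows, one pair per matrix row.
pairs.mat <- matrix(c(2L, 13L,
                      2L, 8L,
                      2L, 6L,
                      8L, 6L), ncol = 2, byrow = TRUE)

# One edge per pair, using the Source/Target column names.
edges <- data.frame(Source = pairs.mat[, 1], Target = pairs.mat[, 2])

# Hypothetical output location; point this at any folder you like.
edge.file <- file.path(tempdir(), "edges.csv")
write.csv(edges, edge.file, row.names = FALSE)
```

The resulting file has a header line followed by one edge per line and can be loaded as an edges table.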