I have a data frame of about 81,000 rows. Each row contains a single comma-separated string like this:

0193,02394,2093,Alabama,Alabama,23094,23193,24311,24411

I'm trying to get a table with all 81,000 rows split into three columns containing the two names and the last number. Each row would look like this:

Alabama | Alabama | 24411

So far, my code looks like this:

pop.dat <- data.frame()
for (i in 1:nrow(pop.data)){
     pop.dat <- rbind(pop.dat, t(data.frame(data.frame(strsplit(as.character(pop.data[i,]), ','))[c(7:8, 13),])))
}

It works well, but it is way too slow! Can anyone help me speed it up? Maybe use an apply function or something.

joran
Landmaster
  • Are you reading these data in from a file? It might be easier to read the file with `read.csv`, and then just select the columns you want. – nograpes Aug 08 '14 at 16:07
  • I am reading it from a csv, but in the csv, the data is all in one column, with 81,000 rows. – Landmaster Aug 08 '14 at 16:17
  • So, you're saying there is a CSV within a CSV? I guess that would be possible with some quoting... perhaps you can post the first few lines of the original CSV (including all the columns). – nograpes Aug 08 '14 at 16:18
  • I wonder now if this is effectively a duplicate of [this question](http://stackoverflow.com/questions/4350440/using-strsplit-with-data-frames-to-split-label-columns-into-multiple?rq=1). – nograpes Aug 08 '14 at 16:21

1 Answer

You can use strsplit on the entire column at once, then bind the resulting rows and select the columns you want, like this:

# Create some data
pop.data <- data.frame(col=rep('0193,02394,2093,Alabama,Alabama,23094,23193,24311,24411',3), stringsAsFactors=FALSE)
# Split by comma, then rbind the list.
do.call(rbind, strsplit(pop.data$col, ','))[, c(4, 5, 9)]
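
The call above returns a character matrix, so a further step is needed to get the three-column table from the question. The following is a minimal sketch, assuming every row splits into the same number of fields. The column names `name1`, `name2` and `value` are made up for illustration, and the indices `c(4, 5, 9)` match the sample string rather than the `c(7:8, 13)` used in the question.

```r
# Same toy data as above: one comma-separated string per row
pop.data <- data.frame(
  col = rep('0193,02394,2093,Alabama,Alabama,23094,23193,24311,24411', 3),
  stringsAsFactors = FALSE
)

# Split every row in one vectorised call, stack the pieces into a character
# matrix, and keep fields 4, 5 and 9. drop = FALSE keeps a matrix even when
# there is only a single row.
parts <- do.call(rbind, strsplit(pop.data$col, ',', fixed = TRUE))[, c(4, 5, 9), drop = FALSE]

# Build the three-column table (hypothetical column names) and convert the
# last field from text to integer
pop.dat <- data.frame(
  name1 = parts[, 1],
  name2 = parts[, 2],
  value = as.integer(parts[, 3]),
  stringsAsFactors = FALSE
)
```

This avoids the row-by-row `rbind` of the original loop, which copies the growing data frame on every iteration.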

If you are reading these from a file, though, use read.csv; it will be fast and easy.
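
Since the asker's comments say the file holds everything in a single column, one possible way to still use read.csv is to pass that column of strings back in through its `text` argument. This is only a sketch: the column name `col` and the field positions are assumptions taken from the toy data, not from the real file.

```r
# Toy stand-in for the real data: one string column (assumed to be named `col`)
pop.data <- data.frame(
  col = rep('0193,02394,2093,Alabama,Alabama,23094,23193,24311,24411', 3),
  stringsAsFactors = FALSE
)

# Treat each string as one CSV line. colClasses = "character" keeps leading
# zeros such as "0193" from being dropped by numeric conversion.
fields <- read.csv(text = pop.data$col, header = FALSE,
                   colClasses = "character", stringsAsFactors = FALSE)

# Keep the two names and the last number, then convert the number
pop.dat <- fields[, c(4, 5, 9)]
pop.dat[[3]] <- as.integer(pop.dat[[3]])
```

With `header = FALSE`, read.csv names the columns `V1`, `V2`, and so on, so the result here has columns `V4`, `V5` and `V9`.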

nograpes
  • I opened the csv in a text editor (as a .txt file) and found that all the vectors have quotation marks at the beginning and end. I removed those quotation marks, opened the csv again, and Excel did it for me :D – Landmaster Aug 08 '14 at 16:32
  • Great, I am glad you were successful! If you liked the answer, you can click the check box to the left of it. – nograpes Aug 08 '14 at 16:39