4

Can I merge data like

name,#797,"Stachy, Poland"
at_rank,#797,1
to_center,#797,4.70
predicted,#797,4.70

According to the second column and take the first column as column names?

     name             at_rank to_center predicted
#797 "Stachy, Poland" 1       4.70      4.70

Upon request, the whole set of data: http://sprunge.us/cYSJ

Reactormonk
  • 21,472
  • 14
  • 74
  • 123
  • I'm not familiar with how to accomplish this with R (although I'm interested). These types of transitions are very straightforward with [Data Wrangler](http://vis.stanford.edu/wrangler/). [Google Refine](https://code.google.com/p/google-refine/) may also work. – Lenwood Dec 05 '12 at 18:10
  • @Lenwood data wrangler only supports 1k columns - got 40k – Reactormonk Dec 05 '12 at 18:13
  • That's correct for the web version, but you can select script (Python or javaScript) as the output and run DW against any number of files or rows on your own system using their runtime tools. – Lenwood Dec 05 '12 at 18:17
  • 3
    This is called reshaping the data from long to wide format. Here's an answer that collects various ways to do this: http://stackoverflow.com/a/9617424/210673 – Aaron left Stack Overflow Dec 05 '12 at 18:21
  • @AnandaMahto not sure if it's helpful, last 4 lines: http://sprunge.us/BFeZ – Reactormonk Dec 05 '12 at 18:25

3 Answers3

2

The first problem, of reading the data in, should not be a problem if your strings with commas are quoted (which they seem to be). Using read.csv with the header=FALSE argument does the trick with the data you shared. (Of course, if the data file had headers, delete that argument.)

From there, you have several options. Here are two.

  1. reshape (base R) works fine for this:

    myDF <- read.csv("http://sprunge.us/cYSJ", header=FALSE)
    myDF2 <- reshape(myDF, direction="wide", idvar="V2", timevar="V1")
    head(myDF2)
    #    V2                V3.name V3.at_rank V3.to_center V3.predicted
    # 1  #1                Kitoman          1         2.41         2.41
    # 5  #2                Hosaena          2         4.23         9.25
    # 9  #3 Vinzelles, Puy-de-Dôme          1         5.20         5.20
    # 13 #4     Whitelee Wind Farm          6         3.29         8.07
    # 17 #5    Steveville, Alberta          1         9.59         9.59
    # 21 #6        Rocher, Ardèche          1         0.13         0.13
    
  2. The reshape2 package is also useful in these cases. It has simpler syntax and the output is also a little "cleaner" (at least in terms of variable names).

    library(reshape2)
    myDFw_2 <- dcast(myDF, V2 ~ V1)
    # Using V3 as value column: use value.var to override.
    head(myDFw_2)
    #       V2 at_rank                                       name predicted to_center
    # 1     #1       1                                    Kitoman      2.41      2.41
    # 2    #10       4                            Icaraí de Minas      6.07      8.19
    # 3   #100       2        Scranton High School (Pennsylvania)      5.78      7.63
    # 4  #1000       1                  Bat & Ball Inn, Clanfield      2.17      2.17
    # 5 #10000       3                                     Tăuteu      1.87      5.87
    # 6 #10001       1 Oak Grove, Northumberland County, Virginia      5.84      5.84
    
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • +1 beat me by 48 seconds. I was glad to get a chance to use this function, though. the arguments are confusing... – Matthew Plourde Dec 05 '12 at 18:39
  • How can I cast some of the columns? Because they're still of character type. – Reactormonk Dec 05 '12 at 22:25
  • @MatthewPlourde, `reshape` is notorious for being one of the more confusing functions to use. I've found a lot of datasets need a `time` or `id` variable to be generated before it can be used correctly. – A5C1D2H2I1M1N2O1R2T1 Dec 06 '12 at 03:29
  • @Tass, I'm sorry, I don't quite understand the follow-up question. Can you elaborate a bit on what you need? Which columns are of type "character" and what should they be instead? – A5C1D2H2I1M1N2O1R2T1 Dec 06 '12 at 03:30
1

Look at the reshape package from Hadley. If I understand correctly, you are just pivoting your data from long to wide.

Btibert3
  • 38,798
  • 44
  • 129
  • 168
0

I think in this case all you really need to do is transpose, cast to data.frame, set the colnames to the first row and then remove the first row. It might be possible to skip the last step through some combination of arguments to data.frame but I don't know what they are right now.

frankc
  • 11,290
  • 4
  • 32
  • 49