R merge with itself

Question

Can I merge data like

name,#797,"Stachy, Poland"
at_rank,#797,1
to_center,#797,4.70
predicted,#797,4.70

According to the second column and take the first column as column names?

     name             at_rank to_center predicted
#797 "Stachy, Poland" 1       4.70      4.70

Upon request, the whole set of data: http://sprunge.us/cYSJ

I'm not familiar with how to accomplish this with R (although I'm interested). These types of transitions are very straightforward with [Data Wrangler](http://vis.stanford.edu/wrangler/). [Google Refine](https://code.google.com/p/google-refine/) may also work. — Lenwood, Dec 05 '12 at 18:10
That's correct for the web version, but you can select script (Python or javaScript) as the output and run DW against any number of files or rows on your own system using their runtime tools. — Lenwood, Dec 05 '12 at 18:17
This is called reshaping the data from long to wide format. Here's an answer that collects various ways to do this: http://stackoverflow.com/a/9617424/210673 — Aaron left Stack Overflow, Dec 05 '12 at 18:21
@AnandaMahto not sure if it's helpful, last 4 lines: http://sprunge.us/BFeZ — Reactormonk, Dec 05 '12 at 18:25

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer · 2012-12-05T18:42:11.617

The first problem, of reading the data in, should not be a problem if your strings with commas are quoted (which they seem to be). Using read.csv with the header=FALSE argument does the trick with the data you shared. (Of course, if the data file had headers, delete that argument.)

From there, you have several options. Here are two.

reshape (base R) works fine for this:

myDF <- read.csv("http://sprunge.us/cYSJ", header=FALSE)
myDF2 <- reshape(myDF, direction="wide", idvar="V2", timevar="V1")
head(myDF2)
#    V2                V3.name V3.at_rank V3.to_center V3.predicted
# 1  #1                Kitoman          1         2.41         2.41
# 5  #2                Hosaena          2         4.23         9.25
# 9  #3 Vinzelles, Puy-de-Dôme          1         5.20         5.20
# 13 #4     Whitelee Wind Farm          6         3.29         8.07
# 17 #5    Steveville, Alberta          1         9.59         9.59
# 21 #6        Rocher, Ardèche          1         0.13         0.13

The reshape2 package is also useful in these cases. It has simpler syntax and the output is also a little "cleaner" (at least in terms of variable names).

library(reshape2)
myDFw_2 <- dcast(myDF, V2 ~ V1)
# Using V3 as value column: use value.var to override.
head(myDFw_2)
#       V2 at_rank                                       name predicted to_center
# 1     #1       1                                    Kitoman      2.41      2.41
# 2    #10       4                            Icaraí de Minas      6.07      8.19
# 3   #100       2        Scranton High School (Pennsylvania)      5.78      7.63
# 4  #1000       1                  Bat & Ball Inn, Clanfield      2.17      2.17
# 5 #10000       3                                     Tăuteu      1.87      5.87
# 6 #10001       1 Oak Grove, Northumberland County, Virginia      5.84      5.84

+1 beat me by 48 seconds. I was glad to get a chance to use this function, though. the arguments are confusing... — Matthew Plourde, Dec 05 '12 at 18:39
How can I cast some of the columns? Because they're still of character type. — Reactormonk, Dec 05 '12 at 22:25
@MatthewPlourde, `reshape` is notorious for being one of the more confusing functions to use. I've found a lot of datasets need a `time` or `id` variable to be generated before it can be used correctly. — A5C1D2H2I1M1N2O1R2T1, Dec 06 '12 at 03:29
@Tass, I'm sorry, I don't quite understand the follow-up question. Can you elaborate a bit on what you need? Which columns are of type "character" and what should they be instead? — A5C1D2H2I1M1N2O1R2T1, Dec 06 '12 at 03:30

score 1 · Answer 2 · answered Dec 05 '12 at 18:20

1

Look at the reshape package from Hadley. If I understand correctly, you are just pivoting your data from long to wide.

answered Dec 05 '12 at 18:20

Btibert3

38,798
44
129
168

1

I would be curious how this performs with 40k columns. – joran Dec 05 '12 at 18:21
I'm curious how to read it into R because it gets confused because I have strings and numbers in a single column... – Reactormonk Dec 05 '12 at 18:22
@Tass, so is your question about data manipulation or about reading the data into R? – A5C1D2H2I1M1N2O1R2T1 Dec 05 '12 at 18:31

score 0 · Answer 3 · answered Dec 05 '12 at 18:27

I think in this case all you really need to do is transpose, cast to data.frame, set the colnames to the first row and then remove the first row. It might be possible to skip the last step through some combination of arguments to data.frame but I don't know what they are right now.

R merge with itself

3 Answers3