6

I'm trying to find a way to convert multiple lines of text into a data frame. Is there a way to read the lines in with read.delim() and then build the data frame shown below with something akin to reshape()?

The data is structured as follows:

A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35

I'd like to convert this data to something that looks like the following data frame:

A             B             C
1             2             10
34            20            6.7
2             78            35

Apologies if there is an obvious way to do this!

andrewj

4 Answers

12

How about:

s<-"A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35
"
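d<-read.delim(textConnection(s),header=FALSE,sep=":",strip.white=TRUE)
# unique() works whether V1 is read as character (the default since R 4.0) or as a factor;
# levels() returns NULL for a character column
cols<-unique(as.character(d$V1))
d<-data.frame(sapply(cols,function(x) d$V2[d$V1==x], USE.NAMES=TRUE))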

which yields:

   A  B    C
1  1  2 10.0
2 34 20  6.7
3  2 78 35.0
unutbu
  • That was a clever use of `sapply()`. I hadn't thought of using it that way before. – andrewj Mar 08 '10 at 16:00
  • Thanks. I'm just starting to learn R, so I had to try using the few tools at my disposal. :) I just noticed your solution using `unstack`. That looks like the best way to me. – unutbu Mar 08 '10 at 17:57
  • Found this helpful because I needed to convert a text string `"a;lorem\nb;ipsum\nc;gecko"` into a data.frame, and it worked with the `textConnection()` function, which I didn't know about. – schlusie Jul 31 '15 at 14:40
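As a minimal sketch of the case in the last comment: `read.delim()` passes extra arguments on to `read.table()`, which accepts a `text` argument, so a string can be parsed without opening a connection explicitly. The variable names `s2` and `kv` are only illustrative.

```r
# The string from the comment: fields separated by ";", records by newlines.
s2 <- "a;lorem\nb;ipsum\nc;gecko"

# `text =` is handed through to read.table(), so textConnection() is not needed.
kv <- read.delim(text = s2, header = FALSE, sep = ";",
                 strip.white = TRUE, stringsAsFactors = FALSE)

# kv has two character columns, V1 (the keys) and V2 (the values).
print(kv)
```

`textConnection(s2)` in place of `text = s2` should give the same result; the `text` argument just saves a step.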
4

Here is how to do it with the plyr package:

library(plyr)  # unlike require(), this stops with an error if plyr is not installed
my.data <- "A: 1
            B: 2
            C: 10
            A: 34
            B: 20
            C: 6.7
            A: 2
            B: 78
            C: 35"   
df <- read.delim(textConnection(my.data),header=FALSE,sep=":",strip.white=TRUE)

as.data.frame(dlply(df,.(V1),function(x) x[[2]]))

You get

   A  B    C
1  1  2 10.0
2 34 20  6.7
3  2 78 35.0

You can see what plyr is doing by inspecting the intermediate results of dlply(df,.(V1)) or dlply(df,.(V1),function(x) x): each returns a list with one element per distinct value of V1.
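For comparison, here is a rough base-R analogue of the grouping step. It is not how plyr is implemented, just a sketch of the same split-then-combine idea using `split()`; the names `groups` and `wide` are illustrative.

```r
my.data <- "A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35"
df <- read.delim(text = my.data, header = FALSE, sep = ":", strip.white = TRUE)

# One list element per key, each holding that key's values in input order.
groups <- split(df$V2, df$V1)
str(groups)

# Groups of equal length can be bound column-wise into a data frame.
wide <- as.data.frame(groups)
print(wide)
```

This only works as written when every key occurs the same number of times; otherwise `as.data.frame()` stops with an error about differing numbers of rows.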

Leo Alekseyev
  • Thanks for the `plyr` suggestion. Definitely worth exploring further. I found an alternative to solving my question using `unstack` – andrewj Mar 06 '10 at 21:49
  • Ah, good call; in this case that's probably the way to go. plyr can be rather handy, though, for other "group by" type operations. If you'd like to explore further you might want to read http://had.co.nz/plyr/plyr-intro-090510.pdf – Leo Alekseyev Mar 06 '10 at 23:33
2

I posted this question on R-help as well, and got a response from Phil Spector suggesting unstack.

This is a modification of Leo Alekseyev's response

my.data <- "A: 1
            B: 2
            C: 10
            A: 34
            B: 20
            C: 6.7
            A: 2
            B: 78
            C: 35"   
df <- read.delim(textConnection(my.data),header=FALSE,sep=":",strip.white=TRUE)
unstack(df, V2 ~ V1)

This results in:

   A  B    C
1  1  2 10.0
2 34 20  6.7
3  2 78 35.0

Compared with the other thoughtful answers, this approach has two advantages: you don't need to specify the number of columns ahead of time, and it doesn't require any additional packages.
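A small sketch of the first point, using made-up data with four keys instead of three (the key `D` and its values are assumptions for illustration). The `unstack()` call itself is unchanged. The second half shows an edge case worth knowing: when the keys do not all occur equally often, `unstack()` returns a list rather than a data frame.

```r
# Four keys (A-D), two records; the data here is invented for illustration.
txt <- "A: 1\nB: 2\nC: 10\nD: 0.5\nA: 34\nB: 20\nC: 6.7\nD: 1.5"
kv <- read.delim(text = txt, header = FALSE, sep = ":", strip.white = TRUE)

# Same call as in the answer; the number of columns is inferred from V1.
wide <- unstack(kv, V2 ~ V1)
print(wide)

# Edge case: drop the last line so that key D has fewer values than the others.
ragged <- unstack(kv[-nrow(kv), ], V2 ~ V1)
print(is.data.frame(ragged))  # unstack() falls back to a plain list here
```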

andrewj
0

Here is one solution using reshape

s<-"A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35
"
d<-read.delim(textConnection(s),header=FALSE,sep=":",strip.white=TRUE)
N<-nrow(d)%/%3  # number of records, assuming exactly 3 keys (A, B, C) per record
d$id<-rep(1:N,each=3)
reshape(d,direction="wide",timevar="V1",idvar="id")

Which produces

  id V2.A V2.B V2.C
1  1    1    2 10.0
4  2   34   20  6.7
7  3    2   78 35.0
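The record id above relies on there being exactly three keys per record. One possible variation, sketched here under the assumption that every key appears once per record, derives the id from a running count within each key using `ave()`, and then strips the `V2.` prefix that `reshape()` adds. The name `wide` is illustrative.

```r
s <- "A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35
"
d <- read.delim(text = s, header = FALSE, sep = ":", strip.white = TRUE)

# Running count within each key: the 1st A, B and C get id 1, the 2nd get id 2, ...
d$id <- ave(seq_along(d$V1), d$V1, FUN = seq_along)

wide <- reshape(d, direction = "wide", timevar = "V1", idvar = "id")

# Drop the "V2." prefix from the value columns and reset the row names.
names(wide) <- sub("^V2\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```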
Jyotirmoy Bhattacharya