Converting multiple lines of text into a data frame

Question

I'm trying to find a way to convert multiple lines of text into a data frame. Is there a way to use read.delim() to read in the lines and then produce the data frame below, perhaps with something akin to reshape()?

The data is structured as follows:

A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35

I'd like to convert this data to something that looks like the following data frame:

A             B             C
1             2             10
34            20            6.7
2             78            35

Apologies if there is an obvious way to do this!

score 12 · Accepted Answer · answered Mar 06 '10 at 15:22

How about :

s<-"A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35
"
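# Read "key: value" pairs; keep V1 as a factor so levels() also works in R >= 4.0
d <- read.delim(textConnection(s), header=FALSE, sep=":", strip.white=TRUE, stringsAsFactors=TRUE)
cols <- levels(d[, 'V1'])
# For each key, pull the matching values; sapply() binds them into named columns
d <- data.frame(sapply(cols, function(x) d$V2[d$V1 == x], USE.NAMES=TRUE))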

which yields:

   A  B    C
1  1  2 10.0
2 34 20  6.7
3  2 78 35.0

unutbu

That was a clever use of `sapply()`. I hadn't thought of using it that way before. – andrewj Mar 08 '10 at 16:00
Thanks. I'm just starting to learn R, so I had to try using the few tools at my disposal. :) I just noticed your solution using `unstack`. That looks like the best way to me. – unutbu Mar 08 '10 at 17:57
Found this helpful because I needed to convert a text string `"a;lorem\nb;ipsum\nc;gecko"` into a data.frame, and it worked with the `textConnection()` function, which I didn't know about. – schlusie Jul 31 '15 at 14:40
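
For the semicolon-separated string in the last comment, a minimal sketch might look like the following. It uses the `text` argument of `read.delim()` as an alternative to `textConnection()`; the column names `key` and `value` are made up for illustration and do not come from the comment.

```r
# The string from the comment: records separated by "\n", fields by ";".
s <- "a;lorem\nb;ipsum\nc;gecko"

# `text =` lets read.delim() parse a character string directly.
# The col.names are hypothetical labels chosen for this example.
d <- read.delim(text = s, header = FALSE, sep = ";",
                col.names = c("key", "value"),
                stringsAsFactors = FALSE)
print(d)
```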

score 4 · Answer 2 · answered Mar 06 '10 at 05:53

Here is how to do it with the plyr package:

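library(plyr)
my.data <- "A: 1
            B: 2
            C: 10
            A: 34
            B: 20
            C: 6.7
            A: 2
            B: 78
            C: 35"
# strip.white=TRUE drops the leading indentation and the space after ":"
df <- read.delim(textConnection(my.data), header=FALSE, sep=":", strip.white=TRUE)

# Split by key (V1), keep the values (second column), then bind the pieces as columns
as.data.frame(dlply(df, .(V1), function(x) x[[2]]))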

You get

   A  B    C
1  1  2 10.0
2 34 20  6.7
3  2 78 35.0

You can see what magic plyr is doing just by playing with dlply(df,.(V1)) or dlply(df,.(V1),function(x) x)
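
As a rough base-R analogue of what `dlply(df, .(V1), function(x) x[[2]])` does here, `split()` groups the values in `V2` by the key in `V1`. This is a sketch for intuition only, not plyr's actual implementation; `long`, `groups`, and `wide` are names chosen for the example.

```r
my.data <- "A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35"
long <- read.delim(text = my.data, header = FALSE, sep = ":",
                   strip.white = TRUE)

# One numeric vector per key, values kept in their original order.
groups <- split(long$V2, long$V1)
str(groups)

# Equal-length vectors bind into the columns of a data frame.
wide <- as.data.frame(groups)
print(wide)
```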

edited Mar 06 '10 at 06:01

answered Mar 06 '10 at 05:53

Leo Alekseyev

Thanks for the `plyr` suggestion. Definitely worth exploring further. I found an alternative solution to my question using `unstack`. – andrewj Mar 06 '10 at 21:49
Ah, good call; in this case that's probably the way to go. plyr can be rather handy, though, for other "group by" type operations. If you'd like to explore further you might want to read http://had.co.nz/plyr/plyr-intro-090510.pdf – Leo Alekseyev Mar 06 '10 at 23:33

score 2 · Answer 3 · answered Mar 06 '10 at 21:46

I posted this question on R-help as well, and got a response from Phil Spector suggesting unstack.

This is a modification of Leo Alekseyev's response:

my.data <- "A: 1
            B: 2
            C: 10
            A: 34
            B: 20
            C: 6.7
            A: 2
            B: 78
            C: 35"   
df <- read.delim(textConnection(my.data),header=FALSE,sep=":",strip.white=TRUE)
unstack(df, V2 ~ V1)

This results in:

   A  B    C
1  1  2 10.0
2 34 20  6.7
3  2 78 35.0

Two advantages of this approach over the other thoughtful answers are that you don't need to specify the number of columns ahead of time and that it doesn't require any additional packages.
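
To illustrate the first point, here is a small sketch with a hypothetical fourth key `D` added to the input; the `unstack()` call itself is unchanged. The sample values and the names `four.keys`, `long4`, and `wide4` are invented for the example.

```r
# Hypothetical input with four keys instead of three.
four.keys <- "A: 1
B: 2
C: 10
D: 5
A: 34
B: 20
C: 6.7
D: 8"
long4 <- read.delim(text = four.keys, header = FALSE, sep = ":",
                    strip.white = TRUE)

# Same call as before; the number of columns follows from the keys in V1.
wide4 <- unstack(long4, V2 ~ V1)
print(wide4)
```

Note that `unstack()` only returns a data frame when every key occurs the same number of times; otherwise it returns a list.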

score 0 · Answer 4 · answered Mar 06 '10 at 05:29

Here is one solution using reshape():

s<-"A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35
"
d <- read.delim(textConnection(s), header=FALSE, sep=":", strip.white=TRUE)
# Three keys (A, B, C) per record, so every 3 consecutive rows share one id
N <- nrow(d) %/% 3
d$id <- rep(1:N, each=3)
reshape(d, direction="wide", timevar="V1", idvar="id")

Which produces

  id V2.A V2.B V2.C
1  1    1    2 10.0
4  2   34   20  6.7
7  3    2   78 35.0
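
The `3` above is hard-coded. A sketch that derives the group size from the data instead, assuming every record contains each key exactly once and in the same order (an assumption, not something the answer states), could look like this; `k` and `wide` are names chosen for the example.

```r
s <- "A: 1
B: 2
C: 10
A: 34
B: 20
C: 6.7
A: 2
B: 78
C: 35
"
d <- read.delim(text = s, header = FALSE, sep = ":", strip.white = TRUE)

k <- length(unique(d$V1))        # number of distinct keys per record
stopifnot(nrow(d) %% k == 0)     # guard against incomplete records
d$id <- rep(seq_len(nrow(d) %/% k), each = k)

wide <- reshape(d, direction = "wide", timevar = "V1", idvar = "id")
print(wide)
```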
