I would like to read a tab-delimited text file. The problem is that each set of measurements is organized "block-wise": a title line, then the rows for that measure, then a blank line.

For example, with this input (bodydata.txt):

Body fat
08/21/2013  1:46 PM 17.4
08/20/2013  11:20 AM    17.4
08/17/2013  10:49 AM    17.2
08/16/2013  1:33 PM 17.4
08/15/2013  12:07 PM    17.5
08/14/2013  11:18 AM    17.4
08/13/2013  12:17 PM    17.3

Body weight
08/21/2013  1:46 PM 157
08/20/2013  11:20 AM    156.4
08/17/2013  10:49 AM    155
08/16/2013  1:33 PM 157
08/15/2013  12:07 PM    157
08/14/2013  11:17 AM    157
08/13/2013  12:16 PM    157.4
08/11/2013  4:47 PM 158.2

I would like to import them and have a separate data frame for each measure, like this:

> weight
          V1       V2   V3
1 08/21/2013  1:46 PM 17.4
2 08/20/2013 11:20 AM 17.4
3 08/17/2013 10:49 AM 17.2
4 08/16/2013  1:33 PM 17.4
5 08/15/2013 12:07 PM 17.5
6 08/14/2013 11:18 AM 17.4
7 08/13/2013 12:17 PM 17.3

In a Unix environment, it's not hard to split the text file with sed (like this), but that solution is not portable. It would be nice to find a native R solution. Any suggestions?

P.S. I couldn't come up with good keywords for an online search, so I would appreciate pointers to any articles, threads, or search terms. I'm sorry if this duplicates a question I was not aware of.

KenM

1 Answer

The file as posted doesn't have tabs, so I'm going to use whitespace as the field separator. I'm treating a blank line as the separator between blocks:

Lines <- readLines(textConnection("Body fat
08/21/2013  1:46 PM 17.4
08/20/2013  11:20 AM    17.4
08/17/2013  10:49 AM    17.2
08/16/2013  1:33 PM 17.4
08/15/2013  12:07 PM    17.5
08/14/2013  11:18 AM    17.4
08/13/2013  12:17 PM    17.3

Body weight
08/21/2013  1:46 PM 157
08/20/2013  11:20 AM    156.4
08/17/2013  10:49 AM    155
08/16/2013  1:33 PM 157
08/15/2013  12:07 PM    157
08/14/2013  11:17 AM    157
08/13/2013  12:16 PM    157.4
08/11/2013  4:47 PM 158.2")
)

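# Each blank line bumps the running count, so every block gets its own group id
sdat <- split(Lines, cumsum(nchar(Lines) == 0))
lapply(sdat, function(lins) {
  good <- lins[nchar(lins) > 0]          # drop the blank separator line
  assign(make.names(good[1]),            # block title becomes the object name
         read.table(text = good[-1]),    # remaining lines are the data
         envir = .GlobalEnv)
})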

You will see lapply print the tables it read, but as a side effect there will also be two objects in your global environment named Body.fat and Body.weight. You probably want to put sep="\t" in the read.table call if there really are tabs in the original file.
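If you would rather not assign into the global environment, the same idea can be sketched as a function that returns a named list of data frames. This is only a variation on the approach above, not part of the original answer: `read_blocks` is a hypothetical helper name, and it assumes every block has a title line followed by at least one data row and that separator lines are truly empty.

```r
# Hypothetical helper: split lines into blank-line-separated blocks and
# return one data frame per block, named after the block's title line.
read_blocks <- function(lines, sep = "") {
  # cumsum() over the blank-line flags increases by one at every blank
  # line, so all lines of a block share the same group id.
  grp <- cumsum(!nzchar(lines))
  blocks <- split(lines, grp)
  # Drop the blank separator lines, then any blocks left empty.
  blocks <- lapply(blocks, function(b) b[nzchar(b)])
  blocks <- Filter(length, blocks)
  # First line of each block is the title; the rest is the data.
  out <- lapply(blocks, function(b) read.table(text = b[-1], sep = sep))
  names(out) <- make.names(vapply(blocks, `[`, character(1), 1))
  out
}

# Small whitespace-separated sample taken from the data in the question.
Lines <- c("Body fat",
           "08/21/2013 1:46 PM 17.4",
           "08/20/2013 11:20 AM 17.4",
           "",
           "Body weight",
           "08/21/2013 1:46 PM 157",
           "08/20/2013 11:20 AM 156.4")

cumsum(!nzchar(Lines))   # 0 0 0 1 1 1 1
res <- read_blocks(Lines)
names(res)               # "Body.fat" "Body.weight"
res$Body.weight
```

For the real file, something like read_blocks(readLines("bodydata.txt"), sep = "\t") should keep the date, time, and value in three columns, assuming the file is tab-delimited as described in the question.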

IRTFM
  • Great. I would never have thought of using `cumsum` to make a "factor" for `split`. Thank you for the solution. The tabs were lost when I copy'n'pasted the raw file. The original file does indeed have tabs, and I used sep="\t" to confirm your solution. Thanks for the heads-up! – KenM Feb 25 '14 at 03:53
  • I attribute such answers to years of watching Jim Holtman perform file trickery and sleight-of-hand on Rhelp – IRTFM Feb 25 '14 at 04:23
  • Cool. Now I enjoy reading his posts on Rhelp. It's always fun to have some creativity. – KenM Feb 25 '14 at 17:48