I am looking to read a .tps file into R.

An example file is now available at:

example file

The actual files I am trying to read into R have many more individuals/IDs (>1000).

The .tps file format is produced by TPSDIG.

http://life.bio.sunysb.edu/morph/

The file is an ANSI plain text file.

The file contains X and Y coordinates and specimen information as follows.

The main difficulty is that specimens vary in the number of attributes (e.g. some have 4 LM landmarks and some have 6; some have 2 curves, while others have none and therefore no associated points).

I have tried working with a for loop and read.table, but cannot find a way to account for the varying number of attributes.

Example of start of file

    LM=3
    1  1
    2  2
    3  3
    CURVES=2
    POINTS=2
    1 1
    2 2
    POINTS=2
    1 1
    2 2
    IMAGE=COMPLETE/FILE/PATH/IMAGE
    ID=1
    SCALE=1
    LM=3
    1  1
    2  2
    3  3
    CURVES=2
    ...

Example dummy code that works if all specimens have equal number of attributes.

    i <- 1
    landmarks <- NULL
    while (i < 4321) {

      print(i)

      landmarks.temp <- read.table(file="filepath", sep=" ", header=FALSE, skip=i, nrows=12, col.names=c("X", "Y"))
      i <- i + 13
      landmarks.temp$ID <- read.table(file="filepath", sep=" ", header=FALSE, skip=i, nrows=1, as.is=TRUE)[1,1]
      i <- i + 1
      landmarks.temp$scale <- read.table(file="filepath", sep=" ", header=FALSE, skip=i, nrows=1, as.is=TRUE)[1,1]
      i <- i + 2

      landmarks <- rbind(landmarks, landmarks.temp)

      print(unique(landmarks.temp$ID))
    }
Etienne Low-Décarie
  • I think you're going to want to use `scan` and/or `readLines` for finer control ... – Ben Bolker Mar 15 '12 at 23:09
  • Thank you Prof. Bolker, however read.table seems to provide as much flexibility as 'scan' (for which it is a wrapper) or 'readLines'. I am starting to think I will need to read line by line (with either 'read.table', 'readLines' or 'scan') and have conditions for each possible value of that line and the previous one. I am hoping someone may already have gone through this legwork. – Etienne Low-Décarie Mar 16 '12 at 12:42
  • If you provide more complete example data, someone will surely provide a readLines/regex based solution. – jbaums Mar 16 '12 at 13:14
  • I have attached an example file. – Etienne Low-Décarie Mar 16 '12 at 15:12
  • So data between two `ID`s belongs to one individual? – Roman Luštrik Mar 16 '12 at 15:28
  • You say that CURVES can sometimes be 0. In that case, would there be any POINTS attributes at all (would POINTS=0 or would POINTS be missing)? – jbaums Mar 16 '12 at 23:37
  • After mrdwab's answer, I beg Prof. Bolker's pardon, readLines was the key. – Etienne Low-Décarie Mar 17 '12 at 03:22

2 Answers

I'm not exactly clear about what you are looking for in your output. I assumed a standard data frame with X, Y, ID, and Scale as the variables.

Try this function that I threw together and see if it gives you the type of output that you're looking for:

    read.tps <- function(data) {
      a <- readLines(data)

      # Anchor the patterns so that an IMAGE path which happens to
      # contain "LM" or "ID" is not mistaken for a field line.
      LM <- grep("^LM=", a)
      ID.ind <- grep("^ID=", a)
      # Assumes the IMAGE line sits directly above each ID line.
      images <- basename(sub("^IMAGE=", "", a[ID.ind - 1]))

      # Number of landmark rows that follow each LM line.
      nrows <- as.numeric(sub("^LM=", "", a[LM]))
      l <- length(LM)

      landmarks <- vector("list", l)

      for (i in seq_len(l)) {
        landmarks[[i]] <- data.frame(
            read.table(file = data, header = FALSE, skip = LM[i],
                       nrows = nrows[i], col.names = c("X", "Y")),
            IMAGE = images[i],
            # ID is kept as text, since it is taken straight from the line.
            ID = sub("^ID=", "", a[ID.ind[i]]),
            # Assumes the SCALE line sits directly below each ID line.
            Scale = as.numeric(sub("^SCALE=", "", a[ID.ind[i] + 1])))
      }
      do.call(rbind, landmarks)
    }

After you've loaded the function, you can use it by typing:

    read.tps("example.tps")

where "example.tps" is the name of your .tps file in your working directory.

If you want to assign your output to a new object, you can use the standard:

    landmarks <- read.tps("example.tps")
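
The readLines-plus-grep idea can also be tried out without any file on disk. What follows is only a minimal sketch, not the function above: the toy lines are made-up values modelled on the excerpt in the question, and `field`, `parse_block`, and `landmarks_demo` are hypothetical names introduced for illustration. It assumes every specimen starts with an `LM=` line and has at least one landmark; curve points are simply skipped.

```r
# Toy .tps content (made-up values); in practice this would be readLines(path).
tps <- c(
  "LM=3", "1 1", "2 2", "3 3",
  "CURVES=1", "POINTS=2", "1 1", "2 2",
  "IMAGE=a.jpg", "ID=1", "SCALE=1",
  "LM=2", "4 4", "5 5",
  "IMAGE=b.jpg", "ID=2", "SCALE=0.5"
)

# Each specimen block runs from one "LM=" line to just before the next.
starts <- grep("^LM=", tps)
ends   <- c(starts[-1] - 1, length(tps))

# Hypothetical helper: value of the first KEY=value line in a block, or NA.
field <- function(block, key) {
  hit <- grep(paste0("^", key, "="), block, value = TRUE)
  if (length(hit) == 0) NA_character_ else sub("^[^=]*=", "", hit[1])
}

# Hypothetical helper: landmarks of one block plus its ID and scale.
parse_block <- function(block) {
  n <- as.integer(field(block, "LM"))
  # The landmark rows are the n lines right after the LM= line.
  xy <- read.table(text = block[1 + seq_len(n)], col.names = c("X", "Y"))
  data.frame(xy,
             ID = field(block, "ID"),
             Scale = as.numeric(field(block, "SCALE")),
             stringsAsFactors = FALSE)
}

blocks <- Map(function(s, e) tps[s:e], starts, ends)
landmarks_demo <- do.call(rbind, lapply(blocks, parse_block))
print(landmarks_demo)
```

Because each block is cut out first, the number of landmarks is looked up per specimen, so a specimen with 3 landmarks and one with 2 can sit in the same file.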
A5C1D2H2I1M1N2O1R2T1
  • Brilliant! readLines and grep were my missing key. This will unlock many future similar problems. – Etienne Low-Décarie Mar 17 '12 at 03:23
  • I don't know how many people would like this functionality, but perhaps you could publish it on github or somewhere similar. – Roman Luštrik Mar 17 '12 at 06:54
  • @EtienneLow-Décarie, you can take a look at the version of this function [I posted on GitHub](https://gist.github.com/2062329), in which I've added comments throughout so you can see exactly how I've gone about solving your challenge. – A5C1D2H2I1M1N2O1R2T1 Mar 17 '12 at 19:39
  • @mrdwab, you have more than solved my issue! Your script is mature enough to import most .tps files. In my edit, I get the file name of the image on which the .tps is based (as you did in your new script). I am trying to end up with a script that will import all .tps file fields. One of the difficulties in creating a generalized .tps import script is that there is no specification of the .tps file format, and I do not know how it varies or even which fields can be included. If you want your script advertised to .tps users, you could contact rohlf@life.bio.sunysb.edu. – Etienne Low-Décarie Mar 21 '12 at 12:05

Perhaps worth mentioning that there is now an R package geomorph which has a function readland.tps() for this.
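
A minimal usage sketch, assuming geomorph is installed. The temporary file and its contents are made up for illustration, and both toy specimens are given the same number of landmarks; whether the function accepts specimens with differing landmark counts, as in the question, is worth checking in the package documentation.

```r
# Write a tiny, made-up .tps file to a temporary location.
tps_path <- tempfile(fileext = ".tps")
writeLines(c(
  "LM=3", "1 1", "2 2", "3 3", "IMAGE=a.jpg", "ID=1", "SCALE=1",
  "LM=3", "4 4", "5 5", "6 6", "IMAGE=b.jpg", "ID=2", "SCALE=1"
), tps_path)

coords <- NULL
# Only call the package if it is available on this machine.
if (requireNamespace("geomorph", quietly = TRUE)) {
  # specID = "ID" asks for specimen names to be taken from the ID= lines.
  coords <- geomorph::readland.tps(tps_path, specID = "ID")
  print(dim(coords))
}
```

The result is an array of landmark coordinates rather than the long data frame built by the answer above, so downstream code would need to index it differently.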

cengel