
I'm sure this is simple, but I haven't come across an answer. I would like to import a data frame into R without first processing the file in a text editor; essentially, I want R to do the filtering as it reads the file in. So all lines containing

FRAME   1 of ***
OR
ATOM-WISE TOTAL CONTACT ENERGY

should be skipped, deleted, or ignored.

All that should be left is:

Chain Resnum    Atom number Energy(kcal/mol)
ATOM      C     500   1519          -2.1286
ATOM      C     500   1520          -1.1334
ATOM      C     500   1521          -0.8180
ATOM      C     500   1522          -0.7727

Is there a simple solution to this? I'm not sure which scan() or read.table() arguments would work.
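A minimal sketch of filtering at read time, assuming whitespace-separated columns and that the unwanted lines are exactly those shown above. readLines() pulls in the raw lines, grepl() flags the ones to discard, and read.table() parses the remaining strings through its `text` argument. The sketch also drops the header line (in case it repeats for every frame) and supplies its own column names; the function name `read_contacts`, the column names, and the file name `contacts.txt` are hypothetical.

```r
# Sketch: discard unwanted lines in R, then parse what is left.
# read_contacts, the column names and "contacts.txt" are hypothetical.
read_contacts <- function(lines) {
  # Flag frame markers, section titles, header lines and blank lines.
  drop <- grepl("FRAME|ATOM-WISE TOTAL CONTACT ENERGY|^\\s*Chain\\s", lines) |
    grepl("^\\s*$", lines)
  # read.table() can parse a character vector directly via `text =`.
  read.table(text = lines[!drop],
             col.names = c("Record", "Chain", "Resnum", "AtomNumber", "Energy"),
             stringsAsFactors = FALSE)
}

# In practice: contacts <- read_contacts(readLines("contacts.txt"))
example <- c(
  "FRAME   1 of ***",
  "ATOM-WISE TOTAL CONTACT ENERGY",
  "",
  "Chain Resnum    Atom number Energy(kcal/mol)",
  "ATOM      C     500   1519          -2.1286",
  "ATOM      C     500   1520          -1.1334",
  "ATOM      C     500   1521          -0.8180",
  "ATOM      C     500   1522          -0.7727"
)
contacts <- read_contacts(example)
str(contacts)
```

Because read.table() does the type conversion, the numeric columns come back as numbers without a separate as.numeric() step.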

EDIT

I was able to use readLines() and gsub() to read in the file and remove the unwanted lines. I dropped the empty strings ("") left behind by the deleted lines, and now I am trying to convert the resulting character vector into a regular (numeric) data frame. When I use data.frame(x) or as.data.frame(x), I get a data frame with 100K rows and only one variable, but there should be at least 5.
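The one-column result is expected behaviour: data.frame() treats a character vector as a single column and does not split the strings on whitespace. A minimal illustration, using two of the sample data lines above as a stand-in for the real 100K-line vector:

```r
# A character vector like the one readLines() returns: one string per line.
x <- c("ATOM      C     500   1519          -2.1286",
       "ATOM      C     500   1520          -1.1334")

# data.frame() keeps each string whole, so there is only one column.
one_col <- data.frame(x, stringsAsFactors = FALSE)
dim(one_col)   # 2 rows, 1 column

# Parsing the strings as text splits each line into its fields instead.
parsed <- read.table(text = x, stringsAsFactors = FALSE)
dim(parsed)    # 2 rows, 5 columns
```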

Rorschach
D.A. Ragland

1 Answer

readLines() returns a character vector with one string per line of the file, so you have to split each string into its fields before converting to a data frame. If the values are separated by spaces, try:

m = matrix(unlist(strsplit(trimws(data), " +")), ncol=5, byrow=TRUE)
    # where 'data' is the name of the vector of strings;
    # trimws() stops leading spaces from producing an empty first field
df = data.frame(m, stringsAsFactors=FALSE)

Then for each column with numeric data, use as.numeric() on the column to convert.
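A slightly fuller sketch of the same approach, using the four sample data lines from the question. It assumes the marker and header lines have already been removed: the marker lines have a different number of fields and would shift the matrix rows, and the header text would become NA under as.numeric(). Treating columns 3 to 5 as numeric is also an assumption based on that sample.

```r
# Character vector of data lines only (markers and header already removed).
data <- c("ATOM      C     500   1519          -2.1286",
          "ATOM      C     500   1520          -1.1334",
          "ATOM      C     500   1521          -0.8180",
          "ATOM      C     500   1522          -0.7727")

# Split on runs of spaces; trimws() guards against leading blanks.
fields <- strsplit(trimws(data), " +")
m <- matrix(unlist(fields), ncol = 5, byrow = TRUE)
df <- data.frame(m, stringsAsFactors = FALSE)

# Convert the numeric columns in one step instead of one at a time.
num_cols <- 3:5
df[num_cols] <- lapply(df[num_cols], as.numeric)
str(df)
```

Using lapply() over a column index vector does the same thing as calling as.numeric() on each column by hand, which helps when there are several numeric columns.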

tegancp