How can I read a vector of lines (not a file) with fwf into a data frame?

Right now, I can think of two ways, but I really feel that there has to be a better way. Any idea is appreciated.

  1. Use data.frame() + substring(). It does the job, but I am not able to generalize it easily if the data is "ragged" (which it is, by blocks like the one below). I got it from the answer here: Read fixed width text file

  2. Use write_lines() and read_fwf() from readr. I'd like to avoid writing an external file. It seems that read_fwf() should work directly on literal data, but I cannot make it work: it keeps interpreting the string/vector of lines as a path. Something like:

    write_lines(literaldata, "fwf_sample.txt")
    read_fwf("fwf_sample.txt", fwf_widths(rep(8, 12)))
    

A data sample follows below, with the code that leads to the error.

    literaldata <- "CHEXA     278375       2  419991  419976  418527  418528  434131  434116+         420108  420107
CHEXA     278376       2  420028  420029  419994  419997  434168  434169+         434134  434137
CHEXA     278377       2  419961  418516  418517  419956  434101  420119+         420118  434096
CHEXA     278378       2  419965  418519  418520  419967  434105  420116+         420115  434107
CHEXA     278379       2  419965  419984  420025  419971  434105  434124+         434165  434111
CHEXA     278380       2  418521  419972  419967  418520  420114  434112+         434107  420115"

library(readr)
lines<-read_lines(literaldata)
# The code above is just to get a reproducible example similar to the one I get in the data cleaning process
read_fwf(lines, fwf_widths(rep(8,  12)))


Error: 'CHEXA     278375       2  419991  419976  418527  418528  434131  
434116+         420108  420107CHEXA     278376   ...

Thanks in advance

loistf

2 Answers

0

Not sure what exactly it is you're doing. The function read_fwf() works just fine on your data.

literaldata <- "CHEXA     278375       2  419991  419976  418527  418528  434131  434116+         420108  420107
CHEXA     278376       2  420028  420029  419994  419997  434168  434169+         434134  434137
CHEXA     278377       2  419961  418516  418517  419956  434101  420119+         420118  434096
CHEXA     278378       2  419965  418519  418520  419967  434105  420116+         420115  434107
CHEXA     278379       2  419965  419984  420025  419971  434105  434124+         434165  434111
CHEXA     278380       2  418521  419972  419967  418520  420114  434112+         434107  420115"

library(readr)
read_fwf(literaldata, fwf_widths(rep(8,  12)))

# # A tibble: 6 x 12
#      X1     X2    X3     X4     X5     X6     X7     X8     X9   X10    X11    X12
#   <chr>  <int> <int>  <int>  <int>  <int>  <int>  <int>  <int> <chr>  <int>  <int>
# 1 CHEXA 278375     2 419991 419976 418527 418528 434131 434116     + 420108 420107
# 2 CHEXA 278376     2 420028 420029 419994 419997 434168 434169     + 434134 434137
# 3 CHEXA 278377     2 419961 418516 418517 419956 434101 420119     + 420118 434096
# 4 CHEXA 278378     2 419965 418519 418520 419967 434105 420116     + 420115 434107
# 5 CHEXA 278379     2 419965 419984 420025 419971 434105 434124     + 434165 434111
# 6 CHEXA 278380     2 418521 419972 419967 418520 420114 434112     + 434107 420115

From the documentation of read_fwf() (highlight mine):

Literal data is most useful for examples and tests. It must contain at least one new line to be recognised as data (instead of a path).
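
As a hedged aside that is not part of the documentation quoted above: to my knowledge, readr 2.0 and later also let you mark input as literal data explicitly by wrapping it in I(). The sketch below assumes such a version is installed; the two-line sample and the names lines, txt and df are only for illustration. Collapsing the vector first keeps the input a single string with embedded newlines, which matches the rule quoted above.

```r
library(readr)

# Two records from the question, held as a character vector (one element per line)
lines <- c(
  "CHEXA     278375       2  419991  419976  418527  418528  434131  434116+         420108  420107",
  "CHEXA     278376       2  420028  420029  419994  419997  434168  434169+         434134  434137"
)

# Join into one string with embedded newlines; I() flags it as literal data
# (assumption: readr >= 2.0 is installed)
txt <- I(paste(lines, collapse = "\n"))

# Twelve fields of eight characters each, as in the question
df <- read_fwf(txt, fwf_widths(rep(8, 12)))
```

Because the string already contains a newline, the same call should also be accepted by older readr versions, where the I() wrapper is simply not needed.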

Claus Wilke
  • Thanks to your reply, it seems that I can do 'paste0(vectorOfLines, collapse = "\n")'. It does the job for me, but according to the readr documentation I should not need the 'paste0()': "Literal data is most useful for examples and tests. It must contain at least one new line to be recognised as data (instead of a path) or be a **vector of greater than length 1**."
    – loistf Dec 27 '17 at 22:04
  • You still haven't provided a complete reproducible example, so we don't know what you're doing. Please read this: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Claus Wilke Dec 27 '17 at 22:23
0

Still not clear to me why my previous example does not work, but adding a paste0(...,collapse = "\n") does the job. So, something like the following works for me:

literaldata <- "CHEXA     278375       2  419991  419976  418527  418528  434131  434116+         420108  420107
CHEXA     278376       2  420028  420029  419994  419997  434168  434169+         434134  434137
CHEXA     278377       2  419961  418516  418517  419956  434101  420119+         420118  434096
CHEXA     278378       2  419965  418519  418520  419967  434105  420116+         420115  434107
CHEXA     278379       2  419965  419984  420025  419971  434105  434124+         434165  434111
CHEXA     278380       2  418521  419972  419967  418520  420114  434112+         434107  420115"

library(readr)
lines<-read_lines(literaldata)
# The code above is just to get a reproducible example similar to the one I get in the data cleaning process
# The following gives an error
read_fwf(lines, fwf_widths(rep(8,  12)))
# The following gives the expected result
read_fwf(paste0(lines,collapse = "\n"), fwf_widths(rep(8,  12)))
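
If a readr-free route is acceptable, base R's read.fwf() accepts a connection, so a vector of lines can be wrapped in textConnection() without writing a file yourself. This is a sketch of an alternative technique, not the readr approach above; the two-line sample and the names con and df are only for illustration, and the result is a plain data.frame with columns V1 to V12 rather than a tibble.

```r
# Two records from the sample data, one element per line
lines <- c(
  "CHEXA     278375       2  419991  419976  418527  418528  434131  434116+         420108  420107",
  "CHEXA     278376       2  420028  420029  419994  419997  434168  434169+         434134  434137"
)

# Expose the character vector as a read-only text connection
con <- textConnection(lines)

# strip.white and stringsAsFactors are passed through to read.table():
# they trim the padding in text fields and keep them as character
df <- read.fwf(con, widths = rep(8, 12),
               strip.white = TRUE, stringsAsFactors = FALSE)

# The connection was opened by textConnection(), so close it explicitly
close(con)
```

Note that read.fwf() is documented as being slow on large inputs, so this is mainly useful for modest amounts of data.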

Thanks to everyone for the help and replies

loistf