I am trying to read a fixed-width text file using the read.fwf function. I am having difficulty, though, because a couple of my variables overlap. For example, I may have the following line in my text file:

1A...

Variable 1 has a width of 1, so it refers to "1" in the line. Variable 2, however, has a width of 2 and refers to "1A". According to the read.fwf documentation you can specify widths, but that would make my second variable "A." instead of "1A". Is there a way to use read.fwf and specify start and end positions for the variables rather than just widths?

Note: this is similar to the question here but that discussion seemed to turn more towards a Stata package. I am not working with Stata in my example, I simply have start and end positions specified in a metadata file.
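For illustration, here is a minimal sketch of the kind of reading described above, done with `readLines` and `substring` instead of read.fwf. The metadata table `meta`, its column names, the variable names, and the second data line are all hypothetical stand-ins; only the line `1A...` and the two start/end ranges come from the question.

```r
# Hypothetical metadata: one row per variable, with 1-based start/end columns.
meta <- data.frame(
  name  = c("var1", "var2"),
  start = c(1, 1),
  end   = c(1, 2),
  stringsAsFactors = FALSE
)

# Stand-in for the real file; the second line is invented for illustration.
path <- tempfile(fileext = ".txt")
writeLines(c("1A...", "2B..."), path)

lines <- readLines(path)

# substring() is vectorised over the lines, so each call cuts one variable
# out of every line. Overlapping ranges are not a problem because every
# variable is extracted independently of the others.
cols <- Map(function(s, e) substring(lines, s, e), meta$start, meta$end)
names(cols) <- meta$name

dat <- as.data.frame(cols, stringsAsFactors = FALSE)
print(dat)
```

All columns come back as character vectors here; converting individual columns (for example with `as.numeric`) would be a separate step.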

  • But the approach would be similar. Use `substr` as indicated in the "sqldf" approach. – A5C1D2H2I1M1N2O1R2T1 Jan 14 '15 at 19:24
  • So there is no way to do this simply with read.fwf? Sorry just want to be clear, I am fairly new to R and just want to make sure – decal Jan 14 '15 at 19:25
  • 3
    nope. But it's not too difficult to do if you have start and end positions. Usually that's just a matter of using `cumsum` and `diff` and related tools to get the right positions. A good read is [the blog post](https://sites.google.com/site/timriffepersonal/DemogBlog/newformetrickforworkingwithbigishdatainr) that I had linked to in my answer at the question your found. – A5C1D2H2I1M1N2O1R2T1 Jan 14 '15 at 19:27
  • Ahh good call, sorry I missed that – decal Jan 14 '15 at 19:29
  • Also see [my answer here](http://stackoverflow.com/a/18726054/1270695) where I used the example from `read.fwf` to show how I constructed the "sqldf" command. – A5C1D2H2I1M1N2O1R2T1 Jan 14 '15 at 19:32
  • Can you explain how distinct variables can overlap? This sounds like a recipe for disaster. Frankly, I would suggest that "overlap" really means "data are encoded inside strings" and that you should read in the maximal string lengths and then extract substring data with regex tools. --- Guess I'm really saying the same thing that other commenters have said. – Carl Witthoft Jan 14 '15 at 19:51
  • 3
    I nominate @AnandaMahto for the title of resident `R_Data-Munging_Guru`, a title owned on Rhelp by the indefatigable Jim Holtman. No disrespect is intended for others who could legitimately compete for that title including G.Grothendeick and AnthonyDAmico. – IRTFM Jan 14 '15 at 20:40
  • @BondedDust Technically, I believe Jim is the Data Mung**er** Guru :-) – Carl Witthoft Jan 15 '15 at 01:06
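
The `cumsum` and `diff` bookkeeping mentioned in the comments might look roughly like the sketch below. The widths are made-up values; the second half reuses the two ranges from the question to show why overlapping fields cannot be written as a plain widths vector: the gap between the fields comes out negative.

```r
# Widths -> positions, assuming the fields sit back to back with no overlap.
widths <- c(1, 2, 2)            # hypothetical widths
end    <- cumsum(widths)        # last column of each field
start  <- end - widths + 1      # first column of each field

# Positions -> widths again: diff() undoes the cumsum().
recovered <- diff(c(0, end))

# Ranges from the question: variable 1 is column 1, variable 2 is columns 1-2.
q_start <- c(1, 1)
q_end   <- c(1, 2)

# Number of columns between the end of one field and the start of the next.
# A negative value means the next field starts before the previous one ended.
gaps <- q_start[-1] - q_end[-length(q_end)] - 1
overlapping <- any(gaps < 0)

print(data.frame(start = start, end = end))
print(gaps)
print(overlapping)
```

When no gap is negative, positive gaps could be passed to read.fwf as negative widths, which it treats as columns to skip; a negative gap has no such encoding, which is consistent with the `substr`-style approach suggested above.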
