0

I have some data in Notepad that is a mess. There is basically no space between any of the different columns which hold different data. I know the spaces for the data. For example, Columns 1-2 are X, Columns 7-10 are Y....

How can I organize this? Can it be done in R? What is the best way to do this?

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
  • agreed - an example would be good. Sounds like you're saying the data has fields, and you know the width of each field, but there are no separators (like space or comma etc) between the fields. (?) – azhrei Jul 20 '12 at 00:18
  • 2
    after you follow ttmaccer's request it sounds like a `read.fwf` – Tyler Rinker Jul 20 '12 at 00:18
  • agreed - an example would be good. Sounds like you're saying the data has fields, and you know the width of each field, but there are no separators (like space or comma etc) between the fields. this is pretty much it! – Rybrtbrdbtdr Trbdyrbdr Jul 20 '12 at 15:20

2 Answers2

1

?read.fwf may be a good bet for this circumstance.

Set the path to the file:

temp <- "\pathto\file.txt"

Then set the widths of the variables within the file, as demonstrated below.

#1-2 = x, 3-10=y
widths <- c(2,8)

Then set the names of the columns.

cols <- c("X","Y")

Finally, import the data into a new variable in your session:

dataset <- read.fwf(temp,widths,header=FALSE,col.names=cols)
thelatemail
  • 91,185
  • 12
  • 128
  • 188
  • 1
    Hi,That is what I had been trying to do, and it works! Thank you. But I have another question. If I have a very large data set, is it possible to create an Excel file with the widths and column names rather than entering it manually into R? – Rybrtbrdbtdr Trbdyrbdr Jul 27 '12 at 16:51
0

Something I've done in the past to handle that kind of mess is actually import it into excel as delimited width text, then save as a CSV.

Just a suggestion for you. If it's a one off project then that should be fine. no coding at all. But if it's a repeat offender... then you might look at regular expressions.

i.e. ^(.{6})(.{7})(.{2})(.{5})$ for 4 fields of 6,7,2 and 5 characters width in order.

whiskeyfur
  • 736
  • 1
  • 5
  • 14
  • 4
    Excel is a great way to make more of a mess – mdsumner Jul 20 '12 at 00:57
  • To each his own. It really comes down to, do you want to spend the time coding it if it's a one time deal? 15-45 minutes coding something to do the work in 5 minutes with excel? My boss doesn't pay me by tasks completed, just by the hour and I think he would be irritated if I wasted 25 minutes on making that one time program. – whiskeyfur Jul 21 '12 at 00:19
  • I'm stuck in it too and I fecking hate it. It's still a mess. – mdsumner Jul 21 '12 at 00:50