Why are all variables condensed into one column when importing .txt file into R

Question

I have a sample .txt file here. A snapshot of my data is below:

I want to import this .txt file into R. The first column contains 13 characters. For the first row, the first column should be "201701001 011", and 236, 240, and 236 should be the 2nd, 3rd, and 4th columns, and so on.

I tried the code below:

data <- read.table("<path>\\Sample.txt", sep = "\t")

But all variables are condensed into a single column. How should I separate them into different columns?

Lots of suggestions here https://stackoverflow.com/questions/14383710/read-fixed-width-text-file — Ronak Shah, Jun 03 '19 at 02:30

G. Grothendieck · Accepted Answer · 2019-06-03T12:50:41.173

The reason that all variables are condensed into one is that there are no tabs in the input file. Instead try one of these.
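Before trying them, the diagnosis itself can be checked with a small sketch. The two lines below are hypothetical stand-ins laid out like the sample (space-padded fields, no tabs); they are not read from the actual file.

```r
# Two hypothetical lines laid out like the sample: fields are padded with
# spaces, and there is no tab character anywhere.
lines <- c("201701001 011     236     240     236",
           "201701001 111     299     285     237")

# Does any line contain a tab?
has_tab <- any(grepl("\t", lines, fixed = TRUE))
has_tab

# With sep = "\t" and no tabs present, each whole line becomes one field.
one_col <- read.table(text = lines, sep = "\t")
ncol(one_col)
```

If has_tab is FALSE for the real file as well, sep = "\t" cannot split the lines, which matches the single-column result in the question.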

1) read.fwf. This file has fixed-width fields, so use read.fwf, specifying the field widths as the second argument. No packages are used.

u <- "https://raw.githubusercontent.com/Patricklv/Importing-.txt-file/master/Sample.txt"
widths <- c(13, rep(8, 9))
read.fwf(u, widths)

giving:

              V1  V2  V3  V4  V5  V6  V7  V8  V9 V10
1  201701001 011 236 240 236 226 224 238 239 240 232
2  201701001 111 299 285 237 252 227 249 237 233 238
3  201701001 211 287 292 296 230 237 234 235 254 251
4  201701001 311 286 287 311 283 237 240 226 240 246
5  201701001 411 270 273 282 318 277 243 248 236 243
6  201701001 511 279 276 284 280 305 285 262 249 241
7  201701001 611 288 284 286 281 272 299 284 257 238
8  201701001 711 293 290 292 284 269 278 298 282 257
9  201701001 811 314 305 290 298 267 265 282 292 277
10 201701001 911 314 310 310 295 288 270 261 292 292
11 2017010011011 308 311 321 309 281 277 270 250 301
12 2017010011111 325 319 312 332 303 287 294 275 254

Counting the field widths by hand, as done above, is easy enough, but it can also be done automatically from the first line of data, L1. Locate the field ends, stored in ends, which occur at a digit followed by two spaces (\\d followed by two spaces in the pattern) or (|) at a digit followed by end of line (\\d$). Requiring at least two spaces matters because a single space can appear within the first field. The field widths, widths, are then the first element of ends followed by the differences of successive positions in ends.

L1 <- readLines(u, 1)
ends <- gregexpr("\\d  |\\d$", L1)[[1]]
widths <- c(ends[1], diff(ends))
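To see what these three lines compute, here is a small worked sketch on one hypothetical line. The line is built with sprintf so that the layout (a 13-character first field, then 8-character right-aligned numbers) is an explicit assumption rather than something read from the real file; the names one_row and con are illustrative.

```r
# Build one hypothetical line: 13-character first field, then three
# numbers each right-aligned in 8 characters (assumed layout).
L1 <- paste0("201701001 011",
             paste(sprintf("%8d", c(236L, 240L, 236L)), collapse = ""))

# Positions of the last digit of each field: a digit followed by two
# spaces, or a digit at the end of the line.
ends <- gregexpr("\\d  |\\d$", L1)[[1]]

# First width is the first end position; the rest are gaps between ends.
widths <- c(ends[1], diff(ends))
widths  # 13 8 8 8

# Use the detected widths to parse the same line.
con <- textConnection(L1)
one_row <- read.fwf(con, widths)
close(con)
one_row
```

The single space inside "201701001 011" is not treated as a field end because the pattern requires two spaces after the digit.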

2) This is an alternative. Since a single space can appear in the first field and all real separators consist of at least 2 spaces, we can read in the file, replace every run of 2 or more spaces with a comma, and then read the result using a comma separator. u is from above. This is a bit longer but is still only one line and eliminates the need to count field widths. No packages are used.

read.table(text = gsub("  +", ",", readLines(u)), sep = ",")
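The intermediate text is worth a look when checking this approach. A minimal sketch on two hypothetical lines (stand-ins for the file contents, not the real data; lines, csv and df are illustrative names):

```r
# Hypothetical lines: single space inside the first field, runs of
# several spaces between fields.
lines <- c("201701001 011     236     240     236",
           "2017010011011     308     311     321")

# Replace every run of two or more spaces with a comma.
csv <- gsub("  +", ",", lines)
csv

# Read the comma-separated text; the first column stays as text because
# "201701001 011" is not a number.
df <- read.table(text = csv, sep = ",")
str(df)
```

The single space in the first field survives because only runs of two or more spaces are replaced.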

3) Another alternative relies on what we already know from the question: the first field is 13 characters and the remaining fields are well separated by spaces. Pick off the first field with substring, read the remainder of each line with read.table, and cbind the two together. Again, no packages are used.

L <- readLines(u)
cbind(V0 = substring(L, 1, 13), read.table(text = substring(L, 14)))
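The same split can be traced step by step on two hypothetical lines (stand-ins for the file contents; first, rest and res are illustrative names):

```r
# Hypothetical lines with a 13-character first field.
L <- c("201701001 011     236     240     236",
       "2017010011011     308     311     321")

# Characters 1-13 are the first field, kept as text.
first <- substring(L, 1, 13)

# Everything from character 14 on is whitespace-separated numbers.
rest <- read.table(text = substring(L, 14))

# Combine into one data frame with columns V0, V1, V2, V3.
res <- cbind(V0 = first, rest)
res
```

This version depends on the 13-character width stated in the question, so it would need adjusting if the first field had a different length.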

score 1 · Answer 2 · answered Jun 03 '19 at 01:53

Use read_table from package readr:

df <- readr::read_table("https://raw.githubusercontent.com/Patricklv/Importing-.txt-file/master/Sample.txt", col_names = FALSE)

José

Thank you for your answer; it actually worked much faster than read.fwf. However, when I use `sum(df$X9)` to calculate the column sum, the result is 175188935. When I do the same for data imported using read.fwf, I get 323405935, which is the real column sum. Why is there a difference? – Patrick Jun 03 '19 at 13:19
`read_table` worked well with the sample you offered, but it might not be as safe as `read_fwf`, because it infers the column splits from whitespace. – José Jun 04 '19 at 01:17
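
For readers who want to try the fixed-width route in readr mentioned in the last comment, a minimal sketch follows. It assumes readr is installed, uses a temporary file with two hypothetical lines rather than the real data, and assumes the 13-then-8 character layout used in the accepted answer. It is not claimed to explain the differing sums reported above.

```r
library(readr)  # assumption: the readr package is installed

# Write two hypothetical fixed-width lines to a temporary file.
first <- c("201701001 011", "2017010011011")
nums <- list(c(236L, 240L, 236L), c(308L, 311L, 321L))
lines <- paste0(first,
                sapply(nums, function(v) paste(sprintf("%8d", v), collapse = "")))
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Read with explicit widths: one character column, three integer columns.
df <- read_fwf(tmp,
               col_positions = fwf_widths(c(13, 8, 8, 8)),
               col_types = "ciii")
unlink(tmp)
df
```

With explicit widths, the columns are cut at fixed character positions rather than inferred from whitespace.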
