All the variables end up condensed into a single column because the input file contains no tabs, so splitting on tabs never separates anything. Instead, try one of these approaches.
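The diagnosis can be checked offline with `count.fields`, as a sketch. The code below rebuilds two rows of the data displayed in part 1 (rows 1 and 11) in the layout assumed throughout this answer: a 13-character first field followed by nine integer fields right-justified in 8 characters. The helper `make_line` and the variable names are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: one line in the assumed layout, a 13-character
# first field followed by integers right-justified in 8 characters.
make_line <- function(id, values) {
  paste0(sprintf("%-13s", id), paste(sprintf("%8d", values), collapse = ""))
}

# Rows 1 and 11 of the data shown in part 1
lines <- c(
  make_line("201701001 011", c(236L, 240L, 236L, 226L, 224L, 238L, 239L, 240L, 232L)),
  make_line("2017010011011", c(308L, 311L, 321L, 309L, 281L, 277L, 270L, 250L, 301L))
)

# Splitting on tabs: there are none, so each line is a single field
con <- textConnection(lines)
n_tab <- count.fields(con, sep = "\t")
close(con)

# Splitting on any whitespace: the single space inside the first field
# of row 1 gives that row one extra field
con <- textConnection(lines)
n_ws <- count.fields(con)
close(con)

print(n_tab)
print(n_ws)
```

Neither separator gives a consistent split: tabs leave one field per line, and plain whitespace yields 11 fields for row 1 but 10 for row 11.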
1) `read.fwf`. This file has fixed-width fields, so use `read.fwf`, specifying the field widths as its second argument. No packages are used.
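```r
# sample file from the question
u <- "https://raw.githubusercontent.com/Patricklv/Importing-.txt-file/master/Sample.txt"
# a 13-character first field followed by nine 8-character fields
widths <- c(13, rep(8, 9))
read.fwf(u, widths)
```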
giving:
```
              V1  V2  V3  V4  V5  V6  V7  V8  V9 V10
1  201701001 011 236 240 236 226 224 238 239 240 232
2  201701001 111 299 285 237 252 227 249 237 233 238
3  201701001 211 287 292 296 230 237 234 235 254 251
4  201701001 311 286 287 311 283 237 240 226 240 246
5  201701001 411 270 273 282 318 277 243 248 236 243
6  201701001 511 279 276 284 280 305 285 262 249 241
7  201701001 611 288 284 286 281 272 299 284 257 238
8  201701001 711 293 290 292 284 269 278 298 282 257
9  201701001 811 314 305 290 298 267 265 282 292 277
10 201701001 911 314 310 310 295 288 270 261 292 292
11 2017010011011 308 311 321 309 281 277 270 250 301
12 2017010011111 325 319 312 332 303 287 294 275 254
```
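If the URL is unavailable, the same call can be tried on a local sample. This sketch assumes the layout implied by `widths` above, writes two rows copied from the output into a temporary file and reads them back; `make_line` is a hypothetical helper that only builds correctly padded lines.

```r
# Hypothetical helper: 13-character first field, then integers
# right-justified in 8-character fields (the layout assumed above).
make_line <- function(id, values) {
  paste0(sprintf("%-13s", id), paste(sprintf("%8d", values), collapse = ""))
}
lines <- c(
  make_line("201701001 011", c(236L, 240L, 236L, 226L, 224L, 238L, 239L, 240L, 232L)),
  make_line("2017010011011", c(308L, 311L, 321L, 309L, 281L, 277L, 270L, 250L, 301L))
)

# Write the sample to a temporary file and read it back with fixed widths
tf <- tempfile(fileext = ".txt")
writeLines(lines, tf)
DF <- read.fwf(tf, widths = c(13, rep(8, 9)))
unlink(tf)
print(DF)
```

The first column stays character because row 1's first field contains a space; the other nine columns come back as integers.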
It is easy enough to count the field widths by hand, as done above, but they can also be computed automatically from the first line of data, `L1`, by locating the field ends, `ends`. A field ends at a digit followed by two or more spaces (`\\d {2,}`) or (`|`) at a digit followed by the end of the line (`\\d$`). Because `gregexpr` reports the starting position of each match, each element of `ends` is the position of the last character of a field. It is important to require at least two spaces, since a single space can appear within the first field. Finally, the field widths, `widths`, are the first component of `ends` followed by the differences between successive positions in `ends`.
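```r
# read only the first line of the file
L1 <- readLines(u, 1)
# position of each field's last digit: a digit followed by 2+ spaces, or a digit at end of line
ends <- gregexpr("\\d {2,}|\\d$", L1)[[1]]
# first width is the first end position; the rest are the gaps between successive ends
widths <- c(ends[1], diff(ends))
```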
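A self-contained sketch of the automatic width detection, run on one hypothetical line rebuilt from row 1 of the output (assuming a 13-character first field followed by 8-character fields). It should recover the widths counted by hand, and it also shows what goes wrong if the pattern requires only one space.

```r
# Hypothetical helper: 13-character first field, then integers
# right-justified in 8-character fields (the layout assumed above).
make_line <- function(id, values) {
  paste0(sprintf("%-13s", id), paste(sprintf("%8d", values), collapse = ""))
}
L1 <- make_line("201701001 011",
                c(236L, 240L, 236L, 226L, 224L, 238L, 239L, 240L, 232L))

# Start of each match = position of a field's last digit.
# Requiring 2+ spaces skips the single space inside "201701001 011".
ends <- gregexpr("\\d {2,}|\\d$", L1)[[1]]
widths <- c(ends[1], diff(ends))
print(widths)  # expected: 13 followed by nine 8s

# With a single space in the pattern, the "1 " inside the first field
# also matches, so the first field end is reported at position 9.
bad_ends <- gregexpr("\\d |\\d$", L1)[[1]]
print(bad_ends[1])
```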
2) This is an alternative. Since a single space can appear in the first field and all real separators consist of at least two spaces, we can read in the file, replace every run of two or more spaces with a comma, and then read the result using a comma separator. `u` is from above. This is a bit longer but is still only one line, and it eliminates the need to count field widths. No packages are used.
```r
# replace each run of 2+ spaces with a comma, then read as comma-separated
read.table(text = gsub(" {2,}", ",", readLines(u)), sep = ",")
```
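To see the intermediate text, this sketch applies the same `gsub` to two sample rows copied from the output in part 1, rebuilt with the hypothetical helper `make_line` under the fixed-width layout assumed there, and passes them through `text =` instead of reading the URL.

```r
# Hypothetical helper: 13-character first field, then integers
# right-justified in 8-character fields.
make_line <- function(id, values) {
  paste0(sprintf("%-13s", id), paste(sprintf("%8d", values), collapse = ""))
}
lines <- c(
  make_line("201701001 011", c(236L, 240L, 236L, 226L, 224L, 238L, 239L, 240L, 232L)),
  make_line("2017010011011", c(308L, 311L, 321L, 309L, 281L, 277L, 270L, 250L, 301L))
)

# Runs of two or more spaces become commas; the single space inside
# the first field of row 1 is left untouched
csv <- gsub(" {2,}", ",", lines)
print(csv[1])

DF <- read.table(text = csv, sep = ",")
print(DF)
```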
3) Another alternative relies on the fact, already stated in the question, that the first field is 13 characters wide and the remaining fields are well separated by spaces. Pick off the first field with `substring` and `cbind` it to the rest, re-reading the remainder with `read.table`. Again, no packages are used.
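```r
L <- readLines(u)
# characters 1-13 are the first field; from character 14 on the fields are whitespace-separated
cbind(V0 = substring(L, 1, 13), read.table(text = substring(L, 14)))
```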
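The same split can be tried offline. This sketch assumes the 13-character first field stated in the question and uses two rows copied from the output in part 1, rebuilt with the hypothetical helper `make_line`; `cbind` with a data frame argument dispatches to `cbind.data.frame`, so the result is a data frame with columns `V0` to `V9`.

```r
# Hypothetical helper: 13-character first field, then integers
# right-justified in 8-character fields.
make_line <- function(id, values) {
  paste0(sprintf("%-13s", id), paste(sprintf("%8d", values), collapse = ""))
}
L <- c(
  make_line("201701001 011", c(236L, 240L, 236L, 226L, 224L, 238L, 239L, 240L, 232L)),
  make_line("2017010011011", c(308L, 311L, 321L, 309L, 281L, 277L, 270L, 250L, 301L))
)

# First field by position; the remainder is split on whitespace,
# which is safe because no remaining field contains a space
DF <- cbind(V0 = substring(L, 1, 13), read.table(text = substring(L, 14)))
print(DF)
```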