
I have a file whose data is laid out like this:

           48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
A_row  17 16 10 12  9 15 10 19  9 15  7  3  5 12  6  4  6  8  1  7  6  5  4
B_row  3  5  1  5  2  0  3  1  2  2  3  1  3  2  1  2  1  1  1  0  0  1  1
           71 72 73 74 75 76 77 78 80 81 83 84 85 86 87 88 89 90 94 97 103 104
A_row  1 6 0 2  9 5 1 19 9 15 7  3  5 12  6  4  6  8  1  7  6  5  4
B_row  2 5 1 5  2 0 3 1  2 2  3  1  3  2  1  2  1  1  1  0  0  1  1

Is there any way to read this format into R? Thanks! :>

Ben
  • Why not `read.table()`? – Rich Scriven Sep 08 '15 at 22:27
  • If your data is truly representative (i.e., unequal number of data on each group of rows), then you'll likely need to roll your own function, perhaps using `readLines`, `grep`, and `strsplit`. If it's structured better than you have, then perhaps `read.delim` or `read.fwf` are viable alternatives. – r2evans Sep 08 '15 at 22:28
  • What is your desired output? Because of the row-oriented and variable-length rows, it isn't immediately obvious what you want from this. Perhaps edit your question to include a small section of the desired output. (For instance: a matrix with this data transposed, three columns, first row 48, 17, 3.) – r2evans Sep 09 '15 at 00:09
  • @r2evans Thanks for your reply. I just need to load the file in the script. – Ben Sep 09 '15 at 04:50
  • The question, @Ben, is "what data structure are you looking for?". Because it is irregular, a standard matrix import (and data.frame) will include `NA`s, perhaps not what you want. List of vectors? List of lists? Are the column headers meaningful? Convert into a data.frame with three columns, *as I suggested in my previous comment*? What tool will be using this data? (You need to learn how to [ask better questions](http://stackoverflow.com/help/mcve), as well as [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example).) – r2evans Sep 09 '15 at 17:59

2 Answers

library(stringi)
library(dplyr)
library(magrittr)
library(tidyr)

text = 
  "48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
A_row  17 16 10 12  9 15 10 19  9 15  7  3  5 12  6  4  6  8  1  7  6  5  4
B_row  3  5  1  5  2  0  3  1  2  2  3  1  3  2  1  2  1  1  1  0  0  1  1
71 72 73 74 75 76 77 78 80 81 83 84 85 86 87 88 89 90 94 97 103 104
A_row  1 6 0 2  9 5 1 19 9 15 7  3  5 12  6  4  6  8  1  7  6  5  4
B_row  2 5 1 5  2 0 3 1  2 2  3  1  3  2  1  2  1  1  1  0  0  1  1"

df  = 
  text %>% 
  # split over newlines (could also be accomplished by readLines)
  stri_split_fixed(pattern = "\n") %>% 
  # need to take first list corresponding to text
  extract2(1) %>%
  # make the text a column in the dataframe
  # (tibble() replaces the deprecated data_frame(); dplyr re-exports it)
  {tibble(values = .)} %>%
  # identify rows based on what type of data they contain
  # assume a repeating pattern every 3 lines
  mutate(variable = c("id", "A_row", "B_row") %>% rep(length.out = n())) %>%
  # for each type of data
  group_by(variable) %>%
  summarize(value = 
              values %>%
              # concatenate all values
              paste(collapse = " ") %>%
              # remove headers (might need to modify regex)
              stri_replace_all_regex("[A-Z]_row  ", "") %>%
              # split as space separated data
              stri_split_regex(pattern = " +")) %>%
  # unnest the lists
  unnest(value) %>%
  # make values numeric
  mutate(value = as.numeric(value)) %>%
  # for each variable, number 1 through n() to guess new row ID's
  group_by(variable) %>%
  mutate(n = 1:n()) %>%
  # reshape data
  spread(variable, value)
bramtayl

As noted in the comments, one approach would be to use `read.delim` (perhaps in chunks, using `skip` and `nrows`) and then `cbind` the pieces back together.
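A rough sketch of the chunked approach, under a few assumptions: the sample below is a shortened copy of the question's layout (only the first few columns of each block), every block is exactly three lines, and each header line has one field fewer than the data lines beneath it. Because the sample is space-separated, it uses `read.table` rather than `read.delim`, which expects tabs by default. `read_block` is a hypothetical helper name.

```r
# Shortened sample in the layout from the question (not the full file).
txt <- c(
  "       48 49 50",
  "A_row  17 16 10",
  "B_row   3  5  1",
  "       71 72",
  "A_row   1  6",
  "B_row   2  5"
)

# Hypothetical helper: read one 3-line block that starts after `skip` lines.
# The header has one field fewer than the data rows, so read.table()
# takes the first column ("A_row", "B_row") as row names.
read_block <- function(lines, skip) {
  read.table(text = lines, skip = skip, nrows = 2,
             header = TRUE, check.names = FALSE)
}

# One block every 3 lines: skip 0, 3, ...
starts <- seq(0, length(txt) - 3, by = 3)
blocks <- lapply(starts, function(s) read_block(txt, skip = s))

# Reassemble the blocks side by side.
wide <- do.call(cbind, blocks)
```

Note that the second block as pasted in the question has 22 header values but 23 data values per row, so `read.table` would likely reject it until the file is cleaned up.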

Depending on the file (as pasted, it looks like it might need additional preprocessing before `read.delim` can handle it), another approach would be to use `readLines` and `strsplit`.
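A minimal sketch of the `readLines` and `strsplit` route, again on a shortened copy of the question's layout. It assumes that labelled lines start with a name ending in `_row` and that every other line is a header of column ids; the key `"id"` for header lines is an arbitrary choice. The result is a list of numeric vectors, so rows of unequal length are kept as they are rather than padded with `NA`.

```r
# Shortened sample; for a real file this would be something like
# txt <- readLines("data.txt")   # hypothetical file name
txt <- c(
  "       48 49 50",
  "A_row  17 16 10",
  "B_row   3  5  1",
  "       71 72",
  "A_row   1  6",
  "B_row   2  5"
)

# Trim each line and split it on runs of whitespace.
fields <- strsplit(trimws(txt), "\\s+")

# The first field tells labelled lines ("A_row", ...) from header lines.
first    <- vapply(fields, `[`, character(1), 1)
labelled <- grepl("_row$", first)
key      <- ifelse(labelled, first, "id")

# Drop the label where present and convert the rest to numbers.
values <- Map(function(f, lab) as.numeric(if (lab) f[-1] else f),
              fields, labelled)

# Concatenate the pieces of each series across blocks, in file order.
result <- lapply(split(values, key), unlist, use.names = FALSE)
```

Here `result$id`, `result$A_row` and `result$B_row` each hold one vector; whether to combine them into a data frame depends on whether their lengths agree in the real file.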

woodvi