I have a number of text files containing pseudo-coordinates in the format [x1 y1][x2 y2]... I am trying to import these files into R so I can analyse them. However, when I import them using read.table they become a list with two variables (x and y), with each value being "[x" or "y]" and each variable stored as a factor. Is there a way to import or manipulate the data so that it becomes a data frame containing only the numeric x and y values?

I have attempted removing the "[" and "]" characters using substr() but get
"Error in nchar(test[1, 2]) : 'nchar()' requires a character vector"
as an error message.

Craig
    Can you please include data and/or code that will provide us with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) ? – Ben Bolker Sep 19 '16 at 23:05

1 Answer

Let's assume this is the input file, that it is in your working directory, and that it is named "fil.txt":

[5 6][7 8][9 10]
[5 6][7 8][9 10]
[5 6][7 8][9 10]

Then you can use readLines, replace each "][" pair with a space, strip the leading "[" and trailing "]" from each line, and then use scan to read the paired values (the example reads from a string via textConnection rather than from the file):

x <- "[5 6][7 8][9 10]
[5 6][7 8][9 10]
[5 6][7 8][9 10]"

scan(text= gsub("(^\\[)|(\\]$)", "", gsub("\\]\\[", " ", readLines(textConnection(x))) ), what = list(numeric(), numeric() ) )
Read 9 records
[[1]]
[1] 5 7 9 5 7 9 5 7 9

[[2]]
[1]  6  8 10  6  8 10  6  8 10

# I later realized the pattern could just be "\\[|\\]" in a single gsub(), as long as each bracket is replaced with a space rather than an empty string

> as.data.frame( .Last.value, col.names=c("x","y") )
  x  y
1 5  6
2 7  8
3 9 10
4 5  6
5 7  8
6 9 10
7 5  6
8 7  8
9 9 10
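
As a rough sketch, the same idea can be wrapped in a small function that reads straight from a file and uses the single-pattern gsub(). The function name `read_coords` and the temporary file are made up for illustration. Each bracket is replaced with a space, not an empty string, so that adjacent pairs such as "6][7" do not run together into "67".

```r
# Hypothetical helper (not part of the original answer): parse one file of
# "[x y][x y]..." lines into a data frame with numeric columns x and y.
read_coords <- function(path) {
  lines <- readLines(path)
  # Replace every "[" and "]" with a space; scan() splits on whitespace.
  cleaned <- gsub("\\[|\\]", " ", lines)
  # A named list in `what` makes scan() return named components x and y.
  vals <- scan(text = cleaned,
               what = list(x = numeric(), y = numeric()),
               quiet = TRUE)
  as.data.frame(vals)
}

# Demo on a temporary file with the same contents as the sample input.
tmp <- tempfile(fileext = ".txt")
writeLines(rep("[5 6][7 8][9 10]", 3), tmp)
coords <- read_coords(tmp)
str(coords)
```

With several files, something like `do.call(rbind, lapply(list.files(pattern = "\\.txt$"), read_coords))` should stack the results into one data frame, assuming every file follows the same format.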
IRTFM