I'm just starting out with R and trying to get a grasp of some of the built-in functions. I'm trying to reorganize a basic FASTA text file that looks like this:
>ID1
AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC
>ID2
TCCAATTAAGTCCCTATCCAGGCGCTCCG
>ID3
GAACCGGAGAACGCTTCAGACCAGCCCGGAC
Into a table that'd look something like this:
ID Sequence
ID1 AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC
ID2 TCCAATTAAGTCCCTATCCAGGCGCTCCG
ID3 GAACCGGAGAACGCTTCAGACCAGCCCGGAC
Or at least something organized in a similar manner. Unfortunately, whenever I try to use read.table, I'm forced to set fill = TRUE to avoid the following error:
> read.table("ReadingText.txt", header=F, fill=F, sep=">")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 2 did not have 2 elements
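In case it helps anyone reproduce this, here is a minimal sketch that writes the sample above to a temporary file (a stand-in for my ReadingText.txt) and captures the error message instead of stopping. The names fasta_lines, path, and res are only for illustration.

```r
# Sample records copied from the top of this post
fasta_lines <- c(
  ">ID1", "AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC",
  ">ID2", "TCCAATTAAGTCCCTATCCAGGCGCTCCG",
  ">ID3", "GAACCGGAGAACGCTTCAGACCAGCCCGGAC"
)

# Write them to a temporary file (assumption: stands in for ReadingText.txt)
path <- tempfile(fileext = ".txt")
writeLines(fasta_lines, path)

# Same call as above, but keep the error message rather than halting
res <- tryCatch(
  read.table(path, header = FALSE, fill = FALSE, sep = ">"),
  error = function(e) conditionMessage(e)
)
print(res)
```

As far as I can tell, the header lines split into two fields (an empty one before the ">" and the ID after it), while the sequence lines contain only one field, which is why the column counts disagree.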
Setting fill = TRUE doesn't solve the problem, as it just introduces unwanted blank fields. I feel like my problem is that R wants to treat each new line from the input as a new row in the output, whereas I'm expecting it to start a new row only at each ">" and move to the next column of the same row at each new line of the input.
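To make the mapping I have in mind concrete, here is a rough base-R sketch of the "new row at each '>'" logic. It works on the lines as a character vector (which I assume readLines("ReadingText.txt") would give me) and assumes every header line is followed by at least one sequence line. The variable names are my own, and I don't know whether this is a sensible or idiomatic approach.

```r
# Lines as they would come back from readLines() on the file (assumption)
fasta_lines <- c(
  ">ID1", "AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC",
  ">ID2", "TCCAATTAAGTCCCTATCCAGGCGCTCCG",
  ">ID3", "GAACCGGAGAACGCTTCAGACCAGCCCGGAC"
)

# TRUE for header lines, i.e. the ones that begin with ">"
is_header <- startsWith(fasta_lines, ">")

# Running record number: increases by one at every header line
record <- cumsum(is_header)

# IDs are the header lines with the leading ">" removed
ids <- sub("^>", "", fasta_lines[is_header])

# Join the sequence line(s) that belong to each record into one string
seqs <- vapply(
  split(fasta_lines[!is_header], record[!is_header]),
  paste, character(1), collapse = ""
)

# Assumes each header has at least one sequence line, so lengths match
fasta_table <- data.frame(ID = ids, Sequence = unname(seqs),
                          stringsAsFactors = FALSE)
print(fasta_table)
```

Grouping by a running count of header lines means a sequence wrapped over several lines would still end up in a single row, which is the behaviour I was hoping read.table could give me directly.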
So, how would you get this to work? Is read.table just the wrong function to be trying to do this with or is there something else? Also, I'd really like to accomplish this without using any packages! I want to get a good grasp of the built-in functions in R.
Thanks for taking the time to read this and apologies if I've done anything wrong posting this here. This is the first time I've asked anything.