I have a number of large data files from which I want to extract data and write the extracted data to corresponding CSV files.

I use the following code (in a function) to do this work ...

  A <- read.table(file = InputFile,
                  skip = 36, sep = "\t", header = TRUE,
                  quote = "\"", stringsAsFactors = FALSE)
  write.csv(A, file = OutputFile, row.names = FALSE)

This works fine, except that the header line in the data (line 37) has one extra trailing tab. This means that I have to open each file in Notepad (or similar) and remove the tab before I can apply the function.

Does anyone have any code that will remove this extra tab?

To add some clarity here is an example of what the file looks like ...

lines of data to be skipped
apples\toranges\tgrapes\t
1\t3\t5
2\t8\t3

... and here is what I want it to look like

lines of data to be skipped
apples\toranges\tgrapes
1\t3\t5
2\t8\t3

Here \t represents a tab character in the file. Note the extra trailing tab on the line that becomes the header once the preceding lines are skipped.

DarrenRhodes
  • I am not quite sure I understand. Do you want to share a tiny example? One particular line down in the file holds one tab too many? The first 36 lines are to be discarded? If you just get header wrong you should get variable names shifted and it should not be hard to move those back... – Ott Toomet Nov 02 '16 at 18:01
  • Example provided. I'm wondering if this is a readLines problem. – DarrenRhodes Nov 03 '16 at 13:41

1 Answer

I'm not sure that this is the best answer, so any improvements are welcome. I got around the problem by using readLines() and writeLines() as follows.

First, I have a sample text file as follows:

blah
blah
blah
apples  oranges grapes  
1   2   3
3   2   1

It may not be apparent but there is an extra tab after grapes in what is to be the header line.
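Since the extra tab is invisible when the file is printed, one way to confirm where it is would be to test each line for a trailing tab. This is only a sketch: it recreates the sample file in a temporary location, and the names `sample_file`, `lines`, and `ends_with_tab` are made up for illustration.

```r
# Recreate the sample file in a temporary location (hypothetical path).
sample_file <- tempfile(fileext = ".txt")
writeLines(c("blah", "blah", "blah",
             "apples\toranges\tgrapes\t",  # header line with the extra tab
             "1\t2\t3",
             "3\t2\t1"), sample_file)

lines <- readLines(sample_file)

# TRUE for every line whose last character is a tab.
ends_with_tab <- grepl("\t$", lines)

# Print the line numbers that end in a tab.
print(which(ends_with_tab))
```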

I used the following code to read in the text file,

A  <- readLines("sample01.txt", n = -1,skipNul=TRUE)

I found a useful function by f3lix in the question "How to trim leading and trailing whitespace in R?" and used it here:

trim.trailing <- function (x) sub("\\s+$", "", x)

As follows,

A[4]  <- trim.trailing(A[4])
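As a quick check of what the helper does, here is a minimal sketch applied to a made-up header string (the names `header` and `trimmed` are only for illustration). It suggests that the tabs between the column names are kept and only the trailing one is removed.

```r
# Same helper as above: drop any run of whitespace at the end of a string.
trim.trailing <- function(x) sub("\\s+$", "", x)

# Hypothetical header line with one extra trailing tab.
header <- "apples\toranges\tgrapes\t"
trimmed <- trim.trailing(header)

# The result should match the header without its final tab.
print(identical(trimmed, "apples\toranges\tgrapes"))

# Exactly one character (the trailing tab) should have been removed.
print(nchar(header) - nchar(trimmed))
```

Because "\\s+$" removes all trailing whitespace, applying it to the data lines as well could drop a genuinely empty last field, which is probably a good reason to restrict it to the header line.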

Then I wrote out a new file without the trailing tab on what is to become the header line:

writeLines(A, con = "sample02.txt", sep = "\n", useBytes = FALSE)

which gave the following text file,

blah
blah
blah
apples  oranges grapes
1   2   3
3   2   1

which doesn't have the trailing tab. This means I can use it with my original function (with some minor changes, such as skipping 3 lines rather than 36 and changing the file name).
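The intermediate file may not be needed at all. The sketch below is one possible way to combine the steps, assuming the header is always the first line after the skipped block: it fixes that line in memory and hands the lines to read.table() through its `text` argument. The function name `convert_file` and its parameters are hypothetical, and the demo uses temporary files rather than the real data.

```r
# Hypothetical helper (not part of the original function): fix the header line
# in memory and convert one file to CSV without an intermediate text file.
convert_file <- function(input_file, output_file, skip = 36) {
  lines <- readLines(input_file, skipNul = TRUE)

  # The header is assumed to be the first line after the skipped block.
  header_idx <- skip + 1
  # Remove a single trailing tab from the header line only.
  lines[header_idx] <- sub("\t$", "", lines[header_idx])

  # Parse the corrected lines directly; `text` accepts a character vector.
  A <- read.table(text = lines, skip = skip, sep = "\t", header = TRUE,
                  quote = "\"", stringsAsFactors = FALSE)
  write.csv(A, file = output_file, row.names = FALSE)
  invisible(A)
}

# Demo on a temporary copy of the sample file, which has 3 lines to skip.
input_file <- tempfile(fileext = ".txt")
output_file <- tempfile(fileext = ".csv")
writeLines(c("blah", "blah", "blah",
             "apples\toranges\tgrapes\t",
             "1\t2\t3",
             "3\t2\t1"), input_file)
result <- convert_file(input_file, output_file, skip = 3)
print(result)
```

For several files, the same function could be called in a loop or with Map() over vectors of input and output file names.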

This works but I'm sure someone out there could do better.

DarrenRhodes