I have a bunch of CSV files that I need to read. Each file has a header, most have footers, and half have column headings that appear sporadically within the body of the file. I would like to delete the header, the footer, and the sporadic column headings.

I include reproducible examples in almost all of my questions and answers, but in this case since I am reading an external file I am not sure how to do that.

Each header is three lines long. I can remove the header with the following line (which uses 'skip'):

d <- read.csv('c:/users/mark w miller/simple R programs/data_with_header_footer.csv', header=T, skip=2)

The number of lines between the header and footer varies among files. However, the footer always begins with the line: 'Symbols:'. The first line of the footer occupies only the first cell of that row. The number of lines in the footer varies among files.

Some files have sporadic column headings within the body of the file. The first row of each such heading block begins with a table number such as 'Table 4.3-1'. The last row of the block always begins with something like: 'Number_reporting' 'Year 1' 'Area 1' 'Area 2' 'Year 2' 'Area 1' 'Area 2'.

How can I delete these footers and sporadic column headings? I would prefer not to edit each file manually because there are a large number of files and errors might occur when deleting a lot of rows by hand.

Thank you for any suggestions.

Mark Miller

1 Answer

You can use readLines to read the file as plain text, then use grep to locate the footer and the sporadic column headings. Without something more concrete it is hard to give a complete example, but roughly:

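dum.data <- readLines('some.txt')
dum.data <- dum.data[-(1:3)]  # drop the three-line header

# drop the footer: everything from the first 'Symbols:' line to the end
footer <- grep("Symbols:", dum.data, fixed = TRUE)
if (length(footer) > 0) {
  dum.data <- dum.data[-(footer[1]:length(dum.data))]
}

# drop the first sporadic heading block: from its 'Table x.y' line
# through app.marker (a placeholder index, described below)
tab <- grep("Table ?[0-9]\\.[0-9]", dum.data)
if (length(tab) > 0) {
  dum.data <- dum.data[-(tab[1]:app.marker)]
}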

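Here app.marker is a placeholder for the index of the last line of the sporadic heading block. You would find it with an appropriate grep, perhaps on 'Number_reporting', but the description of those headings is too vague for me to be more specific. Once these lines have been removed, you can process the remainder, splitting on commas etc. as required.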
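
To make the idea concrete, here is a minimal, self-contained sketch that works on in-memory lines instead of an external file, which also makes it reproducible. The function name clean_lines, its argument names, and the sample lines are all made up for illustration. It assumes every heading block starts at a line matching 'Table x.y' and ends at the next line containing 'Number_reporting', as described in the question; adjust the patterns to your real files.

```r
# Hypothetical helper: strip the header, the footer and every sporadic
# heading block from a character vector of lines.
clean_lines <- function(lines, n_header = 3,
                        footer_pat = "Symbols:",
                        block_start_pat = "Table ?[0-9]+\\.[0-9]+",
                        block_end_pat = "Number_reporting") {
  # drop the fixed-length header (guard against n_header = 0)
  if (n_header > 0) lines <- lines[-seq_len(n_header)]

  # drop everything from the first footer line to the end
  footer <- grep(footer_pat, lines, fixed = TRUE)
  if (length(footer) > 0) lines <- lines[seq_len(footer[1] - 1)]

  # collect the line indices of every heading block
  starts <- grep(block_start_pat, lines)
  ends <- grep(block_end_pat, lines, fixed = TRUE)
  drop <- integer(0)
  for (s in starts) {
    e <- ends[ends >= s]  # first block-end line at or after this start
    if (length(e) > 0) drop <- c(drop, s:e[1])
  }
  if (length(drop) > 0) lines <- lines[-drop]
  lines
}

# Made-up sample standing in for readLines('some.txt')
sample_lines <- c(
  "Header line 1",
  "Header line 2",
  "Header line 3",
  "10,1,2",
  "11,3,4",
  "Table 4.3-1,,",
  "Number_reporting,Year 1,Year 2",
  "12,5,6",
  "Symbols:,,",
  "* estimated,,"
)

cleaned <- clean_lines(sample_lines)

# Parse the remaining lines as CSV; the column names are supplied by hand
# because all heading rows have been removed.
d <- read.csv(text = cleaned, header = FALSE,
              col.names = c("Number_reporting", "Year1", "Year2"))
```

For real files you would replace sample_lines with readLines(path) and loop over the file names, for example with lapply. If the first heading block sits directly below the header and you want to keep its column names, you would need to save that line before dropping the blocks.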

shhhhimhuntingrabbits