Importing a text file into R

Question

I have a text file which contains over 100,000 rows which I download weekly from SAP. it is downloaded as pages and each page contains the same header along with dashed line. a minimal example with two pages each containing only two items is below.

------------------------------------------------------------
|date              |Material          |Description         |
|----------------------------------------------------------|
|10/04/2013        |WM.5597394        |PNEUMATIC           |
|11/07/2013        |GB.D040790        |RING                |
------------------------------------------------------------

------------------------------------------------------------
|date              |Material          |Description         |
|----------------------------------------------------------|
|08/06/2013        |WM.4M01004A05     |TOUCHEUR            |
|08/06/2013        |WM.4M010108-1     |LEVER               |
------------------------------------------------------------

what I would like to do is import this file into R with only one header and no dash lines. I tried:

read.table( "myfile.txt",  sep = "|", fill=TRUE)

Many thanks

Is the example the content of a single or of two files? – Sven Hohenstein Jan 14 '14 at 13:23 — Sven Hohenstein, Jan 14 '14 at 13:23
all rows are contained in a single text file – Ragy Isaac Jan 14 '14 at 13:31 — Ragy Isaac, Jan 14 '14 at 13:31

Sven Hohenstein · Accepted Answer · 2014-01-14T14:01:45.153

Another readLines approach:

l <- readLines("myfile.txt")

# remove unnecessary lines
l <- grep("^\\|?-+\\|?$|^$", l, value = TRUE, invert = TRUE)

# remove duplicated headers
l2 <- c(l[1], l[-1][l[-1] != l[1]])

# split
lsplit <- strsplit(l2, "\\s*\\|")

# create data frame
dat <- setNames(data.frame(do.call(rbind, lsplit[-1])[ , -1]), lsplit[[1]][-1])


        date      Material Description
1 10/04/2013    WM.5597394   PNEUMATIC
2 11/07/2013    GB.D040790        RING
3 08/06/2013 WM.4M01004A05    TOUCHEUR
4 08/06/2013 WM.4M010108-1       LEVER

redmode · Answer 2 · 2014-01-14T14:17:31.060

You can pre-process file like text, then use read.table:

lines <- readLines("myfile.txt")
lines <- sapply(lines, gsub, pattern="[-]{2,}|[|]", replacement="")
lines <- c(lines[2], lines[lines!="" & lines!=lines[2]])

read.table(text=lines, header=T)

gives

        date      Material Description
1 10/04/2013    WM.5597394   PNEUMATIC
2 11/07/2013    GB.D040790        RING
3 08/06/2013 WM.4M01004A05    TOUCHEUR
4 08/06/2013 WM.4M010108-1       LEVER

agstudy · Answer 3 · 2014-01-14T13:51:41.853

2

You can use readLines and read.table (may be not very efficient):

ll <- readLines(textConnection(txt))
dat <- read.table(text=ll[!grepl('--',ll)],sep='|',header=TRUE)[,-c(1,5)]
dat[!grepl('date',dat$date),]
            date           Material          Description
1 10/04/2013         WM.5597394         PNEUMATIC           
2 11/07/2013         GB.D040790         RING                
4 08/06/2013         WM.4M01004A05      TOUCHEUR            
5 08/06/2013         WM.4M010108-1      LEVER

edited Jan 14 '14 at 13:51

answered Jan 14 '14 at 13:38

agstudy

119,832
17
199
261

It filters 'WM.4M010108-1' – redmode Jan 14 '14 at 13:48

dickoa · Answer 4 · 2014-01-14T14:58:09.963

As pointed out by the others answers, readLines is the way to go

sap <- readLines("myfile.txt")
sap <- gsub("(^\\||\\|$|\\-{2,}|\\s+)", "", sap)
sap <- sap[nchar(sap) > 0]
ind <- grep("^date", sap)
header <- sap[ind]
header <- unique(unlist(strsplit(header, "\\|")))
sap <- sap[-ind]

sap <- read.table(text = sap, sep = "|",
                  col.names = header,
                  stringsAsFactors = FALSE)
str(sap)
##         date      Material Description
## 1 10/04/2013    WM.5597394   PNEUMATIC
## 2 11/07/2013    GB.D040790        RING
## 3 08/06/2013 WM.4M01004A05    TOUCHEUR
## 4 08/06/2013  WM.4M0101081-1       LEVER

It removes "-" in Material column, last row – redmode Jan 14 '14 at 14:20 — redmode, Jan 14 '14 at 14:20

score -1 · Answer 5 · answered Jan 14 '14 at 13:42

-1

You may have to write a custom read.file() function. I suggest to start with scan(), read each line as an element of a vector and then manipulate each line according to its content

answered Jan 14 '14 at 13:42

Andrea

593
2
8

Importing a text file into R

5 Answers5

Linked