I have a simple text file with some data that I need to read into R.

My file contains these rows:

a,      b,               c,    e
"1",    €57,000.00,      5,    10FEB2015
"K",    €0.00,           6,    15APR2016
"C",    €1,444,055.00,   6,    15APR2016

As you can see, column b is a monetary value that contains a thousands separator, and that separator is the same character as the field delimiter (sep=",").
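To make the problem concrete, here is a minimal sketch that counts the comma-separated fields on each line of the sample above. The variable names `lines` and `n_fields` are only illustrative; `count.fields` is part of base R (package utils).

```r
# The sample rows from the question, header line first.
lines <- c(
  'a,      b,               c,    e',
  '"1",    €57,000.00,      5,    10FEB2015',
  '"K",    €0.00,           6,    15APR2016',
  '"C",    €1,444,055.00,   6,    15APR2016'
)

# Count the fields each line splits into when "," is the separator.
# A well-formed file with 4 columns would give 4 for every line.
con <- textConnection(lines)
n_fields <- count.fields(con, sep = ",")
close(con)

print(n_fields)
```

The counts differ from line to line (4, 5, 4, 6) because every thousands separator in column b is treated as one more field boundary, so a plain `read.csv(..., sep = ",")` cannot line the columns up.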

Moinuddin Quadri
Gauss

1 Answer

Sometimes you have to do it line by line:

library(stringi)
library(purrr)

lines <- 'a,b,c,e
"1",€57,000.00,5,10FEB2015
"K",€0.00,6,15APR2016
"C",€1,444,055.00,6,15APR2016'

dat <- readLines(textConnection(lines))

# we need the column names
cols <- stri_split_regex(dat[1], ",")[[1]]

# regular expression capture groups can do the hard work
map_df(stri_match_all_regex(dat[2:length(dat)], 
                     '^"([[:alnum:]]+)",€([[:digit:],]+\\.[[:digit:]]{2}),([[:digit:]]+),(.*)$'),
  function(x) {
    setNames(rbind.data.frame(x[2:length(x)], 
                              stringsAsFactors=FALSE), 
             cols)
  }
) -> df

# proper types
df$b <- as.numeric(stri_replace_all_regex(df$b, ",", ""))
df$e <- as.Date(df$e, "%d%b%Y")

str(df)

## Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of  4 variables:
##  $ a: chr  "1" "K" "C"
##  $ b: num  57000 0 1444055
##  $ c: chr  "5" "6" "6"
##  $ e: Date, format: "2015-02-10" "2016-04-15" ...
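
For comparison, here is a rough base-R sketch of the same capture-group idea with no extra packages. It is an illustration under a few assumptions rather than part of the code above: the names `pattern`, `parts` and `df2` are made up for the example, the regex tolerates the space padding shown in the question's file, it skips the currency symbol with `[^0-9]*` instead of matching a literal €, and it parses the month abbreviation by hand so the result does not depend on the session locale (which `as.Date(x, "%d%b%Y")` does). It also converts column c to integer.

```r
lines <- c(
  'a,      b,               c,    e',
  '"1",    €57,000.00,      5,    10FEB2015',
  '"K",    €0.00,           6,    15APR2016',
  '"C",    €1,444,055.00,   6,    15APR2016'
)

# Column names come from the header line, with the padding trimmed.
cols <- trimws(strsplit(lines[1], ",", fixed = TRUE)[[1]])

# Four capture groups: quoted id, amount (commas allowed), integer, date text.
# "[^0-9]*" skips the padding and the currency symbol before the amount.
pattern <- '^"([^"]*)",[^0-9]*([0-9,]+\\.[0-9]{2}),\\s*([0-9]+),\\s*(.*)$'

# regexec() returns the full match plus the groups; drop the full match.
m <- regmatches(lines[-1], regexec(pattern, lines[-1]))
parts <- do.call(rbind, m)[, -1, drop = FALSE]

# Locale-independent date parsing: map "FEB" -> 2 via the built-in month.abb.
e_txt <- parts[, 4]
month_num <- match(substr(e_txt, 3, 5), toupper(month.abb))
e_date <- as.Date(sprintf("%s-%02d-%s",
                          substr(e_txt, 6, 9), month_num, substr(e_txt, 1, 2)))

df2 <- data.frame(
  parts[, 1],
  as.numeric(gsub(",", "", parts[, 2], fixed = TRUE)),  # drop thousands separators
  as.integer(parts[, 3]),
  e_date,
  stringsAsFactors = FALSE
)
names(df2) <- cols

str(df2)
```

One caveat with this sketch: a line that does not match the pattern yields an empty match and is silently dropped by `rbind`, so it is worth comparing `nrow(df2)` with `length(lines) - 1` on real data.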
hrbrmstr