How to read data file, containing monetary values, using R

Question

I have a simple txt file with some data that I need to read using R.

My file contains these rows:

a,      b,               c,    e
"1",    €57,000.00,      5,    10FEB2015
"K",    €0.00,           6,    15APR2016
"C",    €1,444,055.00,   6,    15APR2016

As you can see, column b is a monetary value that contains a thousands separator (","), which is the same character as the field delimiter (sep=","). How can I read this file correctly?
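To make the problem concrete, here is a minimal sketch that counts how many pieces each line breaks into when it is split on every comma. It assumes the sample rows above with the padding spaces removed; the names `lines` and `n_fields` are only illustrative.

```r
# Sample rows from the question, with the padding spaces removed (assumption).
lines <- c(
  'a,b,c,e',
  '"1",€57,000.00,5,10FEB2015',
  '"K",€0.00,6,15APR2016',
  '"C",€1,444,055.00,6,15APR2016'
)

# Number of pieces each line breaks into when split on every comma.
n_fields <- lengths(strsplit(lines, ",", fixed = TRUE))
print(n_fields)
```

The counts come out as 4, 5, 4 and 6, so a plain comma-separated read cannot line the rows up with the four header columns.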

See: http://stackoverflow.com/questions/1523126/how-to-read-data-when-some-numbers-contain-commas-as-thousand-separator?rq=1 — Technophobe01, May 14 '16 at 01:43
I was thinking if we find the end of the monetary value properly, then we can just add quotes to it and read it normally. It just needs to be quoted. — Rich Scriven, May 14 '16 at 02:04
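
The quoting idea from the comment above could look roughly like the sketch below. It assumes every amount starts with € and ends with exactly two decimal places, and that no other field contains a comma; the variable names (`raw`, `cleaned`, `quoted`, `df_q`) are made up for illustration.

```r
# Sample rows as they appear in the question, including the padding spaces.
raw <- c(
  'a,      b,               c,    e',
  '"1",    €57,000.00,      5,    10FEB2015',
  '"K",    €0.00,           6,    15APR2016',
  '"C",    €1,444,055.00,   6,    15APR2016'
)

# Drop the padding after each delimiter so quotes sit at the start of a field.
cleaned <- gsub(",[[:space:]]+", ",", raw)

# Wrap each amount (€, digits and commas, a dot, two digits) in double quotes.
quoted <- gsub("(€[0-9,]+\\.[0-9]{2})", "\"\\1\"", cleaned)

# The quoted amounts are now read as single fields.
df_q <- read.csv(text = quoted, stringsAsFactors = FALSE)

# Strip everything except digits and the decimal point, then convert.
df_q$b <- as.numeric(gsub("[^0-9.]", "", df_q$b))

# Month abbreviations are matched in the current locale (English assumed here).
df_q$e <- as.Date(df_q$e, "%d%b%Y")

str(df_q)
```

For a real file, `raw` would come from `readLines()` on the file path instead of the inline vector.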

score 2 · Answer 1 · answered May 14 '16 at 02:28

Sometimes you have to do it line by line:

library(stringi)
library(purrr)

lines <- 'a,b,c,e
"1",€57,000.00,5,10FEB2015
"K",€0.00,6,15APR2016
"C",€1,444,055.00,6,15APR2016'

dat <- readLines(textConnection(lines))

# we need the column names
cols <- stri_split_regex(dat[1], ",")[[1]]

# regular expression capture groups can do the hard work
map_df(stri_match_all_regex(dat[2:length(dat)], 
                     '^"([[:alnum:]]+)",€([[:digit:],]+\\.[[:digit:]]{2}),([[:digit:]]+),(.*)$'),
  function(x) {
    setNames(rbind.data.frame(x[2:length(x)], 
                              stringsAsFactors=FALSE), 
             cols)
  }
) -> df

# proper types
df$b <- as.numeric(stri_replace_all_regex(df$b, ",", ""))
df$e <- as.Date(df$e, "%d%b%Y")

str(df)

## Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of  4 variables:
##  $ a: chr  "1" "K" "C"
##  $ b: num  57000 0 1444055
##  $ c: chr  "5" "6" "6"
##  $ e: Date, format: "2015-02-10" "2016-04-15" ...
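
If you would rather avoid the extra packages, the same capture-group idea can be sketched in base R with `regexec()` and `regmatches()`. This is only a sketch under the same assumptions as the pattern above (every data line matches it); `df2` and the other names are illustrative, and it also converts column c to integer.

```r
lines <- c(
  'a,b,c,e',
  '"1",€57,000.00,5,10FEB2015',
  '"K",€0.00,6,15APR2016',
  '"C",€1,444,055.00,6,15APR2016'
)

# Same pattern as above: quoted id, € amount, integer, rest of the line.
pattern <- '^"([[:alnum:]]+)",€([[:digit:],]+\\.[[:digit:]]{2}),([[:digit:]]+),(.*)$'

# Column names come from the header line.
cols <- strsplit(lines[1], ",", fixed = TRUE)[[1]]

# One character vector per data line: the full match, then the four groups.
# A line that does not match yields character(0) and would need handling.
m <- regmatches(lines[-1], regexec(pattern, lines[-1]))

# Drop the full match and stack the captured groups row by row.
fields <- do.call(rbind, lapply(m, function(x) x[-1]))
df2 <- as.data.frame(fields, stringsAsFactors = FALSE)
names(df2) <- cols

# Proper types (the month abbreviations assume an English locale).
df2$b <- as.numeric(gsub(",", "", df2$b, fixed = TRUE))
df2$c <- as.integer(df2$c)
df2$e <- as.Date(df2$e, "%d%b%Y")

str(df2)
```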