0

Just getting started in R and would appreciate your help. I have a series of daily prices for 6 financial indexes (S&P 500, etc.) that I'm trying to analyze.

For whatever reason some prices are coming in as chr instead of num, but when I'm converting chr columns to num columns the numbers are becoming NAs and I can't seem to figure out why!! I've posted the tails so you can see.

The only weird thing I've noticed is in my csv file when the price gets into the thousands it is coded in Excel as "number" vs "general". So for example the S&P 500 index starts with prices in the low teens and in Excel it is coded as "general" but when it crosses into 1,000+ then it becomes "number" (don't know if that's the issue).

I'm not sure how to upload a copy of the csv file here, but would be happy to provide it.

> index_prices <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE, na.strings=c(""))

> index_prices$Date <- as.Date(index_prices$Date, "%m/%d/%Y")

> str(index_prices)
'data.frame':   30790 obs. of  7 variables:
 $ Date                               : Date, format: "1936-03-31" "1936-04-01" ...
 $ S.P.500.TR.USD.1936.               : chr  "14.96" "14.96" "14.96" "14.96" ...
 $ BBgBarc.US.Agg.Bond.TR.USD         : chr  NA NA NA NA ...
 $ MSCI.EAFE.NR.USD                   : chr  NA NA NA NA ...
 $ BBgBarc.Gbl.Agg.Ex.USD.TR.USD      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ FTSE.Nareit.All.Equity.REITs.TR.USD: chr  NA NA NA NA ...
 $ FTSE.Treasury.Bill.3.Mon.USD       : num  NA NA NA NA NA NA NA NA NA NA ...

> tail(index_prices)
            Date S.P.500.TR.USD.1936. BBgBarc.US.Agg.Bond.TR.USD MSCI.EAFE.NR.USD
30785 2020-07-12            68,459.51                   2,374.28         5,794.25
30786 2020-07-13            67,818.58                   2,374.59         5,879.18
30787 2020-07-14            68,736.76                   2,377.91         5,848.82
30788 2020-07-15            69,362.39                   2,377.53         5,958.46
30789 2020-07-16            69,128.33                   2,380.99         5,922.00
30790 2020-07-17            69,328.78                   2,379.84         5,921.61
      BBgBarc.Gbl.Agg.Ex.USD.TR.USD FTSE.Nareit.All.Equity.REITs.TR.USD
30785                        492.20                           18,440.27
30786                        491.36                           18,208.13
30787                        492.10                           18,366.16
30788                        493.63                           18,532.83
30789                        494.14                           18,296.35
30790                        493.40                           18,554.14
      FTSE.Treasury.Bill.3.Mon.USD
30785                       659.33
30786                       659.33
30787                       659.34
30788                       659.34
30789                       659.34
30790                       659.34

> index_prices[,2:7] <- apply(index_prices[,2:7], 2, function(x) as.numeric((x)))
Warning messages:
1: In FUN(newX[, i], ...) : NAs introduced by coercion
2: In FUN(newX[, i], ...) : NAs introduced by coercion
3: In FUN(newX[, i], ...) : NAs introduced by coercion
4: In FUN(newX[, i], ...) : NAs introduced by coercion

> str(index_prices)
'data.frame':   30790 obs. of  7 variables:
 $ Date                               : Date, format: "1936-03-31" "1936-04-01" ...
 $ S.P.500.TR.USD.1936.               : num  15 15 15 15 15 ...
 $ BBgBarc.US.Agg.Bond.TR.USD         : num  NA NA NA NA NA NA NA NA NA NA ...
 $ MSCI.EAFE.NR.USD                   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ BBgBarc.Gbl.Agg.Ex.USD.TR.USD      : num  NA NA NA NA NA NA NA NA NA NA ...
 $ FTSE.Nareit.All.Equity.REITs.TR.USD: num  NA NA NA NA NA NA NA NA NA NA ...
 $ FTSE.Treasury.Bill.3.Mon.USD       : num  NA NA NA NA NA NA NA NA NA NA ...

> tail(index_prices)
            Date S.P.500.TR.USD.1936. BBgBarc.US.Agg.Bond.TR.USD MSCI.EAFE.NR.USD
30785 2020-07-12                   NA                         NA               NA
30786 2020-07-13                   NA                         NA               NA
30787 2020-07-14                   NA                         NA               NA
30788 2020-07-15                   NA                         NA               NA
30789 2020-07-16                   NA                         NA               NA
30790 2020-07-17                   NA                         NA               NA
      BBgBarc.Gbl.Agg.Ex.USD.TR.USD FTSE.Nareit.All.Equity.REITs.TR.USD
30785                        492.20                                  NA
30786                        491.36                                  NA
30787                        492.10                                  NA
30788                        493.63                                  NA
30789                        494.14                                  NA
30790                        493.40                                  NA
      FTSE.Treasury.Bill.3.Mon.USD
30785                       659.33
30786                       659.33
30787                       659.34
30788                       659.34
30789                       659.34
30790                       659.34
crmau
  • 1
  • Some values have `,`. You may replace the `,` with `""` using `gsub` and and then convert to numeric – akrun Jul 21 '20 at 21:09
  • As suggested by Akrun, see : https://stackoverflow.com/a/1523177/13513328 – Waldi Jul 21 '20 at 21:11
  • I think the issue may be also based on the excel file. You could open the file and change the format before reading – akrun Jul 21 '20 at 21:11

0 Answers0