-2

I have a csv file i'm importing but some columns are not getting the correct formatting. It's very strange and I can't figure it out. The entire top row is formatting the columns as characters, instead of numeric. Believe it is getting the formatting from V1/time1 column?

> dde = read.csv("dde.csv",header=F, sep=",",stringsAsFactors=FALSE)
> dde <- na.omit(dde) 
> dde <- as.data.frame(dde)
> time1 = as.POSIXct(strptime(paste(dde$V1, sep=" "),format="%m/%d/%Y %I:%M:%S %p"))
> head(dde)
                   V1      V2      V3      V4     V5     V6      V7     V8
1 9/7/2014 9:20:00 PM 105.061 136.099 169.961 98.391 96.515 112.802 87.277
2 9/7/2014 9:26:00 PM 105.068 136.074 169.954 98.399 96.521 112.790 87.276
3 9/7/2014 9:31:00 PM 105.078 136.107 170.031 98.414 96.528 112.813 87.287
4 9/7/2014 9:35:00 PM 105.068 136.102 170.001 98.424 96.516 112.789 87.289
5 9/7/2014 9:41:00 PM 105.074 136.109 169.994 98.422 96.519 112.821 87.300
6 9/7/2014 9:45:00 PM 105.091 136.114 170.028 98.420 96.539 112.829 87.302
       V9     V10     V11     V12     V13     V14     V15     V16     V17
1 1.29531 0.80054 1.38283 1.40974 1.20601 1.55867 1.61761 0.93644 1.08825
2 1.29503 0.80041 1.38256 1.40949 1.20607 1.55817 1.61749 0.93643 1.08828
3 1.29514 0.80026 1.38256 1.40963 1.20607 1.55828 1.61796 0.93650 1.08832
4 1.29520 0.80038 1.38250 1.40957 1.20594 1.55819 1.61791 0.93666 1.08835
5 1.29517 0.80042 1.38259 1.40965 1.20590 1.55843 1.61777 0.93658 1.08840
6 1.29519 0.80046 1.38275 1.40969 1.20588 1.55860 1.61780 0.93648 1.08834
      V18     V19     V20     V21     V22     V23     V24     V25     V26
1 0.93103 0.83073 1.72682 1.77693 1.50608 1.94649 1.01918 0.87190 1.12698
2 0.93106 0.83075 1.72689 1.77693 1.50593 1.94627 1.01912 0.87187 1.12676
3 0.93109 0.83069 1.72704 1.77693 1.50638 1.94661 1.01929 0.87202 1.12684
4 0.93110 0.83082 1.72687 1.77693 1.50645 1.94631 1.01941 0.87213 1.12694
5 0.93101 0.83080 1.72681 1.77693 1.50613 1.94643 1.01934 0.87199 1.12701
6 0.93097 0.83070 1.72706 1.77693 1.50613 1.94680 1.01927 0.87190 1.12696
      V27     V28     V29     V30
1 0.85511 0.90400 0.77324 1268.81
2 0.85520 0.90390 0.77332 1268.81
3 0.85517 0.90405 0.77328 1268.81
4 0.85515 0.90415 0.77333 1268.81
5 0.85508 0.90423 0.77344 1268.81
6 0.85513 0.90412 0.77334 1268.81
> V22 = xts(dde$V22, order.by=time1)
> V22 <-to.minutes(V22[,1],240,'minutes')
> V22 <- align.time(xts(V22),5 * 60)
> 
> V2 = xts(dde$V2, order.by=time1)
> V2 <-to.minutes(V2[,1],240,'minutes')
Error in to.period(x, "minutes", k = k, name = name, ...) : 
  unsupported type
> V2 <- align.time(xts(V2),5 * 60)
> 
> class(dde$V22)
[1] "numeric"
> class(dde$V2)
[1] "character"
> typeof(dde$V22)
[1] "double"
> typeof(dde$V2) 
[1] "character"
user3773444
  • 333
  • 3
  • 12
  • Can you provide the output of `dput(readLines("dde.csv", n=5))` – GSee Sep 27 '14 at 20:59
  • 1
    > dput(readLines("dde.csv", n=5)) c("9/7/2014 9:20:00 PM,105.061,136.099,169.961,98.391,96.515,112.802,87.277,1.29531,0.80054,1.38283,1.40974,1.20601,1.55867,1.61761,0.93644,1.08825,0.93103,0.83073,1.72682,1.77693,1.50608,1.94649,1.01918,0.8719,1.12698,0.85511,0.904,0.77324,1268.81", "9/7/2014 9:26:00 PM,105.068,136.074,169.954,98.399,96.521,112.79,87.276,1.29503,0.80041,1.38256,1.40949,1.20607,1.55817,1.61749,0.93643,1.08828,0.93106,0.83075,1.72689,1.77693,1.50593,1.94627,1.01912,0.87187,1.12676,0.8552,0.9039,0.77332,1268.81", ...(omitted output for length) – user3773444 Sep 27 '14 at 21:12
  • First, you probably want to use `stringsAsFactors=FALSE` in the `read.csv()` call and you don't need to call `as.data.frame` because it's already a `data.frame`. But, how do you expect `dde$time` or `dde[, c("test1")]` to work? look at `dde` after the first line. There is no `"time"` or `"test1"` in `names(dde)` – GSee Sep 27 '14 at 21:17
  • I set the names by (colnames(dde) <- c("time", etc) and also (time1 = as.POSIXct(strptime(paste(dde$time, sep=" "),format="%m/%d/%Y %I:%M:%S %p")). Tried (dde = read.csv("dde.csv",header=F, sep=",",stringsAsFactors=FALSE)) and still same issue – user3773444 Sep 27 '14 at 21:35
  • it's weird because half the columns are not getting the same formatting. – user3773444 Sep 27 '14 at 21:37

1 Answers1

1

When I do the things you said you did in the comments, it works for me

dde <- read.csv("dde.csv", header=FALSE, stringsAsFactors=FALSE)
dde <- na.omit(dde) 
colnames(dde)[1] <- "time"
colnames(dde)[2] <- "test1"

time1 = as.POSIXct(strptime(paste(dde$time, sep=" "),format="%m/%d/%Y %I:%M:%S %p"))
test1 = xts(dde[, c("test1")], order.by=time1)
test1 <- to.minutes(test1[,1],240,'minutes')
test1 <- align.time(xts(test1),5 * 60)
test1
#                    minutes.Open minutes.High minutes.Low minutes.Close
#2014-09-07 21:30:00      105.061      105.068     105.061       105.068
GSee
  • 48,880
  • 13
  • 125
  • 145
  • Still does not work - it's formatting the top row with the time. – user3773444 Sep 27 '14 at 23:28
  • Do you have an email I could send the file output to?? It will be too long for the comment field. – user3773444 Sep 29 '14 at 17:26
  • @user3773444 No. Create a minimal, [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example and add it to your question. – GSee Sep 29 '14 at 17:37
  • Added the exact code with data. It works for V22 but not the V2 column. Half the dde file is getting the correct formatting while the other half is not.. Please help!! – user3773444 Sep 29 '14 at 23:10
  • If you call `as.numeric(dde$V2)` (while `dde` is still a `data.frame`) do you get a warning about NAs being being introduced, similar to the output from `as.numeric(c("1", "N/A"))`? – GSee Sep 29 '14 at 23:26
  • dde$V2 <- as.numeric(dde$V2) worked! I was trying as.numeric(dde) before and it didn't work. Thank you very much for all your help GSee! – user3773444 Sep 30 '14 at 00:45
  • @user3773444 you could use `read.table` instead of `read.csv` and make use of the `colClasses` argument. – GSee Sep 30 '14 at 00:49