
I am having trouble plotting the graph. Every time I try to plot it, I get a histogram instead of a line graph:

[image: the resulting histogram]

I have attached the link to the csv file - https://docs.google.com/spreadsheets/d/1qaTqw9sSoOpeKIa5GnHr2cJ2_DKBb1-89eTukTtrKOQ/edit?usp=sharing

First 4 lines of data

Date        Comid       Low     High    Average Close   Trdno   Volume  Turnover    Company
01-01-2005  14,259.00   138.60  139.10  138.84  138.80  14.00   1,500.00    208,230.00  BRITISH AMERICAN TOBACCO BANGLADESH COMPANY LIMITED
02-01-2005  14,259.00   139.00  140.00  139.43  139.40  24.00   2,750.00    383,665.00  BRITISH AMERICAN TOBACCO BANGLADESH COMPANY LIMITED
03-01-2005  14,259.00   138.50  139.00  138.70  138.60  26.00   3,600.00    499,300.00  BRITISH AMERICAN TOBACCO BANGLADESH COMPANY LIMITED
04-01-2005  14,259.00   135.20  138.50  136.76  136.70  23.00   2,300.00    314,865.00  BRITISH AMERICAN TOBACCO BANGLADESH COMPANY LIMITED

I am trying to plot the 6th column (the one titled "Close"), and I typed the following commands.

batbc <- read.csv("batbc.csv")
plot(batbc[, 6], type="l")
CMichael
  • It looks like `batbc[, 6]` is not numeric. What does `str(batbc[, 6])` return? – user20650 Jul 02 '15 at 20:21
  • The problem was with the commas, as the numbers were treated as characters. I removed the commas and it works now. – slingblade8129 Jul 02 '15 at 21:51
  • Great stuff, glad it's solved. By the way, it would be good to click the tick beside one of the answers below to show that your question has been answered; you can also upvote either or both answers. – user20650 Jul 02 '15 at 22:53

2 Answers


The problem is the commas as thousand separators. There are a few ways of solving this, but the neatest I've seen is from another SO answer.

For your data in particular, you need to do this:

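# Define a class whose only job is to trigger the conversion below.
setClass("num.with.commas")
# Tell R how to build it from text: strip the commas, then convert.
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from)))
# Date and Company stay character; the 8 columns in between are numeric.
batbc <- read.csv("batbc.csv",
  colClasses = c("character", rep("num.with.commas", 8), "character"))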

It should then work fine.

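Note that with the commas in place, the numbers are read as character and then converted to factors, which was the default behaviour of read.csv at the time (R 4.0.0 and later leave them as character). When you plot a factor, you get a bar chart of how often each level occurs, which is the "histogram" in the question. In that context, the type = "l" is ignored with a warning.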
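A minimal sketch of how the column type goes wrong, using two invented rows instead of the real file. `stringsAsFactors = TRUE` is set explicitly as an assumption, so the result does not depend on which R version is installed; the names `csv_text`, `demo` and `close_num` are made up for illustration.

```r
# Two invented rows; the second Close value has a thousands separator
# and is therefore quoted, as a spreadsheet export would do.
csv_text <- 'Date,Close
01-01-2005,138.80
02-01-2005,"1,250.50"'

# stringsAsFactors = TRUE mimics the read.csv default before R 4.0.0.
demo <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# One unparseable value is enough to stop the whole column being numeric.
class(demo$Close)

# Go through character first, strip the commas, then convert.
close_num <- as.numeric(gsub(",", "", as.character(demo$Close)))
close_num
```

Calling `plot(close_num, type = "l")` on the converted vector should then draw a line rather than bars.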

Nick Kennedy
  • Thanks! Now that I know the commas are the problem, I can just remove the commas from the numbers in the spreadsheet instead of typing the code. – slingblade8129 Jul 02 '15 at 21:38
  • You could indeed. Incidentally, the thing to do in this situation would have been to do `class(batbc$Close)`. When you noticed it was a `factor`, then do `batbc$Close[which(is.na(as.numeric(as.character(batbc$Close))))]` and look at which items had failed a conversion to numeric. – Nick Kennedy Jul 02 '15 at 21:40
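The diagnostic described in the comment above can be sketched on a small invented vector standing in for `batbc$Close`; the names `close_raw`, `converted` and `bad` are hypothetical. The warning is suppressed here only to keep the output quiet.

```r
# Invented values standing in for batbc$Close; one has a thousands separator.
close_raw <- factor(c("138.80", "139.40", "1,250.50"))

# First check: is the column what you expect?
class(close_raw)

# as.numeric() returns NA for any text it cannot parse,
# so is.na() picks out the entries that failed to convert.
converted <- suppressWarnings(as.numeric(as.character(close_raw)))
bad <- as.character(close_raw)[is.na(converted)]
bad
```

Looking at `bad` shows which entries block the conversion, which here is the one containing a comma.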

You need to read the csv with automatic factor conversion turned off.

Then you need to get rid of the thousands comma separator in that column (or for any relevant column).

Then coerce the character column to numeric. Coercing directly to numeric without first removing the thousands separators will produce NA for every value that contains a comma.

Next you can plot normally.

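# as.is = TRUE keeps text columns as character instead of factors
batbc <- read.csv('batbc.csv', as.is = TRUE)
# Remove the thousands separators, then convert to numeric
batbc$Close <- gsub(',', '', batbc$Close)
batbc$Close <- as.numeric(batbc$Close)
plot(batbc[, 6], type = "l")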
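If more than one column has thousands separators, the same two steps can be applied to all of them at once. This is only a sketch: `strip_commas` is a hypothetical helper, and `batbc_demo` is an invented stand-in for the data frame returned by `read.csv(..., as.is = TRUE)`.

```r
# Hypothetical helper: remove thousands separators, then convert to numeric.
strip_commas <- function(x) {
  as.numeric(gsub(",", "", as.character(x), fixed = TRUE))
}

# Invented stand-in for the real data, with columns read as character.
batbc_demo <- data.frame(
  Close  = c("138.80", "1,250.50"),
  Volume = c("1,500.00", "2,750.00"),
  stringsAsFactors = FALSE
)

# Clean every column that needs it in one go.
cols <- c("Close", "Volume")
batbc_demo[cols] <- lapply(batbc_demo[cols], strip_commas)

str(batbc_demo)
```

Listing the columns by name keeps text columns such as Date and Company out of the conversion.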

HTH.

Frash
  • Thanks! Now that I know the commas are the problem, I can just remove the commas from the numbers in the spreadsheet instead of typing the code. – slingblade8129 Jul 02 '15 at 21:38