
In the plot generated by ggplot, each label along the x-axis is a string, e.g., “the product in 1990”. However, in the generated plot there is a period between each pair of words. In other words, the above string is shown as “the.product.in.1990”.

How can I prevent these periods from being added?

The following code is what I used to set the label for each point along the x-axis:

last_plot()+scale_x_discrete(limits=ddata$labels$text)

Sample code:

library(ggplot2)   # provides ggplot(), geom_segment(), scale_x_discrete()
library(ggdendro)
x <- read.csv("test.csv",header=TRUE)
d <- as.dist(x,diag=FALSE,upper=FALSE)
hc <- hclust(d,"ave")
dhc <- as.dendrogram(hc)
ddata <- dendro_data(dhc,type="rectangle")
ggplot(segment(ddata)) + geom_segment(aes(x=x0,y=y0,xend=x1,yend=y1))
last_plot() + scale_x_discrete(limits=ddata$labels$text)

Each element of ddata$labels$text is a string, like "the product in 1990". I would like to keep the same format in the generated plot rather than "the.product.in.1990".

Andrie
bit-question
  • Please see [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) question on how to provide a reproducible example that illustrates your problem. – joran Dec 08 '11 at 16:22
  • Hi Andrie, can you explain more detail on coding in backticks, thanks. – bit-question Dec 08 '11 at 16:26
  • Here is my code x<-read.csv("test.csv",header=TRUE) d<-as.dist(x,diag=FALSE,upper=FALSE) hc<-hclust(d,"ave") dhc<-as.dendrogram(hc) ddata<-dendro_data(dhc,type="rectangle") ggplot(segment(ddata))+geom_segment(aes(x=x0,y=y0,xend=x1,yend=y1)) last_plot()+scale_x_discrete(limits=ddata$labels$text) – bit-question Dec 08 '11 at 18:09
  • There is nothing in this code that introduces periods into the label names. How do you define `ddata$labels$text`? – Andrie Dec 08 '11 at 18:32
  • Yes, I just found that the periods are already included in ddata$labels$text. I think it is because of ggdendro. So, given that ddata$labels$text is a vector of strings, and each string is a set of words separated by periods, how do I remove these periods in R? I am quite new to R, thanks. – bit-question Dec 08 '11 at 18:43
  • `ddata$labels$text <- gsub("\\."," ",ddata$labels$text)` – Ben Bolker Dec 08 '11 at 18:54
  • This is almost certainly not because of `ggdendro`. (Disclaimer: I wrote the package.) Anyway, I tested it five minutes ago and it doesn't happen there. If there are periods in your data, it's most likely there before you start the clustering. My guess is that it's the result of your `read.csv`. – Andrie Dec 08 '11 at 18:58
  • Ben and Andrie, thank you very much. It works perfect. – bit-question Dec 08 '11 at 18:59
  • Andrie, ggdendro is a wonderful package. But based on my code, you can see I did not include any "." in the labels. Is it because of my csv file? For your information, the first row of my csv file corresponds to those labels, and each cell in this row is a string like "product in 1990". Any hint will be highly appreciated. – bit-question Dec 08 '11 at 19:02
  • I import the csv as posted, x<-read.csv("test.csv",header=TRUE). Is there any hidden issue with this kind of usage? – bit-question Dec 08 '11 at 19:09
  • Try `check.names=FALSE` with your `read.csv` call. – Ben Bolker Dec 08 '11 at 19:13

1 Answer


The issue arises because you are trying to read data with column names that contain spaces.

When you read this data with read.csv, these column names are converted to syntactically valid R names, which replaces each space with a period. Here is an example to illustrate the issue:

some.file <- '
    "Col heading A", "Col heading B"
    A, 1
    B, 2
    C, 3
    '

Read it with the default read.csv settings:

> x1 <- read.csv(text=some.file)
> x1
  Col.heading.A Col.heading.B
1             A             1
2             B             2
3             C             3
4                          NA
> names(x1)
[1] "Col.heading.A" "Col.heading.B"
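
The conversion itself can be reproduced without a file. According to ?read.csv, names are adjusted with make.names when check.names is TRUE, so a minimal sketch (the example headers below are only illustrative, one borrowed from the question) is:

```r
# make.names() is the base R function that read.csv() uses to adjust
# column names when check.names = TRUE (the default).
headers <- c("Col heading A", "the product in 1990")

# Characters that are not valid in a syntactic name, such as spaces,
# are replaced with "."
fixed <- make.names(headers)
print(fixed)
```

This is where a label such as "the.product.in.1990" comes from: the periods are introduced when the file is read, not during clustering or plotting.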

To avoid this, use the argument check.names=FALSE:

> x2 <- read.csv(text=some.file, check.names=FALSE)
> x2
  Col heading A Col heading B
1             A             1
2             B             2
3             C             3
4                          NA
> names(x2)
[1] "Col heading A" "Col heading B"

Now, the remaining issue is that a name containing spaces is not a syntactically valid R name, so you cannot write it directly after $. To refer to these columns, you need to wrap the column name in backticks:

> x2$`Col heading A`
[1]     A     B     C      
Levels:          A     B     C
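
If backticks feel awkward, the same columns can also be reached by passing the name as an ordinary string. The following is a small sketch using a tidied copy of the example data (csv_text is a made-up stand-in, without the extra whitespace of some.file):

```r
# A tidied version of the example data, with no stray whitespace or blank lines.
csv_text <- '"Col heading A","Col heading B"
A,1
B,2
C,3'

x2 <- read.csv(text = csv_text, check.names = FALSE)

# Names with spaces can be passed as plain strings to [[ ]] or [ , ]
col_a <- x2[["Col heading A"]]
col_b <- x2[, "Col heading B"]

print(col_a)
print(col_b)
```

Whether col_a comes back as a factor or a character vector depends on the R version's stringsAsFactors default.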

For more information, see ?read.csv and specifically the information for check.names.

There is also some information about backticks in ?Quotes.
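
Applied to the clustering workflow in the question, the idea might look like the sketch below. The contents of test.csv are not shown in the question, so csv_text is a hypothetical 3 x 3 distance matrix standing in for it, and the plotting step is left out so that only base R is needed:

```r
# Hypothetical stand-in for test.csv: a symmetric distance matrix whose
# header row holds the labels (the real file's contents are unknown).
csv_text <- '"the product in 1990","the product in 1991","the product in 1992"
0,1,4
1,0,2
4,2,0'

# check.names = FALSE keeps the spaces in the header row
x <- read.csv(text = csv_text, header = TRUE, check.names = FALSE)

# Make the labels explicit before building the dist object
m <- as.matrix(x)
rownames(m) <- colnames(m)
d <- as.dist(m, diag = FALSE, upper = FALSE)
hc <- hclust(d, "ave")
print(hc$labels)

# Fallback from the comments: strip periods from labels that already have them
cleaned <- gsub("\\.", " ", "the.product.in.1990")
print(cleaned)
```

The gsub fallback also replaces periods that were genuinely part of a label, so reading the file with check.names=FALSE is usually the safer of the two.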

Andrie