-3

I have a .xlsx file with several column (with some inter-dependencies). I'd like to plot multiple graphs on the same chart using a select number of the columns. The first column is Date (which will be my only X-variable) and the remainder columns of interests will be Y-values. There are 1000 rows of data in this file. enter image description here

So ...

X-axis ... "Date" column only
Y-axis (multiple data) ... columns B, C, D, E, T, U, V only

Question:
How to:
1) Read the file
2) plot a line graph of the data, all on the same chart (X-axis = Date, Y-axis = columns B, C, D, E, T, U, V)
3) Color code each line with some type of a legend

I've read this post and many more (not allowed to post more than 2 links??) ... none has been helpful. Most are too arbitrary:
how to plot all the columns of a data frame in R

Community
  • 1
  • 1
kdrai
  • 1
  • 2
  • can you make your .xlsx file available via dropbox or google drive? – johnson-shuffle Jan 17 '16 at 22:14
  • 1
    `df <- readxl::read_excel("myfile",skip=1) ; melted <- tidyr::gather(df,key,value,-Date) ; ggplot(melted, aes(x=Date,y=value,color=key)) + geom_line()`. You may need to adapt read_excel to your file specifics. I think that all of your questions have duplicates. – scoa Jan 17 '16 at 22:17
  • I think the question is not necessarily a duplicate, because the Excel sheet has merged cells and multi-row headers. Seems likely that read_excel function might have difficulties with such a construct. Would be better to do some rearrangement of the headers in Excel (or an ope-source work-alike) before attempting to import. – IRTFM Jan 18 '16 at 02:32
  • @feats-by-jake, here's a link to the file. again, i stripe out the over 1000 rows of data for privacy issues [link] (https://www.dropbox.com/sh/7s17i9fcxbsmqlg/AAAaiMX_zdhZanErETUp8MoHa?dl=0) – kdrai Jan 18 '16 at 14:34
  • @scoa, when i run the suggested code, I get the error message below `Error in names(sel)[unnamed] <- sel[unnamed] : NAs are not allowed in subscripted assignments In addition: Warning messages: 1: package ‘tidyr’ was built under R version 3.2.3 2: package ‘readxl’ was built under R version 3.2.3 > ` ... FYI, I installed and included packages for readxl, tidyr, ggplot2 ... – kdrai Jan 18 '16 at 14:51
  • All, if it helps, I've stripped out Rows 1 and 3 – kdrai Jan 18 '16 at 15:03

1 Answers1

0

The problem you have is with this labels/sublabels combination. They mess up the import (variable classes are not recognized). Here is a two-step solution.

  1. In the first step, we import the database just to extract clean column names. What I did for that was to concatenate the main label (row 2) with the sublabel (row 3) when there was one. There are two pairs of identical column labels, so we also rename them to have clean colnames (I suggest you take the time to review your variable names and give them proper labels). Then we save them as an object (n).
  2. Then, we import the file again skipping the first two rows. That way, read_excel knows what classes to expect. We assign the previously saved names to the new data.frame. Now the data is clean. The rest is trivial: melt with tidyr:gather and plot with ggplot.

Code

library(readxl)
library(tidyr)
library(zoo)
library(ggplot2)
df <- read_excel("./myfile.xlsx",skip = 1)

names(df)[!is.na(df[1,])] <- paste(na.locf(names(df)[!is.na(df[1,])]),df[1,][!is.na(df[1,])],sep="_")
names(df)[duplicated(names(df))] <- paste0(names(df)[duplicated(names(df))],"bis")

n <- names(df)

df <- read_excel("./myfile.xlsx",skip = 2)
names(df) <- n
# df <- dplyr::slice(df,1:3) # this line is for the censored datafile that has only three rows

melted <- gather(df,key,value,-Date)

ggplot(melted, aes(x=Date,y=value,color=key)) + geom_line()

Of course, with only three rows of data, the result is ugly:

enter image description here

scoa
  • 19,359
  • 5
  • 65
  • 80
  • I think you misunderstood my question. I'm only interested in ploting: **Columns** A, B, C, D, T, U, V (emphasis on Column, from the picture) Also, I tried your code above (installed all needed packages) but don't see any plot in the "Plots" window What am I doing wrong? – kdrai Jan 18 '16 at 22:33
  • @kdrai do you get an error? What is it? For the columns, my solution works to plot all columns of your data against the Date column. If you want only a subset, replace `gather(df,...)` with `gather(df[,c(1:4,20,21)],...)` – scoa Jan 19 '16 at 09:40
  • ok, i spoke too early. i was able to install R3.2.3 with no issues. runtime errors are gone, i source the my .R file, no issue ... but after run the code, nothing happens. no graph! i haven't changed anything, other than installing the respective packages and giving it the right file path – kdrai Jan 19 '16 at 16:30
  • no errors but nothing in the "Plots" window either Here's what I see on the console: > source('') > df <- read_excel("/myfile.xlsx",skip = 2) – kdrai Jan 19 '16 at 17:14