I have a CSV file with the following lines:

10,130,A,100,1000
10,130,B,200,-200
10,130,C,300,1200
20,140,A,120,1050
20,140,B,220,-300
20,140,C,320,1250
30,120,A,145,1130
30,120,B,255,1000
30,120,C,355,1110
...

And so on, three lines for each increment by 10 in the first column. Each line contains two values in columns 4 and 5 for each type in column 3 (A, B and C). Values in columns 1 and 2 are the same for each set of three lines.

I read it with:

data <- read.csv("data_out.csv", header=FALSE, sep=",")

Each set of three lines read from the file contains 8 values that could be plotted as Y on a graph (example for the first three lines):

  1. Value in column 1 (10)
  2. Value in column 2 (130)
  3. Value in column 4 for type A (100)
  4. Value in column 5 for type A (1000)
  5. Value in column 4 for type B (200)
  6. Value in column 5 for type B (-200)
  7. Value in column 4 for type C (300)
  8. Value in column 5 for type C (1200)

They would be plotted for X = 10.

So the first 8 dots would have the following coordinates (X,Y):

(10,10); (10,130); (10,100); (10,1000); (10,200); (10,-200); (10,300); (10,1200)

The next three lines for 20 in column 1 would have coordinates:

(20,20); (20,140); (20,120); (20,1050); (20,220); (20,-300); (20,320); (20,1250)

And similarly for the third and any further set of three lines from the input file.

Dots representing each of those 8 values from each set of three lines should be connected to form lines, similar to this one (but with 8 lines, not 4 as in the example). So there would be 8 lines on the same graph, each running through the values for X=10, X=20, X=30, and so on.

Questions about the solution

I know how to plot one line, e.g. plot(data[,1],data[,4],type="l") but how to plot multiple lines?

And how to ensure that the 0 for Y is in the correct place so that the negative values for C5 can be properly plotted as well?

Also, I know that there is the aggregate function which could be used to group by the type (A, B, C), but I don't want to perform any summary or averaging, so I am probably looking for a filter (by the type) rather than aggregate?

I would probably also want to check that the number of distinct values in C1 is the same as in C2 (purely to verify that the input data is fine).

Greg
  • What should your graph look like? Can you give an example? – rosapluesch May 14 '16 at 21:12
  • I really tried, but I cannot understand what sort of plot you are asking for. Your explanation of how you want to reformat the data is also confusing. – SlowLoris May 14 '16 at 21:59
  • If I could just get answers to the About the solution questions I should be able to figure out the rest. The graph is really simple, it would look something like this: http://jpgraph.net/images/howto/mulyaxisex1.png but this one only shows 4 lines (or plots, I am not familiar with the terminology), mine would have 8 lines on the same graph. Maybe three Y scales would be a good idea, one for C1/C2 values, one for C4 and one for C5 values. – Greg May 14 '16 at 22:07
  • Also, just added an edit. – Greg May 14 '16 at 22:18
  • Your question is still unclear. First you should rearrange your df so that the coordinates for each point (x1,y1) are in the same row, which I cannot find in your data. As it stands, it is impossible to tell which values belong to one point. After that, plotting with `plot(...,type="l")` for the first line and `lines(...)` for each additional line might be the solution. – rosapluesch May 15 '16 at 10:36

1 Answer

I am going to work through your problem again, using the data you added in your second edit. Hopefully you can apply this to your actual data after the explanation.

The data frame you say you read from a csv:

df = data.frame(
  c(10,130,"A",100,1000),
  c(10,130,"B",200,-200),
  c(10,130,"C",300,1200),
  c(20,140,"A",120,1050),
  c(20,140,"B",220,-300),
  c(20,140,"C",320,1250),
  c(30,120,"A",145,1130),
  c(30,120,"B",255,1000),
  c(30,120,"C",355,1110))

Built this way, each vector becomes a column, so we need to transpose the data frame to get one record per row, the layout you initially described in your post.

df = data.frame(t(df))

I name the columns as in your example:

names(df) = c("C1","C2","C3","C4","C5")
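If the data comes straight from `read.csv`, as in the question, each line of the file is already one row and no transpose is needed. A minimal sketch, assuming the nine sample lines are supplied inline in place of `data_out.csv` (the `csv_text`, `same_count` and `rows_per_x` names are only illustrative), which also runs the sanity check on C1 and C2 that the question asked about:

```r
# Sample rows from the question, inline so the snippet is self-contained;
# in practice this would be read.csv("data_out.csv", header = FALSE)
csv_text <- "10,130,A,100,1000
10,130,B,200,-200
10,130,C,300,1200
20,140,A,120,1050
20,140,B,220,-300
20,140,C,320,1250
30,120,A,145,1130
30,120,B,255,1000
30,120,C,355,1110"

df <- read.csv(text = csv_text, header = FALSE)
names(df) <- c("C1", "C2", "C3", "C4", "C5")

# Sanity checks: as many distinct C1 values as C2 values,
# and exactly three rows (types A, B, C) per C1 value
same_count <- length(unique(df$C1)) == length(unique(df$C2))
rows_per_x <- table(df$C1)
stopifnot(same_count, all(rows_per_x == 3))

str(df)
```

Read this way, the numeric columns are already numeric, so the later conversion with `as.numeric` becomes a no-op rather than a requirement.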

Installing and reading packages into R:

install.packages("reshape2")
install.packages("ggplot2")
library(reshape2)
library(ggplot2)

`melt` reshapes your data into the long format, which is what ggplot2 needs to draw one line per series. First, C4 and C5 are converted to long format and labeled with their type (A, B or C) so that each series can be identified in the plot.

d1 = melt(df[,c(1,3:5)], id.vars = c("C1","C3"), measure.vars = c("C4","C5"), variable.name = "col")
d1$group = paste0(d1$C3,d1$col)
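As a rough illustration of what this step produces, the sketch below runs the same `melt` call on just the first three rows of the sample data. The rows are typed in by hand here, and C2 is left out for brevity. Each input row yields two output rows, one per measured column.

```r
library(reshape2)

# First three rows of the sample data, entered by hand for illustration
df <- data.frame(C1 = c(10, 10, 10),
                 C3 = c("A", "B", "C"),
                 C4 = c(100, 200, 300),
                 C5 = c(1000, -200, 1200))

# Keep C1 and C3 as identifiers; stack C4 and C5 into a single "value" column,
# with the originating column name recorded in "col"
d1 <- melt(df, id.vars = c("C1", "C3"), measure.vars = c("C4", "C5"),
           variable.name = "col")

# Combine type and column name into one label per series
d1$group <- paste0(d1$C3, d1$col)
print(d1)
```

The `group` column takes the values AC4, BC4, CC4, AC5, BC5 and CC5, which is what lets ggplot draw a separate line for each combination of type and column.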

Then C1 and C2 are melted on their own, without the type column (A, B, C); `unique` drops the rows that would otherwise repeat once per type.

d2 = unique(melt(df[,c(1:2)], id.vars = "C1", measure.vars = c("C1","C2"), variable.name = "group"))

The two melted datasets are then combined, keeping only the columns needed for the plot, and the values are converted to numeric.

p = rbind(d1[,c("C1","group","value")], d2[,c("C1","group","value")])
p$value = as.numeric(p$value)

Now plot the data, coloring each line by the group created above so the series can be told apart, and add text labels showing the values.

ggplot(p, aes(x=C1, y=value, group=group, color=group, label=value)) + 
  geom_line() + geom_point() + geom_text(aes(label=value, hjust= 1, vjust=-1))
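For the base-graphics part of the question (several lines on one plot, a Y axis that accommodates the negative values, and filtering by type rather than aggregating), a rough sketch could look like the following. It assumes the nine sample rows inline instead of `data_out.csv`, assumes the rows for each type appear in the same C1 order, and uses made-up series names such as `A_C4` purely for illustration.

```r
csv_text <- "10,130,A,100,1000
10,130,B,200,-200
10,130,C,300,1200
20,140,A,120,1050
20,140,B,220,-300
20,140,C,320,1250
30,120,A,145,1130
30,120,B,255,1000
30,120,C,355,1110"

data <- read.csv(text = csv_text, header = FALSE,
                 col.names = c("C1", "C2", "C3", "C4", "C5"))

# Filter rows by type: plain subsetting, no aggregation involved
a  <- data[data$C3 == "A", ]
b  <- data[data$C3 == "B", ]
cc <- data[data$C3 == "C", ]

# One column per series; all three subsets share the same C1 values
wide <- cbind(C1 = a$C1, C2 = a$C2,
              A_C4 = a$C4, A_C5 = a$C5,
              B_C4 = b$C4, B_C5 = b$C5,
              C_C4 = cc$C4, C_C5 = cc$C5)

# matplot() draws one line per column and, by default, sizes the Y axis
# to the range of all values, so the negative ones fit as well
matplot(a$C1, wide, type = "l", lty = 1, col = 1:8,
        xlab = "C1", ylab = "value")
abline(h = 0, lty = 3)  # dotted reference line at Y = 0
legend("topleft", legend = colnames(wide), col = 1:8, lty = 1, cex = 0.7)
```

The same picture can be built step by step with `plot(..., type = "l", ylim = range(wide))` for the first series followed by one `lines()` call for each remaining column; setting `ylim` explicitly is what keeps the negative values inside the plotting area in that variant.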

Divi
  • Thanks, I appreciate your take on it, but the input data is not exactly what I have. I have added one more edit with the example data that I read from csv and the exact output that should be plotted. What your graph plots may be correct, but without an explanation of what each transformation in your code does I wouldn't be able to adapt it to the data that I read. – Greg May 14 '16 at 23:21
  • Your edited `df` has 9 columns and 5 rows. Is this format correct? Again, your explanation does not match the format. – Divi May 14 '16 at 23:26
  • This is one row in the csv file: `10,130,A,100,1000`, not a column. There are 300 rows like this in the file. For each three rows there will be 8 dots on the graph. Maybe I am not using the terminology properly? – Greg May 15 '16 at 12:04
  • That's what I thought. Simply remove this line `df = data.frame(t(df))` from the above code and it should work. – Divi May 15 '16 at 13:34
  • Also, in future please read [how to of reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) before asking your question. With your current problem, spend some time learning what each script does. It is a fairly easy problem and you should not have any difficulty solving it after the helper code. – Divi May 15 '16 at 13:38
  • OK, thank you. Is your answer still OK after I have rewritten the question? Please have a look and I will gladly accept as it is. – Greg May 15 '16 at 16:48
  • Yes, simply remove the line `df = data.frame(t(df))` from the given answer. – Divi May 15 '16 at 17:37