[Image: partial plot of V2 as a function of V1]

I have been trying to plot many curves (about 12) in a single plot using ggplot2. I initially gathered the data in an Excel sheet and transferred them as such into R. Each curve has a different number of data points, and the x values also differ from curve to curve. As such, the data cannot be treated as a matrix or a regular data set. I would like to plot the curves without first extracting each curve's data into its own pair of columns.

I tried many versions of code such as the following for representing the first 2 curves (without result):

library("ggplot2")
g <- ggplot(D, aes(x=V1))
k <- g + geom_line(aes(y=V2), colour="red")
s <- k + geom_line(aes(x=V5))
h <- s + geom_line(aes(y=V6), colour="green")

Below is a minimal version of a huge amount of data. Even so it looks very big, even though it's only about 8 rows and 8 columns. I apologize for that. For the sake of a simple example I deleted many columns and rows. So there are 4 curves to plot in total: (V1,V2), (V5,V6), (V11,V12), and (V15,V16), where the first coordinate is x and the second is y in each of the 4 cases. I would highly appreciate your help.

> dput(D)
 structure(list(V1 = structure(c(85L, 86L, 87L, 88L, 89L, 90L, 
 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "0", "0.005966", "0.011966", 
 "0.017966", "0.023966", "0.029966", "0.035966", "0.041966", "0.047966", 
 "0.053966", "0.059966", "0.065966", "0.071966", "0.077966", "0.083966", 
 "0.089966", "0.092265", "0.098408", "0.105918", "0.113602", "0.120645", 
 "0.130484", "0.137735", "0.148359", "0.154359", "0.165272", "0.171272", 
 "0.18083", "0.18683", "0.19283", "0.19883", "0.20483", "0.21083", 
 "0.21683", "0.22283", "0.22883", "0.23483", "0.24083", "0.252113", 
 "0.258113", "0.264113", "0.270113", "0.276113", "0.282113", "0.288113", 
 "0.294113", "0.300113", "0.306113", "0.312113", "0.318113", "0.324113", 
 "0.330113", "0.336113", "0.342113", "0.348113", "0.354113", "0.363916", 
 "0.375691", "0.381691", "0.393053", "0.399053", "0.405053", "0.411053", 
 "0.417053", "0.426986", "0.432986", "0.438986", "0.448759", "0.458853", 
 "0.464853", "0.470853", "0.481612", "0.487612", "0.497969", "0.503969", 
 "0.509969", "0.515969", "0.521969", "0.527969", "0.533969", "0.539969", 
 "0.551301", "0.557301", "0.562965", "0.568965", "0.574965", "0.580965", 
 "0.586965", "0.592965", "0.598965", "0.599966", "Displ.", "M11 (10-BF)"
  ), class = "factor"), V2 = structure(c(88L, 89L, 90L, 91L, 92L, 
 85L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "0", "112.369", 
 "149.825", "187.282", "224.738", "262.194", "299.651", "337.107", 
 "37.456", "374.564", "412.02", "449.476", "486.933", "524.389", 
 "561.845", "576.195", "605.792", "629.753", "648.093", "658.487", 
 "670.233", "677.776", "687.528", "692.703", "701.893", "706.104", 
 "712.587", "716.571", "720.277", "723.983", "727.688", "731.394", 
 "735.1", "738.806", "74.913", "742.512", "746.217", "749.923", 
 "756.33", "757.954", "759.576", "761.199", "762.82", "764.441", 
 "766.062", "767.654", "769.246", "770.837", "772.428", "774.018", 
 "775.572", "777.125", "778.678", "780.231", "781.783", "783.334", 
 "785.664", "788.255", "789.526", "791.883", "792.981", "793.987", 
 "794.895", "795.803", "796.996", "797.655", "798.313", "799.259", 
 "800.029", "800.407", "800.745", "801.259", "801.505", "801.915", 
 "802.145", "802.375", "802.604", "802.76", "802.915", "803.07", 
 "803.179", "803.188", "803.199", "803.322", "803.373", "803.413", 
 "803.438", "803.44", "803.441", "803.443", "803.444", "BaseFor."
 ), class = "factor"), V5 = structure(c(85L, 86L, 87L, 88L, 1L, 
 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "0", "0.005941", 
 "0.011941", "0.017941", "0.023941", "0.029941", "0.035941", "0.041941", 
 "0.047941", "0.053941", "0.059941", "0.065941", "0.071941", "0.077941", 
 "0.083941", "0.089941", "0.095941", "0.101941", "0.103817", "0.110449", 
 "0.118017", "0.125068", "0.13262", "0.143702", "0.152147", "0.15839", 
 "0.16439", "0.17039", "0.17639", "0.182967", "0.191488", "0.202601", 
 "0.208601", "0.214601", "0.223557", "0.229557", "0.235557", "0.241557", 
 "0.251764", "0.257764", "0.263764", "0.273723", "0.279723", "0.285723", 
 "0.296481", "0.302481", "0.308481", "0.314481", "0.320481", "0.329858", 
 "0.335858", "0.341858", "0.347858", "0.353858", "0.359858", "0.365858", 
 "0.371858", "0.38087", "0.38687", "0.39287", "0.404708", "0.415154", 
 "0.421154", "0.4287", "0.4347", "0.4407", "0.451398", "0.457398", 
 "0.463398", "0.469398", "0.475398", "0.487014", "0.497525", "0.509064", 
 "0.515064", "0.521064", "0.527064", "0.533064", "0.543151", "0.549151", 
 "0.555151", "0.566361", "0.57723", "0.58323", "0.58923", "0.59523", 
 "0.599941", "Displ.", "M13 (10-BF_M)"), class = "factor"), V6 =  structure    (c     (84L, 
  85L, 86L, 87L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", 
  "0", "112.442", "140.553", "168.663", "196.774", "224.885", "252.995", 
  "28.111", "281.106", "309.216", "337.327", "365.437", "393.548", 
  "421.659", "449.769", "477.598", "486.301", "515.282", "544.842", 
  "56.221", "567.028", "588.112", "612.031", "627.001", "636.278", 
  "644.516", "652.395", "660.274", "668.094", "676.388", "686.223", 
  "691.258", "696.203", "702.797", "706.954", "710.844", "714.734", 
  "721.266", "725.069", "728.873", "734.733", "738.113", "741.493", 
  "747.304", "750.435", "753.566", "756.618", "759.67", "763.8", 
  "765.277", "766.747", "768.217", "769.687", "771.156", "772.625", 
  "774.093", "776.263", "777.617", "778.97", "781.541", "783.744", 
  "784.896", "786.257", "787.267", "788.276", "789.981", "790.847", 
  "791.661", "792.411", "793.16", "794.53", "795.617", "796.748", 
  "797.29", "797.732", "798.143", "798.555", "799.151", "799.467", 
  "799.753", "800.244", "800.621", "800.772", "800.923", "801.074", 
  "801.193", "84.332", "BaseFor."), class = "factor"), V11 = structure(c     (85L, 86L, 87L, 88L, 89L, 90L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", 
"0", "0.003903", "0.009903", "0.015903", "0.021903", "0.027903", 
 "0.033903", "0.039903", "0.045903", "0.051903", "0.057903", "0.063903", 
 "0.069903", "0.075903", "0.077429", "0.08433", "0.093127", "0.101114", 
 "0.108712", "0.11453", "0.12053", "0.124929", "0.130929", "0.136267", 
 "0.142267", "0.152885", "0.158885", "0.164885", "0.170885", "0.180633", 
 "0.190768", "0.196768", "0.202768", "0.208768", "0.214768", "0.22325", 
 "0.231018", "0.240961", "0.247414", "0.253414", "0.262807", "0.264757", 
 "0.270757", "0.276757", "0.284065", "0.29092", "0.293955", "0.296581", 
 "0.303881", "0.309881", "0.317746", "0.323746", "0.329746", "0.335746", 
 "0.341746", "0.347746", "0.353746", "0.359746", "0.365746", "0.371746", 
 "0.377746", "0.383746", "0.389746", "0.401176", "0.407176", "0.413936", 
 "0.421828", "0.427828", "0.433828", "0.439828", "0.445828", "0.451828", 
 "0.457828", "0.463828", "0.469828", "0.478943", "0.485564", "0.491564", 
  "0.497564", "0.503564", "0.509564", "0.515564", "0.521564", "0.527564", 
 "0.538766", "0.544766", "0.550766", "0.556766", "0.562766", "0.568766", 
 "0.574766", "0.580766", "0.586766", "0.592766", "0.597903", "Displ.", 
 "M15 (10-INF)"), class = "factor"), V12 = structure(c(64L, 63L, 
  62L, 61L, 60L, 59L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "0", 
 "1005.726", "1009.623", "1011.811", "1017.902", "1025.83", "1031.746", 
 "1038.527", "1039.66", "1042.112", "1056.988", "1067.679", "1071.904", 
 "1081.668", "1084.051", "1096.224", "1097.858", "1106.559", "1118.378", 
 "1125.618", "1135", "1140.472", "1141.291", "1148.964", "1156.559", 
 "1166.651", "1176.709", "1186.523", "1198.38", "1202.793", "1217.696", 
 "1226.19", "1234.685", "1240.749", "1242.795", "1256.85", "1268.252", 
 "1269.925", "1272.089", "1275.215", "1275.357", "1276.389", "166.25", 
 "254.359", "343.708", "433.057", "522.87", "612.683", "702.496", 
 "79.716", "792.309", "858.234", "859.779", "861.582", "863.381", 
 "865.178", "866.972", "868.763", "870.552", "872.337", "874.12", 
 "875.901", "878.915", "880.338", "881.758", "882.122", "883.176", 
 "884.591", "886.003", "887.412", "888.819", "889.813", "890.896", 
 "891.464", "893.109", "895.73", "899.729", "903.725", "907.718", 
 "911.709", "915.696", "921.024", "926.016", "932.761", "944.564", 
 "949.074", "950.715", "956.855", "962.992", "969.127", "975.258", 
"981.357", "987.454", "993.547", "999.638", "BaseFor."), class = "factor"), 
V15 = structure(c(85L, 86L, 87L, 88L, 89L, 90L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("", "0", "0.000278", "0.005722", 
"0.011722", "0.017722", "0.023722", "0.029722", "0.035722", 
"0.041722", "0.047722", "0.053722", "0.059722", "0.065722", 
"0.071722", "0.077722", "0.083722", "0.089722", "0.095722", 
"0.101722", "0.107722", "0.113722", "0.117013", "0.123013", 
"0.129013", "0.138671", "0.14632", "0.156907", "0.163297", 
"0.165095", "0.171095", "0.181276", "0.185661", "0.191661", 
"0.197661", "0.20741", "0.219165", "0.227842", "0.233842", 
"0.239842", "0.245842", "0.251842", "0.257842", "0.265518", 
"0.277034", "0.287175", "0.293925", "0.298905", "0.304905", 
"0.310905", "0.316905", "0.319905", "0.327", "0.337938", 
"0.345053", "0.353392", "0.359392", "0.365392", "0.373443", 
"0.381492", "0.390686", "0.398531", "0.406132", "0.412132", 
"0.418132", "0.424132", "0.430132", "0.436132", "0.442132", 
"0.450659", "0.456659", "0.462659", "0.468659", "0.477793", 
"0.483793", "0.489793", "0.495793", "0.501793", "0.507793", 
"0.513793", "0.519793", "0.525793", "0.531793", "0.537793", 
"0.543793", "0.549793", "0.555793", "0.561793", "0.567793", 
"0.573793", "0.579793", "0.585793", "0.591793", "0.593722", 
"Displ.", "M17 (10-INF_M)"), class = "factor"), V16 = structure(c(66L, 
65L, 64L, 63L, 62L, 61L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", 
"0", "1001.042", "1007.585", "1013.736", "1018.478", "1022.144", 
"1030.544", "1043.215", "1054.922", "1055.09", "1073.135", 
"1088.127", "1092.718", "1101.899", "1107.55", "1112.331", 
"1122.695", "1127.945", "1135.753", "1145.092", "1147.475", 
"1161.206", "1173.141", "1183.647", "1189.412", "1194.152", 
"1204.658", "1212.448", "1214.9", "1218.199", "1224.255", 
"1229.838", "1235.349", "1245.109", "1247.205", "1248.478", 
"1251.639", "1251.741", "133.508", "182.716", "232.96", "283.203", 
"333.447", "383.69", "39.235", "433.934", "484.177", "534.421", 
"584.777", "635.134", "685.49", "735.847", "785.196", "81.948", 
"831.454", "849.509", "850.124", "852.032", "854.335", "856.635", 
"858.931", "861.223", "863.514", "866.744", "870.24", "873.733", 
"875.898", "877.221", "881.135", "885.044", "888.948", "893.188", 
"895.326", "900.137", "905.603", "911.296", "916.983", "918.404", 
"926.137", "932.075", "938.009", "943.939", "951.689", "956.848", 
"958.382", "962.005", "967.053", "972.099", "977.142", "980.212", 
"981.722", "986.714", "993.275", "BaseFor."), class                = "factor")),    .Names  =     c("V1", 
 "V2", "V5", "V6", "V11", "V12", "V15", "V16"), row.names = c(3L, 
 4L, 5L, 6L, 7L, 8L, 12L, 13L, 14L, 15L, 16L, 17L), class = "data.frame")
user249018
  • You have factor variables. What sort of curves do you imagine being produced? – IRTFM Mar 25 '18 at 01:00
  • You need to clean up your data first. Put only column headers in the 1st line instead of 3 lines as of now – Tung Mar 25 '18 at 01:00
  • Thanks. I don't understand. The data is not a data set. I also edited the column headers now, but could not go further. – user249018 Mar 25 '18 at 01:33
  • @user249018: that looks better now. What is your expected output? Can you plot it in Excel and add the picture to your question? – Tung Mar 25 '18 at 01:56
  • I just included a part of the curve plotting V2 as a function of V1. Including the other curves would have complicated the view. I still hope for some hints. – user249018 Mar 25 '18 at 04:56
  • Do the curves need to be on the same graph or different ? – Jack Brookes Mar 25 '18 at 11:01
  • Yes. They need to be all 4 in the same graph, since I need to consider and explore their close relationship. – user249018 Mar 25 '18 at 13:17

2 Answers

Here's the dataset I think I'm supposed to be working with.

> D

        V1      V2       V5      V6      V11     V12      V15     V16
3 0.562965 803.438  0.58323 800.772 0.527564 878.915 0.543793  870.24
4 0.568965  803.44  0.58923 800.923 0.538766 875.901 0.549793 866.744
5 0.574965 803.441  0.59523 801.074 0.544766  874.12 0.555793 863.514
6 0.580965 803.443 0.599941 801.193 0.550766 872.337 0.561793 861.223
7 0.586965 803.444                  0.556766 870.552 0.567793 858.931
8 0.592965 803.322                  0.562766 868.763 0.573793 856.635

The structure you included was a mess: the values are stored as factors, not numerics. So here I tidy them up (annoyingly, you have to convert to character first, then to numeric). After that, I gathered the columns into a variable column and a value column.

library(tidyverse)

D_long <- D %>% 
  dplyr::mutate_all(as.character) %>% 
  dplyr::mutate_all(as.numeric) %>% 
  tidyr::gather(variable, value, V2:V16) %>% 
  dplyr::filter(!is.na(value))

D_long 

Output

         V1 variable      value
1  0.562965       V2 803.438000
2  0.568965       V2 803.440000
3  0.574965       V2 803.441000
4  0.580965       V2 803.443000
5  0.586965       V2 803.444000
6  0.592965       V2 803.322000
7  0.562965       V5   0.583230
8  0.568965       V5   0.589230
9  0.574965       V5   0.595230
10 0.580965       V5   0.599941
11 0.562965       V6 800.772000
12 0.568965       V6 800.923000
13 0.574965       V6 801.074000
14 0.580965       V6 801.193000
15 0.562965      V11   0.527564
16 0.568965      V11   0.538766
17 0.574965      V11   0.544766
18 0.580965      V11   0.550766
19 0.586965      V11   0.556766
20 0.592965      V11   0.562766
21 0.562965      V12 878.915000
22 0.568965      V12 875.901000
23 0.574965      V12 874.120000
24 0.580965      V12 872.337000
25 0.586965      V12 870.552000
26 0.592965      V12 868.763000
27 0.562965      V15   0.543793
28 0.568965      V15   0.549793
29 0.574965      V15   0.555793
30 0.580965      V15   0.561793
31 0.586965      V15   0.567793
32 0.592965      V15   0.573793
33 0.562965      V16 870.240000
34 0.568965      V16 866.744000
35 0.574965      V16 863.514000
36 0.580965      V16 861.223000
37 0.586965      V16 858.931000
38 0.592965      V16 856.635000
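
On the factor conversion used above: calling as.numeric() directly on a factor returns the internal integer level codes rather than the printed values, which is why the pipeline goes through as.character() first. A minimal base R illustration, using a made-up factor `f` (not taken from the question's data):

```r
# Hypothetical factor standing in for a column that was read as a factor,
# including one stray text entry like the header residue in the question
f <- factor(c("0.562965", "0.568965", "Displ."))

# Direct conversion gives the integer level codes, not the values
codes <- as.numeric(f)

# Going through character recovers the values; non-numeric text becomes NA
# (the coercion warning is suppressed here because the NA is expected)
vals <- suppressWarnings(as.numeric(as.character(f)))

codes
vals
```

This is also why leftover header text inside a column matters: it survives the conversion only as NA.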

Then map the columns to aesthetics, and plot a line layer:

ggplot(D_long, aes(x = V1, y = value, color = variable)) +
  geom_line()

Output

[Image: line plot of value against V1, one colored line per variable]

Jack Brookes
  • Thanks so much. But I am not quite sure this is what I was asking for. In total I was supposed to get four curves: (V1,V2), (V5,V6), (V11,V12), and (V15,V16), where the first coordinate is x and the second is y. I guess this is a misunderstanding. Can you correct your results on this basis and tell me what to do? I would also like to know what went wrong with the data, which I simply saved as a csv file and then read into R using the read.csv() command. I didn't actually store them as factors. For each curve the x values are different, and for each curve the amount of data is different. – user249018 Mar 25 '18 at 13:16

Considering what you need, you should have arranged the data in your csv file like this:

library(magrittr)
library(ggplot2)

D <- structure(list(X = c(0.562965, 0.568965, 0.574965, 0.580965, 
    0.586965, 0.592965, 0.58323, 0.58923, 0.59523, 0.599941, 0.527564, 
    0.538766, 0.544766, 0.550766, 0.556766, 0.562766, 0.543793, 0.549793, 
    0.555793, 0.561793, 0.567793, 0.573793), Y = c(803.438, 803.44, 
    803.441, 803.443, 803.444, 803.322, 800.772, 800.923, 801.074, 
    801.193, 878.915, 875.901, 874.12, 872.337, 870.552, 868.763, 
    870.24, 866.744, 863.514, 861.223, 858.931, 856.635), Group = c("V1_V2", 
    "V1_V2", "V1_V2", "V1_V2", "V1_V2", "V1_V2", "V5_V6", "V5_V6", 
    "V5_V6", "V5_V6", "V11_V12", "V11_V12", "V11_V12", "V11_V12", 
    "V11_V12", "V11_V12", "V15_V16", "V15_V16", "V15_V16", "V15_V16", 
    "V15_V16", "V15_V16")), .Names = c("X", "Y", "Group"), row.names = c(NA, 
    -22L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(
    cols = structure(list(X = structure(list(), class = c("collector_double", 
    "collector")), Y = structure(list(), class = c("collector_double", 
    "collector")), Group = structure(list(), class = c("collector_character", 
    "collector"))), .Names = c("X", "Y", "Group")), default = structure(list(), 
    class = c("collector_guess", 
    "collector"))), .Names = c("cols", "default"), class = "col_spec"))
head(D)

#> # A tibble: 6 x 3
#>       X     Y Group
#>   <dbl> <dbl> <chr>
#> 1 0.563  803. V1_V2
#> 2 0.569  803. V1_V2
#> 3 0.575  803. V1_V2
#> 4 0.581  803. V1_V2
#> 5 0.587  803. V1_V2
#> 6 0.593  803. V1_V2

ggplot(D, aes(x = X, y = Y, color = Group, group = Group)) +
  geom_line()

# or
D %>% 
  ggplot(., aes(x = X, y = Y, color = Group, group = Group)) +
  geom_line()

Edit: to create the data frame D automatically from OP's original data
Credit to this answer

D1 <- structure(list(V1 = c(0.562965, 0.568965, 0.574965, 0.580965, 
        0.586965, 0.592965), V2 = c(803.438, 803.44, 803.441, 803.443, 
        803.444, 803.322), V5 = c(0.58323, 0.58923, 0.59523, 0.599941, 
        NA, NA), V6 = c(800.772, 800.923, 801.074, 801.193, NA, NA), 
            V11 = c(0.527564, 0.538766, 0.544766, 0.550766, 0.556766, 
            0.562766), V12 = c(878.915, 875.901, 874.12, 872.337, 870.552, 
            868.763), V15 = c(0.543793, 0.549793, 0.555793, 0.561793, 
            0.567793, 0.573793), V16 = c(870.24, 866.744, 863.514, 861.223, 
            858.931, 856.635)), .Names = c("V1", "V2", "V5", "V6", "V11", 
        "V12", "V15", "V16"), row.names = c(NA, -6L), class = c("tbl_df", 
        "tbl", "data.frame"), spec = structure(list(cols = structure(list(
            V1 = structure(list(), class = c("collector_double", "collector"
            )), V2 = structure(list(), class = c("collector_double", 
            "collector")), V5 = structure(list(), class = c("collector_double", 
            "collector")), V6 = structure(list(), class = c("collector_double", 
            "collector")), V11 = structure(list(), class = c("collector_double", 
            "collector")), V12 = structure(list(), class = c("collector_double", 
            "collector")), V15 = structure(list(), class = c("collector_double", 
            "collector")), V16 = structure(list(), class = c("collector_double", 
            "collector"))), .Names = c("V1", "V2", "V5", "V6", "V11", 
        "V12", "V15", "V16")), default = structure(list(), class = c("collector_guess", 
        "collector"))), .Names = c("cols", "default"), class = "col_spec"))

# make group names which are the combination of every 2 column names 
groupName <- paste0(names(D1)[c(TRUE, FALSE)], names(D1)[c(FALSE, TRUE)])
groupName
#> [1] "V1V2"   "V5V6"   "V11V12" "V15V16"

# next we split the data into a list of groups of 2 columns, 
# then change the names of the list with setNames and 
# rbind the list elements to a single data.table using rbindlist 
# and specifying the idcol as 'Group'
library(data.table)
lst <- split.default(D1, cumsum(rep(c(TRUE, FALSE), ncol(D1)/2)))
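# bind by position, since each pair has its own column names (V1/V2, V5/V6, ...);
# the result keeps the names of the first pair, V1 and V2
D <- rbindlist(setNames(lst, groupName), idcol = "Group", use.names = FALSE)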

D %>% 
  ggplot(., aes(x = V1, y = V2, color = Group, group = Group)) +
  xlab("X") + ylab("Y") +
  geom_line()
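
For readers who prefer to avoid the data.table dependency, the same stacking of (x, y) column pairs can be sketched in base R. This is only an illustration under stated assumptions: the small `D1` below is a cut-down stand-in for the data above, the x columns are assumed to sit in odd positions with their y columns right after them, and the names `pieces`, `D_stacked`, `X`, `Y` and `Group` are chosen here for the example.

```r
# Cut-down stand-in for D1: two (x, y) pairs of unequal length, padded with NA
D1 <- data.frame(
  V1 = c(0.562965, 0.568965, 0.574965),
  V2 = c(803.438, 803.440, 803.441),
  V5 = c(0.58323, 0.58923, NA),
  V6 = c(800.772, 800.923, NA)
)

# Odd positions hold x columns, the next position holds the matching y column
x_cols <- seq(1, ncol(D1), by = 2)

# Build one small (Group, X, Y) data frame per pair
pieces <- lapply(x_cols, function(i) {
  data.frame(
    Group = paste0(names(D1)[i], "_", names(D1)[i + 1]),
    X = D1[[i]],
    Y = D1[[i + 1]]
  )
})

# Stack the pairs, then drop the NA padding so each curve keeps only its own points
D_stacked <- do.call(rbind, pieces)
D_stacked <- D_stacked[complete.cases(D_stacked), ]
D_stacked
```

The result can be passed to the same kind of call as above, for example ggplot(D_stacked, aes(x = X, y = Y, color = Group)) + geom_line().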

Other tip: use read_csv from the readr package to read data into R, as it never converts strings to factors (the equivalent of stringsAsFactors = FALSE) and is much faster than base R read.csv. Read more about it here and here.
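
A minimal base R sketch of the same idea, in case readr is not installed. The CSV text below is made up for illustration and assumes a single header row; blank cells in numeric columns are read as NA, which is how columns of unequal length end up padded. Extra text rows in the data (such as "Displ." or "BaseFor.") would still force a whole column to character.

```r
# Hypothetical CSV content: one header row, blank cells where a curve is shorter
csv_text <- "V1,V2,V5,V6
0.562965,803.438,0.58323,800.772
0.568965,803.44,0.58923,800.923
0.574965,803.441,,"

# stringsAsFactors = FALSE keeps any text columns as character instead of factor
D1 <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(D1)
```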

Created on 2018-03-25 by the reprex package (v0.2.0).

Tung
  • I am very thankful for the support you provided yesterday and today. Your suggestions and proposals work very well. I have to plot about 12 curves, each with about 100 data values. So the structure of such a data frame (it will be rather lengthy) is a little unusual for Excel, but since it worked so well it doesn't matter. Thank you also for the tips, which sound great. – user249018 Mar 25 '18 at 21:26
  • @user249018: see my edit if you want to create the data frame `D` automatically – Tung Mar 26 '18 at 04:17
  • Sorry for bothering you again. I ran into a difficulty that I somehow didn't notice when I replied to you after trying the code. The code you wrote assumes the data frame called D1, which you show in dput format. How do you get D1? The difference between the initial "data set" D and the data frame D1 is that D1 fills the missing entries with NA values in order to get the shape of a data frame. – user249018 Mar 27 '18 at 19:26
  • I didn't do anything actually. `read_csv` automatically filled missing cells with NA – Tung Mar 27 '18 at 19:33