I run the following R script on this dataset: http://pastebin.com/HA42b8QV

require(ggplot2)
data <- read.table("funcExp.txt", sep = "\t", header = TRUE)
data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$insTime <- strtoi(data$insTime)
ggplot(data, aes(n, insTime, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$decTime <- strtoi(data$decTime)
ggplot(data, aes(n, decTime, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$delTime <- strtoi(data$delTime)
ggplot(data, aes(n, delTime, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$insComp <- strtoi(data$insComp)
ggplot(data, aes(n, insComp, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")


data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$decComp <- strtoi(data$decComp)
ggplot(data, aes(n, decComp, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$delComp <- strtoi(data$delComp)
ggplot(data, aes(n, delComp, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

and I get the following warnings:

Loading required package: ggplot2
Loading required package: methods
Warning messages:
1: Removed 26 rows containing missing values (stat_summary). 
2: Removed 26 rows containing missing values (geom_point). 
Warning messages:
1: Removed 30 rows containing missing values (stat_summary). 
2: Removed 30 rows containing missing values (geom_point). 
Warning messages:
1: Removed 22 rows containing missing values (stat_summary). 
2: Removed 22 rows containing missing values (geom_point). 
Warning messages:
1: Removed 36 rows containing missing values (stat_summary). 
2: Removed 36 rows containing missing values (geom_point). 
Warning messages:
1: Removed 36 rows containing missing values (stat_summary). 
2: Removed 36 rows containing missing values (geom_point). 
Warning messages:
1: Removed 25 rows containing missing values (stat_summary). 
2: Removed 25 rows containing missing values (geom_point). 

I searched online for the cause but could not find it. Most posts suggest that there are missing values in my dataset. Nothing is missing from my dataset, so I can't see why R would treat some values as missing.

Thank you.

jsguy
  • Are they `Inf`? `is.na(Inf)` evaluates to `FALSE`, so if you're checking for missing values with that function you'll miss infinite values. – TayTay Oct 09 '15 at 19:11
  • Did you check whether R read all your data correctly? – RHA Oct 09 '15 at 19:14

1 Answer

It seems that the conversions you apply to your data after reading it are what break it.

If you do not write

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$insTime <- strtoi(data$insTime)

then the plots work out nicely.
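
As a minimal sketch of the script without those conversions: the small inline data frame below is made up so that the example runs on its own (it stands in for the `read.table` call in the question), and `fun = median` assumes ggplot2 3.3.0 or later, where it replaces `fun.y`.

```r
library(ggplot2)

# Hypothetical stand-in for read.table("funcExp.txt", sep = "\t", header = TRUE);
# the values are made up for illustration.
data <- data.frame(
  alg     = rep(c("aheap", "fibheap", "pheap"), times = 3),
  n       = rep(c(2L, 4L, 8L), each = 3),
  insTime = c(408.5, 867.2, 1332.1, 400.3, 1031.4, 1501.9, 455.0, 1210.6, 1720.8)
)
data$alg <- factor(data$alg)  # harmless; the numeric columns are left untouched

# No strtoi() calls: the numeric columns are already numbers.
p <- ggplot(data, aes(n, insTime, color = alg)) +
  geom_point() +
  stat_summary(fun = median, geom = "line")
print(p)
```

The same pattern should apply to the other columns (`decTime`, `delTime`, and so on) by swapping the y aesthetic.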

The structure of the data already shows that everything was read in correctly:

 > str(data)
 'data.frame':   60 obs. of  8 variables:
 $ alg    : Factor w/ 3 levels "aheap","fibheap",..: 1 3 2 1 3 2 1 3 2 1 ...
 $ n      : int  2 2 2 4 4 4 8 8 8 16 ...
 $ insTime: num  408 867 1332 400 1031 ...
 $ decTime: num  359 738 1079 411 856 ...
 $ delTime: num  325 750 1242 416 931 ...
 $ insComp: num  0.9 1.5 2.5 1.9 3.5 6.5 5.8 11.6 18.6 12 ...
 $ decComp: num  0.5 1.1 5.1 1.7 3.6 11.6 3 7 23 11.6 ...
 $ delComp: num  0 0 1 3.6 7.6 14.8 16.8 38 67.6 57 ...

The summary does not show any NAs either:

 > summary(data)
      alg           n              insTime             decTime            delTime             insComp       
 aheap  :20   Min.   :      2   Min.   :      400   Min.   :     359   Min.   :3.250e+02   Min.   :      1  
 fibheap:20   1st Qu.:     56   1st Qu.:     4518   1st Qu.:    3262   1st Qu.:8.420e+03   1st Qu.:     87  
 pheap  :20   Median :   1536   Median :   110041   Median :   67643   Median :2.743e+05   Median :   3095  
              Mean   : 104858   Mean   :  8304522   Mean   : 5866098   Mean   :9.325e+07   Mean   : 258807  
              3rd Qu.:  40960   3rd Qu.:  2416198   3rd Qu.: 1556492   3rd Qu.:1.132e+07   3rd Qu.:  92170  
              Max.   :1048576   Max.   :142359000   Max.   :88428500   Max.   :2.088e+09   Max.   :3735370  
    decComp           delComp         
 Min.   :      0   Min.   :        0  
 1st Qu.:     89   1st Qu.:      608  
 Median :   2790   Median :    46142  
 Mean   : 226980   Mean   :  7884811  
 3rd Qu.:  75944   3rd Qu.:  2085385  
 Max.   :3983010   Max.   :138010000  

After applying `strtoi`, which only accepts whole numbers, NAs appear:

> data$decTime <- strtoi(data$decTime)
> summary(data)
      alg           n              insTime             decTime            delTime             insComp       
 aheap  :20   Min.   :      2   Min.   :     2175   Min.   :     498   Min.   :3.250e+02   Min.   :      1  
 fibheap:20   1st Qu.:     56   1st Qu.:   222651   1st Qu.:  264344   1st Qu.:8.420e+03   1st Qu.:     87  
 pheap  :20   Median :   1536   Median :  1545575   Median : 1596015   Median :2.743e+05   Median :   3095  
              Mean   : 104858   Mean   : 14642987   Mean   :11713536   Mean   :9.325e+07   Mean   : 258807  
              3rd Qu.:  40960   3rd Qu.: 10317432   3rd Qu.: 9105678   3rd Qu.:1.132e+07   3rd Qu.:  92170  
              Max.   :1048576   Max.   :142359000   Max.   :88428500   Max.   :2.088e+09   Max.   :3735370  
                                NA's   :26          NA's   :30                                              
    decComp           delComp         
 Min.   :      0   Min.   :        0  
 1st Qu.:     89   1st Qu.:      608  
 Median :   2790   Median :    46142  
 Mean   : 226980   Mean   :  7884811  
 3rd Qu.:  75944   3rd Qu.:  2085385  
 Max.   :3983010   Max.   :138010000 
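
To see the effect in isolation, here is a small sketch with made-up values (not taken from the dataset). `strtoi` first coerces its input to text and returns `NA` for anything that is not a plain integer string, so counting the NAs before and after the conversion shows where they come from.

```r
# Made-up sample values, for illustration only.
x <- c(408.5, 867, 1332.25, 400)

converted <- strtoi(x)        # x is coerced to character first: "408.5", "867", ...
print(converted)
print(sum(is.na(converted)))  # number of NAs introduced by the conversion

# Checking a whole data frame column by column:
df <- data.frame(original = x, converted = converted)
print(colSums(is.na(df)))
```

Running `colSums(is.na(data))` on the real data frame before and after each `strtoi` call would be one way to confirm which step introduces the missing values.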

Hope that helps.

Jens
  • I wonder what it is about `strtoi` that causes NAs to propagate? Does it not like the 'e' in scientific notation? – TayTay Oct 09 '15 at 19:22
  • Just tested it out: `> strtoi('1e12')` `[1] NA` `> as.numeric('1e12')` `[1] 1e+12` – TayTay Oct 09 '15 at 19:24
  • Because the function tries to produce integers, any value whose text form is not a plain integer, such as a number with a fractional part, is turned into `NA`. If you look at the data, the first value is 408.5, which becomes `NA`. – Jens Oct 09 '15 at 19:27
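
The cases from these comments can be put side by side in a short sketch. It also shows `as.numeric` and `as.integer` as possible alternatives; whether truncating with `as.integer` is acceptable for this data is an assumption, not something the thread settles.

```r
# The inputs mentioned in the comments, as strings.
sci  <- strtoi("1e12")        # NA: "1e12" is not a plain integer string
frac <- strtoi("408.5")       # NA: the fractional part is rejected
print(c(sci, frac))

# as.numeric() parses both forms without loss.
print(as.numeric("1e12"))
print(as.numeric("408.5"))

# as.integer() truncates toward zero instead of returning NA.
truncated <- as.integer(408.5)
print(truncated)
```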