I am still a beginner in the R programming world, so please excuse the basic question. I have data in a file, as shown below.

grep "lcost" inflection_point.trc
AP: lcost=4.00, rcost=6.02
AP: lcost=74340.93, rcost=249.97
AP: lcost=37172.17, rcost=128.50
AP: lcost=18587.79, rcost=6.24
AP: lcost=9295.60, rcost=6.13
AP: lcost=4649.71, rcost=6.08
AP: lcost=2326.56, rcost=6.05
AP: lcost=1165.19, rcost=6.04
AP: lcost=584.30, rcost=6.03
AP: lcost=294.06, rcost=6.03
AP: lcost=148.94, rcost=6.02
.....

grep "inflection point at card" inflection_point.trc
AP: Costing Nested Loops Join for inflection point at card 1.35
AP: Costing Hash Join for inflection point at card 1.35
AP: Costing Nested Loops Join for inflection point at card 182361.04
AP: Costing Hash Join for inflection point at card 182361.04
AP: Costing Nested Loops Join for inflection point at card 91181.20
AP: Costing Hash Join for inflection point at card 91181.20
AP: Costing Nested Loops Join for inflection point at card 45591.27
AP: Costing Hash Join for inflection point at card 45591.27
AP: Costing Nested Loops Join for inflection point at card 22796.31
AP: Costing Hash Join for inflection point at card 22796.31
AP: Costing Nested Loops Join for inflection point at card 11398.83
AP: Costing Hash Join for inflection point at card 11398.83
.....

The requirement is to plot a line graph in R of the lcost and rcost values, with the x-axis values taken from the "inflection point at card" lines.

I tried creating a data frame using grep, but in vain. I also have no idea how to load these values into a data frame and plot a line graph of lcost and rcost against the x-axis values.

> dataframe <- grep ('lcost',readLines("inflection_point.trc"),value=TRUE)
 [1] "AP: lcost=4.00, rcost=6.02"       "AP: lcost=74340.93, rcost=249.97"
 [3] "AP: lcost=37172.17, rcost=128.50" "AP: lcost=18587.79, rcost=6.24"  
 [5] "AP: lcost=9295.60, rcost=6.13"    "AP: lcost=4649.71, rcost=6.08"   
 [7] "AP: lcost=2326.56, rcost=6.05"    "AP: lcost=1165.19, rcost=6.04"   
 [9] "AP: lcost=584.30, rcost=6.03"     "AP: lcost=294.06, rcost=6.03"    
[11] "AP: lcost=148.94, rcost=6.02"     "AP: lcost=75.97, rcost=6.02"     
[13] "AP: lcost=39.69, rcost=6.02"      "AP: lcost=21.75, rcost=6.02"     
[15] "AP: lcost=12.78, rcost=6.02"      "AP: lcost=7.89, rcost=6.02"      
[17] "AP: lcost=5.85, rcost=6.02"       "AP: lcost=7.08, rcost=6.02"      
[19] "AP: lcost=6.26, rcost=6.02"       "AP: lcost=6.26, rcost=6.02" 
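
For what it's worth, grep() with value=TRUE only returns the matching lines as a character vector, so the numbers still have to be pulled out of each string. A minimal base R sketch of one way to do that, assuming every cost line has the exact "lcost=..., rcost=..." shape shown above (the names cost_lines and costs are made up for illustration, and the sample lines stand in for readLines("inflection_point.trc")):

```r
# Sample lines copied from the grep output above; in practice they would
# come from readLines("inflection_point.trc").
lines <- c(
  "AP: lcost=4.00, rcost=6.02",
  "AP: lcost=74340.93, rcost=249.97",
  "AP: Costing Hash Join for inflection point at card 1.35",
  "AP: lcost=37172.17, rcost=128.50"
)

# Keep only the lines that carry the two cost values.
cost_lines <- grep("lcost=", lines, value = TRUE, fixed = TRUE)

# Capture the number after each label and convert it to double.
costs <- data.frame(
  lcost = as.numeric(sub(".*lcost=([0-9.]+).*", "\\1", cost_lines)),
  rcost = as.numeric(sub(".*rcost=([0-9.]+).*", "\\1", cost_lines))
)

str(costs)
```

Because as.numeric() is applied while the columns are built, both columns are doubles from the start.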

Any help would be great, as I am still learning R.

This is what I was able to come up with. Could anyone please help me plot a line graph using ggplot? Is there an easier way to extract the data than the one I used? Is there a way to convert all the columns of the data frame to double?

lines <- readLines("inflection_point.trc")
require(reshape2)
# Strip the "AP:", "lcost=" and "rcost=" labels, then split on the comma
fd1 <- colsplit(string=gsub("[A-Za-z]+[[:punct:]]", "", grep("cost=[0-9]+", lines, value=TRUE)), pattern=",", names=c("HASH", "NESTED"))
fd1
       HASH NESTED
1      4.00   6.02
2  74340.93 249.97
3  37172.17 128.50
4  18587.79   6.24
5   9295.60   6.13
6   4649.71   6.08
7   2326.56   6.05
8   1165.19   6.04
9    584.30   6.03
10   294.06   6.03
11   148.94   6.02
12    75.97   6.02
13    39.69   6.02
14    21.75   6.02
15    12.78   6.02
16     7.89   6.02
17     5.85   6.02
18     7.08   6.02
19     6.26   6.02
20     6.26   6.02
fd2 <- data.frame(Card= unique(gsub( "([[:alpha:]]|\\s|:)", "", grep(".*inflection point at card", lines, value=TRUE))))
fd2
        Card
1       1.35
2  182361.04
3   91181.20
4   45591.27
5   22796.31
6   11398.83
7    5700.09
8    2850.72
9    1426.04
10    713.69
11    357.52
12    179.44
13     90.39
14     45.87
15     23.61
16     12.48
17      6.92
18      9.70
19      8.31
20      7.61
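
On the question of converting columns to double: the Card column ends up as a factor because it is built from strings. A hedged sketch of one way to handle that, using sample values from the output above. Two things here are assumptions rather than facts from the trace file: that each inflection point logs exactly one "Nested Loops" line (so filtering on it can replace unique(), which would silently drop a repeated value), and the names nl_lines, card and fd, which are invented for the example.

```r
# Sample lines copied from the grep output in the question.
lines <- c(
  "AP: Costing Nested Loops Join for inflection point at card 1.35",
  "AP: Costing Hash Join for inflection point at card 1.35",
  "AP: Costing Nested Loops Join for inflection point at card 182361.04",
  "AP: Costing Hash Join for inflection point at card 182361.04"
)

# Assumption: one "Nested Loops" line per inflection point, so this keeps
# exactly one value per point without relying on unique().
nl_lines <- grep("Nested Loops Join for inflection point at card", lines,
                 value = TRUE, fixed = TRUE)
card <- as.numeric(sub(".*at card ([0-9.]+)$", "\\1", nl_lines))

# A factor column has to go through character first; as.numeric() on the
# factor itself returns the integer level codes, not the printed values.
fd <- data.frame(Card = factor(c("1.35", "182361.04", "91181.20")),
                 HASH = c(4.00, 74340.93, 37172.17))

# Convert every column of the data frame to double in one step.
fd[] <- lapply(fd, function(x) as.numeric(as.character(x)))

str(fd)
```

The `fd[] <-` form keeps the object a data frame while replacing each column with its converted version.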

require(dplyr)
fd3 <- bind_cols(fd1,fd2)
fd3
Source: local data frame [20 x 3]

       HASH NESTED      Card
      (dbl)  (dbl)    (fctr)
1      4.00   6.02      1.35
2  74340.93 249.97 182361.04
3  37172.17 128.50  91181.20
4  18587.79   6.24  45591.27
5   9295.60   6.13  22796.31
6   4649.71   6.08  11398.83
7   2326.56   6.05   5700.09
8   1165.19   6.04   2850.72
9    584.30   6.03   1426.04
10   294.06   6.03    713.69
11   148.94   6.02    357.52
12    75.97   6.02    179.44
13    39.69   6.02     90.39
14    21.75   6.02     45.87
15    12.78   6.02     23.61
16     7.89   6.02     12.48
17     5.85   6.02      6.92
18     7.08   6.02      9.70
19     6.26   6.02      8.31
20     6.26   6.02      7.61
fd3 <- fd3[-1,]
fd3
Source: local data frame [19 x 3]

       HASH NESTED      Card
      (dbl)  (dbl)    (fctr)
1  74340.93 249.97 182361.04
2  37172.17 128.50  91181.20
3  18587.79   6.24  45591.27
4   9295.60   6.13  22796.31
5   4649.71   6.08  11398.83
6   2326.56   6.05   5700.09
7   1165.19   6.04   2850.72
8    584.30   6.03   1426.04
9    294.06   6.03    713.69
10   148.94   6.02    357.52
11    75.97   6.02    179.44
12    39.69   6.02     90.39
13    21.75   6.02     45.87
14    12.78   6.02     23.61
15     7.89   6.02     12.48
16     5.85   6.02      6.92
17     7.08   6.02      9.70
18     6.26   6.02      8.31
19     6.26   6.02      7.61

> is.data.frame(fd3)
[1] TRUE
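
For the plotting part, ggplot2 generally wants the data in long format, with one row per (Card, series) pair. A minimal sketch, assuming the three columns are already numeric and using the first five rows of fd3 shown above. The names long, Join, Cost and p are made up for the example, and the log scales are only a suggestion, since the values span several orders of magnitude:

```r
library(ggplot2)

# First five rows of fd3 from the output above, with Card as double.
fd3 <- data.frame(
  HASH   = c(74340.93, 37172.17, 18587.79, 9295.60, 4649.71),
  NESTED = c(249.97, 128.50, 6.24, 6.13, 6.08),
  Card   = c(182361.04, 91181.20, 45591.27, 22796.31, 11398.83)
)

# Reshape to long format: one row per Card value and series.
long <- data.frame(
  Card = rep(fd3$Card, times = 2),
  Join = rep(c("HASH", "NESTED"), each = nrow(fd3)),
  Cost = c(fd3$HASH, fd3$NESTED)
)

# One line per series; both axes on a log scale (a choice, not a requirement).
p <- ggplot(long, aes(x = Card, y = Cost, colour = Join)) +
  geom_line() +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Cardinality at inflection point", y = "Cost")

print(p)
```

Dropping the two scale_*_log10() lines gives plain linear axes, although the NESTED line will then sit almost flat along the bottom of the plot.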
Yasser
  • You might benefit from reading [ask] and [how to create a reproducible example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It makes it easier for others to help you. – Heroka Nov 18 '15 at 12:22
  • This is what I was able to come up with. Could you please help me plot a line graph using the ggplot function? Also, is there a way to convert the data type of all the columns to double? – Yasser Nov 19 '15 at 12:26

0 Answers