I have the following data frame named "dataset":

> dataset
   V1 V2 V3 V4 V5 V6   V7
1   A 29 27  0 14 21  163
2   W 70 40 93 63 44 1837
3   E 11  1 11 49 17  315
4   S 20 59 36 23 14  621
5   C 12  7 48 24 25  706
6   B 14  8 78 27 17  375
7   G 12  7  8  4  4  257
8   T  0  0  0  0  0    0
9   N 32  6  9 14 17  264
10  R 28 46 49 55 38  608
11  O 12  2  8 12 11  450

I have two helper functions:

get_A <- function(p){
     return(data.frame(Scorecard = p, 
                       Results = dataset[nrow(dataset),(p+1)]))
 }  # Pulls the value(s) from the last row, in column(s) p + 1

get_P <- function(p){
     return(data.frame(Scorecard= p, 
                       Results = dataset[p,ncol(dataset)]))
} # Pulls the value(s) from the last column, in row(s) p

I need to run the helper functions above on the following data frame. It contains NAs because I read "data_sub" from an Excel file whose two columns can have different numbers of rows.

> data_sub
      Key_P     Key_A
1         2         1
2         3         3
3         4         5
4        NA        NA

When I call the helper functions, I get some strange results as shown below:

> get_P(data_sub[complete.cases(data_sub$Key_P),]$Key_P)
  Scorecard Results
1         2    1837
2         3     315
3         4     621

> get_A(data_sub[complete.cases(data_sub$Key_A),]$Key_A)
  Scorecard Results.V2 Results.V4 Results.V6
1         1         12          8         11
2         3         12          8         11
3         5         12          8         11
Warning message:
In data.frame(Scorecard = p, Results = dataset[nrow(dataset), (p +  :
  row names were found from a short variable and have been discarded
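To see where Results.V2, Results.V4 and Results.V6 come from, it helps to inspect the subset on its own. A minimal sketch, using a hypothetical three-row data frame `toy` that only mimics the layout of "dataset" (a label column followed by numeric columns):

```r
# Hypothetical stand-in for "dataset"; not the real data
toy <- data.frame(V1 = c("A", "W", "O"),
                  V2 = c(29, 70, 12),
                  V3 = c(27, 40, 2),
                  V4 = c(0, 93, 8))

p <- c(1, 3)

# One row, several columns: `[` keeps the result as a 1 x 2 data.frame
last_row <- toy[nrow(toy), p + 1]
class(last_row)   # "data.frame"

# Several rows, one column: `[` drops the result to a plain numeric vector
last_col <- toy[p, ncol(toy)]
class(last_col)   # "numeric"

# data.frame() splices each column of last_row in under its own name and
# recycles it to length(p) rows, which mirrors the Results.V2/V4/V6 output
res <- data.frame(Scorecard = p, Results = last_row)
names(res)        # "Scorecard" "Results.V2" "Results.V4"
```

This is why get_P() behaves and get_A() does not: only get_A() hands a data frame, rather than a vector, to data.frame().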

The call to get_P() works the way I want: I get the "Results" for each non-NA value in data_sub$Key_P as a data frame.

But the call to get_A() gives strange results and also a warning. I expected it to return a data frame similar to the one returned by get_P(). Why is this happening, and how can I make get_A() return the correct data frame? The expected output is:

  Scorecard Results
1         1      12
2         3       8
3         5      11

I found this link related to the warning, but it didn't help me solve my issue.

Sujith


The following works:

get_P <- function(df, data_sub) {
    # Drop NAs in Key_P only; Key_A may have a different number of rows
    keys <- data_sub$Key_P[!is.na(data_sub$Key_P)]
    data.frame(
        Scorecard = keys,
        Results = df[keys, ncol(df)])
}
get_P(df, data_sub)
#  Scorecard Results
#1         2    1837
#2         3     315
#3         4     621

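get_A <- function(df, data_sub) {
    # Drop NAs in Key_A only; Key_P may have a different number of rows
    keys <- data_sub$Key_A[!is.na(data_sub$Key_A)]
    data.frame(
        Scorecard = keys,
        # as.numeric turns the one-row data.frame into a plain numeric vector
        Results = as.numeric(df[nrow(df), keys + 1]))
}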
get_A(df, data_sub)
#  Scorecard Results
#1         1      12
#2         3       8
#3         5      11

To avoid the warning, get_A converts the one-row data frame selected from the last row of df into a plain numeric vector with as.numeric, which also strips its row name.
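A quick way to check what as.numeric does to such a row is a minimal sketch with a hypothetical one-row data frame `row_df`, standing in for the last-row subset (the values 12, 8, 11 are copied from the expected output in the question):

```r
# Hypothetical one-row data frame standing in for df[nrow(df), c(2, 4, 6)]
row_df <- data.frame(V2 = 12, V4 = 8, V6 = 11, row.names = "11")

vals <- as.numeric(row_df)   # plain numeric vector; column names and row name are gone
vals                         # 12  8 11

# unlist() is an alternative; use.names = FALSE avoids carrying V2/V4/V6 along
vals2 <- unlist(row_df, use.names = FALSE)

# A plain vector can be passed to data.frame() without the row-name warning
data.frame(Scorecard = c(1, 3, 5), Results = vals)
```

as.numeric works here because every column of the row holds a single number; for a multi-row data frame it would fail, so it is only suited to this one-row case.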

Another tip: it's better practice to pass df and data_sub to get_P and get_A as arguments than to rely on global variables.
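Following the same tip, here is a minimal sketch of a variant that also copes with key columns of unequal length, which the question mentions for the Excel input. The function name get_A2 and the toy data are hypothetical; instead of subsetting a row and converting it, this version pulls one value per column with `[[` and vapply:

```r
# Hypothetical variant of get_A(): data and keys are passed in explicitly,
# and only the NAs in the key vector being used are dropped
get_A2 <- function(df, keys) {
  keys <- keys[!is.na(keys)]          # keep non-missing keys only
  last <- nrow(df)
  # df[[j]][last] extracts one plain value per column, so no data.frame
  # (and no row name) ever reaches data.frame() below
  vals <- vapply(keys + 1, function(j) df[[j]][last], numeric(1))
  data.frame(Scorecard = keys, Results = vals)
}

# Hypothetical two-row data with the same column layout as "dataset"
df <- data.frame(V1 = c("A", "O"), V2 = c(29, 12), V3 = c(27, 2),
                 V4 = c(0, 8), V5 = c(14, 12), V6 = c(21, 11), V7 = c(163, 450))
# Key_A has one NA more than Key_P, as can happen when reading from Excel
data_sub <- data.frame(Key_P = c(2, 3, 4, 5), Key_A = c(1, 3, 5, NA))

get_A2(df, data_sub$Key_A)
```

vapply's `numeric(1)` template also accepts integer columns (as read.table produces), since integers are promoted to double.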


Sample data

df <- read.table(text =
    "   V1 V2 V3 V4 V5 V6   V7
1   A 29 27  0 14 21  163
2   W 70 40 93 63 44 1837
3   E 11  1 11 49 17  315
4   S 20 59 36 23 14  621
5   C 12  7 48 24 25  706
6   B 14  8 78 27 17  375
7   G 12  7  8  4  4  257
8   T  0  0  0  0  0    0
9   N 32  6  9 14 17  264
10  R 28 46 49 55 38  608
11  O 12  2  8 12 11  450", header = TRUE, row.names = 1)


data_sub <- read.table(text =
    "      Key_P     Key_A
1         2         1
2         3         3
3         4         5
4        NA        NA", header = TRUE, row.names = 1)
Maurits Evers
  • That works! I went ahead and fixed some other functions in my code to follow your coding tip. To understand this a bit better, could you please shed some light on why my approach wasn't working? Also, why didn't we get a warning in `get_P` even though we didn't use `as.numeric`? – Sujith Jul 10 '18 at 14:09
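  • @Sujith In `get_P` we select a single column, so `[` drops the result to a plain *column vector*. In `get_A` we select one row across several columns, which stays a `data.frame` (to be precise, a `1x3` `data.frame`); `data.frame()` then splices its columns in as `Results.V2`, `Results.V4`, `Results.V6` and discards its row name, hence the warning. So we need to convert that one-row `data.frame` into a numeric vector with `as.numeric`. – Maurits Evers Jul 12 '18 at 01:14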
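The drop rule behind this explanation can be checked directly. A minimal sketch on a hypothetical data frame `d`:

```r
# Small hypothetical data frame for illustration
d <- data.frame(x = c(10, 20, 30), y = c(1, 2, 3), z = c(7, 8, 9))

one_col  <- d[c(1, 2), 3]               # one column: dropped to a plain vector
one_row  <- d[3, c(1, 2)]               # one row, several columns: stays a data.frame
row_list <- d[3, c(1, 2), drop = TRUE]  # forcing drop on a single row gives a list

class(one_col)   # "numeric"
class(one_row)   # "data.frame"
class(row_list)  # "list"
```

By default `[.data.frame` drops when only one column is left but not when only one row is left, which is exactly the asymmetry between `get_P` and `get_A`.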