0

I have been searching/thinking of a way in which I can extract the nth value (e.g. 2nd, 5th, 7th, etc.) from each row in my data frame.

For example, I have the following columns:

ID   Q1-2013   Q2-2013   Q3-2013  Q4-2013  Q1-2014   Q2-2014   Q3-2014  Q4-2014

Under each column there are given values. What I would like to do is pull the nth value of each row from the quarters vector (2nd-8th columns). So for example, if I am looking for the 2nd value from each row, the formula/function I want would extract/pull the 2nd value from each row from columns 2-8 (Q1-2013 to Q4-2014). In addition, the formula/function would ignore the blanks/NA values in each row as well.

alexwlchan
  • 5,699
  • 7
  • 38
  • 49
  • 4
    We don't want images anyway, you can copy-paste a small sample of the raw data – Rich Scriven Dec 16 '14 at 18:41
  • 2,000 5,000 2,500 If the three values were in one row, I would want to pull the 2nd value. – Chintan Desai Dec 16 '14 at 19:09
  • Looks like the first problem will be data-input. You've got spaces and dollar signs in the column headers and commas in the values. Since this is typical of Excel, you may want to see if you can clean this up on that end before building a CSV file. – IRTFM Dec 16 '14 at 19:17
  • Well I have removed the commas from the data frame (I just put the commas manually in here to make it readable). That being said, the column headers should not really impact the formula. I can simple use a vector, e.g. [2:8] to pull the columns. Now i just want to pull the 2nd value in each row of those columns. – Chintan Desai Dec 16 '14 at 19:19
  • 1
    And .... if you already have an R dataframe, you should give us the output of `dput(head(dfrm_name))` – IRTFM Dec 16 '14 at 19:20
  • The output using that formula is incredibly long. I personally don't think that would be very useful. If we can work with the given information that would be great. The comments below regarding "2nd non empty value" in each row (observation) across 8 columns(variables) would be awesome. Thanks! – Chintan Desai Dec 16 '14 at 19:35

2 Answers2

3

Maybe this is what you're after.

I first modified the iris data set with some NAs in each column:

iris[] <- lapply(iris, function(x){ x[sample(150, 30, F)] <- NA; x})
head(iris)
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4          NA  setosa
#2           NA          NA          1.4          NA  setosa
#3           NA          NA          1.3         0.2  setosa
#4          4.6         3.1          1.5          NA  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4          NA          1.7         0.4  setosa

Then, to extract the second non-empty and non-NA entries per row you could use apply (I know, it's not recommended on data frames, but it does the dirty job):

apply(iris, 1, function(x) x[which(!is.na(x) & x != "")[2]])
#  [1] "3.5"       "setosa"    "0.2"       "3.1"       "3.6"       "1.7"       "3.4"       "3.4"       "2.9"       "3.1"       "setosa"   
 #[12] "3.4"       "1.4"       "1.1"       "1.2"       "4.4"       "3.9"       "3.5"       "3.8"       "3.8"       "0.2"       "3.7"      
 #[23] "3.6"       "1.7"       "1.9"       "3.0"       "3.4"       "1.5"       "3.4"       "3.2"       "3.1"       "3.4"       "4.1"      
 #[34] "4.2"       "3.1"       "3.2"       "3.5"       "3.6"       "setosa"    "1.5"       "1.3"       "2.3"       "1.3"       "0.6"      
 #[45] "0.4"       "3.0"       "3.8"       "3.2"       "3.7"       "3.3"       "3.2"       "3.2"       "1.5"       "2.3"       "2.8"      
 #[56] "2.8"       "3.3"       "2.4"       "4.6"       "1.4"       "2.0"       "3.0"       "1.0"       "2.9"       "2.9"       "3.1"      
 #[67] "3.0"       "2.7"       "4.5"       "3.9"       "3.2"       "4.0"       "2.5"       "4.7"       "4.3"       "3.0"       "2.8"      
 #[78] "5.0"       "2.9"       "3.5"       "3.8"       "2.4"       "2.7"       "2.7"       "3.0"       "3.4"       "3.1"       "1.3"      
 #[89] "4.1"       "1.3"       "2.6"       "3.0"       "2.6"       "2.3"       "4.2"       "3.0"       "2.9"       "2.9"       "2.5"      
#[100] "2.8"       "3.3"       "2.7"       "3.0"       "2.9"       "3.0"       "3.0"       "4.5"       "2.9"       "5.8"       "3.6"      
#[111] "3.2"       "1.9"       "5.5"       "2.0"       "5.1"       "3.2"       "5.5"       "3.8"       "virginica" "1.5"       "3.2"      
#[122] "2.8"       "2.8"       "2.7"       "2.1"       "6.0"       "2.8"       "3.0"       "2.8"       "5.8"       "2.8"       "3.8"      
#[133] "5.6"       "1.5"       "2.6"       "3.0"       "5.6"       "5.5"       "4.8"       "3.1"       "5.6"       "5.1"       "2.7"      
#[144] "3.2"       "3.3"       "3.0"       "2.5"       "5.2"       "5.4"       "3.0"      

Because apply will first convert the data frame to a matrix, all columns are covnerted to the same type which is character in this case. You can later on convert it to whatever you want (but note that you cant convert the output vector in this case directly back to numeric since it contains some character strings such as "setosa" etc).

talat
  • 68,970
  • 21
  • 126
  • 157
  • Hi. I am a beginner in R, so some of this does not make sense to me. In the first part of your response, what does "lapply(iris, function(x){ x[sample(150, 30, F)] <- NA; x})" really do? And then in your second response, you use "apply" with "function(x)". What is function(x), and more importantly what is x? Thanks! – Chintan Desai Dec 16 '14 at 20:15
  • The first part with `lapply` is just a way to add some NAs to each column. It's only to prepare some data for the second step. `apply` in the actual answer is a way to do loops in R and since I used `apply(.., 1, ...)` it will loop over each row of the input. Explaining all the rest is beyond the scope of this answer & comment, so I suggest you take a look [here](http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega/7141669#7141669) and typing `?apply` to read the help page. Also look at one of the R intro tutorials. – talat Dec 16 '14 at 20:21
  • Got it. Thank you so much! The formula did work and I got what I needed. Appreciate it! – Chintan Desai Dec 17 '14 at 17:11
0

You could also use a convenient function naLast from library(SOfun)

library(SOfun)
dat[dat==''] <- NA #convert all `blank` cells to `NA`
n <- 2 # the row/column index that needs to be extracted
naLast(dat, by='col')[n,] #get the 2nd non-empty/nonNA element for each columns
#V1  V2  V3  V4  V5 
#"G" "B" "B" "B" "C" 

which would be the same with apply

 apply(dat, 2, function(x) x[which(!is.na(x) & x!='')[2]])
 #V1  V2  V3  V4  V5 
 #"G" "B" "B" "B" "C" 

You could also specify by='row'

naLast(dat, by='row')[,n] #get the 2nd non-empty/nonNA element for each row
#  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
#"G" "D" "B" "G" "E" "B" "J" "F" "F" "A" "H" "C" "A" "D" "H" "D" "J" "C" "A" "A" 

data

set.seed(25)
dat <- as.data.frame(matrix(sample(c(NA,'',LETTERS[1:10]), 
        20*5, replace=TRUE), ncol=5), stringsAsFactors=FALSE)

You can install the package by

 library(devtools)
 install_github("mrdwab/SOfun")
akrun
  • 874,273
  • 37
  • 540
  • 662