It might seem a silly question, but how can I repeat this line 152 times without a for loop? I would rather avoid a loop, since it will not be efficient later with larger data sets:

reviews = as.vector(t(mydata)[,1])

mydata is a data.frame, reviews is a character vector, and t(mydata)[,1] is just the first row of mydata.

The output could be a matrix or worst case a data.frame.

I tried something like this, but it did not work:

testing = apply(mydata, 1, function(x) {as.vector(t(mydata)[,x])})
Error in t(mydata)[, x] : subscript out of bounds

Thanks.

EDIT: Quick data sample:

> reviews = as.vector(t(mydata)[,1])
> class(reviews)
[1] "character"
> length(reviews)
[1] 14
> reviews
[1] "I was involuntarily"                                                                   
[2] "I was in transit"                                                                                   
[3] "My initial flight"                                                                             
[4] "That still left"                                                                                           
[5] "After disembarking"                                                                    
[6] "customs and proceed to my gate."                                                                                                                                                                        
[7] "I arrived"                                                                                                                                     
[8] "When my boarding pass was scanned"                         
[9] "No reason was given for the bump."                                                                                                                                                                      
[10] "The UA gate staff"
[11] "I boarded Air Canada."                                                                                                                    
[12] "After arriving"                                                                                  
[13] "I spent 5 hours"                                                                                                           
[14] NA      

mydata data.frame:

> class(mydata)
 [1] "data.frame"
 > length(mydata[,1])
 [1] 152
 > mydata[,1]
 [1] I was involuntarily... .
 [2] First time... . 
 ...
 ...                                                                                     
 152 Levels: First time . ...

I have about 30,000 of these, but I want to start small: only 152 paragraphs, each split into individual sentences and put into a data.frame. Each row of the data.frame holds 5-15 sentences.

I want to be able to access each row as an array, since I need to perform some action on each row of the data.frame.

Packages used: plyr, sentiment (downloaded from here and installed manually)

EDIT 2:

 dput(mydata[1:6, 1:6])
 structure(list(V1 = structure(c(70L, 41L, 94L, 114L, 47L, 49L), 
 .Label = c(" Air Canada", 
 "their service", 
 "hours for de-icing", 
 "have flown BA", 
 "my booking", 
 "If the video screen", 
 "Frankfurt flights", 
 "and another 150 lines of text data", 
Uther Pendragon
  • You should supply a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and desired output data to make it clear what you are trying to do and to test any possible solutions. – MrFlick Jun 15 '15 at 05:20
  • @MrFlick I did my best to explain the situation, I hope that this could be at least a mediocre reproducible example – Uther Pendragon Jun 15 '15 at 05:30
  • @UtherPendragon, you are asking advice on how to use `apply` on a data.frame yet you have not provided a representative example. Since you claim it is rather large, ***make up some data*** that looks somewhat like you want (perhaps 2-3 words each sentence) without clobbering our screens with comments for an airline. The link @MrFlick provided has sections that specifically discuss **"producing a minimal dataset"** in the likely event the section **"copy your data"** is impractical in this question. – r2evans Jun 15 '15 at 06:13
  • @r2evans I am sorry I could not provide what you asked for. I edited the question a lot. Please take a look again. – Uther Pendragon Jun 15 '15 at 06:38
  • I am getting down-voted because of bad formatting / structuring or because it is a bad question ? Is it understandable that I only want to repeat one line but not using a for loop, more likely an apply() statement ? – Uther Pendragon Jun 15 '15 at 07:22
  • by bad structuring/ formatting I meant not giving a proper reproducible example – Uther Pendragon Jun 15 '15 at 07:32
  • I will add `dput(mydata[1:6, 1:6])` in a minute – Uther Pendragon Jun 15 '15 at 07:41
  • I added output for `dput(mydata[1:6, 1:6])` – Uther Pendragon Jun 15 '15 at 07:54
  • Thank you for finally adding data ... unfortunately, it's incomplete. Since you don't (yet) have the ["Informed"](http://stackoverflow.com/help/badges/2600/informed) badge, your recommended reading (that started with @MrFlick's [suggestion](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)) now includes the StackOverflow [Tour](http://stackoverflow.com/tour). There are several other good sections immediately after the tour, such as [how to create a minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve). – r2evans Jun 15 '15 at 08:09

1 Answer

Here's a recommended way to ask a question, focusing on the fact that your actual data is too big, too complicated, or too private to share.

Question: how to apply a function on each row of a data.frame?

My data:

# make up some data
s <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
mydata <- as.data.frame(matrix(strsplit(s, '\\s')[[1]][1:18], nrow=3, ncol=6), stringsAsFactors=FALSE)
mydata
##      V1          V2         V3      V4         V5     V6
## 1 Lorem         sit adipiscing      do incididunt     et
## 2 ipsum       amet,      elit, eiusmod         ut dolore
## 3 dolor consectetur        sed  tempor     labore  magna

If you have data that you can use directly, then as has been suggested multiple times in the comments, the use of dput is helpful:

mydata <- structure(list(V1 = c("Lorem", "ipsum", "dolor"),V2 = c("sit", "amet,", "consectetur"), V3 = c("adipiscing", "elit,", "sed"), 
    V4 = c("do", "eiusmod", "tempor"), V5 = c("incididunt", "ut", "labore"), V6 = c("et", "dolore", "magna")), .Names = c("V1", 
    "V2", "V3", "V4", "V5", "V6"), row.names = c(NA, -3L), class = "data.frame")
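
As a rough illustration of why dput helps, here is a minimal sketch of the round trip: the text that dput prints is itself R code that rebuilds the object. The names `toy`, `code` and `toy_copy` are made up for this example; in practice you would simply paste the printed output into your question.

```r
# Hypothetical toy data; `toy` and `toy_copy` are illustrative names only.
toy <- data.frame(V1 = c("Lorem", "ipsum"), V2 = c("sit", "amet,"),
                  stringsAsFactors = FALSE)

# dput() prints R code that recreates the object. It is captured as text
# here only to demonstrate the round trip without copy and paste.
code <- paste(capture.output(dput(toy)), collapse = "\n")

# Evaluating that text rebuilds the data.frame.
toy_copy <- eval(parse(text = code))

# Compare the rebuilt object with the original.
identical(toy, toy_copy)
```

Anyone reading the question can run the pasted structure(...) call and get the same columns, types and row names you have.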

In either order, state (i) what you are trying to do, and (ii) what you have tried and how it is not working.

My desired output:

Converting a row into a vector is ... confusing. A row is already a vector, so I don't know what you are ultimately trying to do. So, I'll come up with something short and to the point: I want the words on each row to be in reverse alphabetical order, perhaps like this:

##       V1    V2         V3      V4     V5          V6
## 1    sit Lorem incididunt      et     do  adipiscing
## 2     ut ipsum      elit, eiusmod dolore       amet,
## 3 tempor   sed      magna  labore  dolor consectetur

This is a good time to show the code you've tried, errors you've encountered, and/or how the output is not what you intended.

Answer, generically:

Several ways to do something to each row:

  1. Use apply, though it first converts the data.frame to a matrix, so if numeric and character columns are intermingled everything becomes character. If you try this, you'll see that the output is actually the transpose of what you may expect, in which case you'll need to wrap it (and all of the other *apply-based suggestions here) with t(...). It's a little confusing, but it's necessary here. Oh, and the results will all be of class matrix, which can easily be converted to a data.frame if needed.

    ret <- apply(mydata, 1, function(r) {
        do_something(r)
    })
    
  2. Use sapply or lapply on row indices. Note that these return lists or vectors of results, so you'll need to convert them into whatever format you ultimately need.

    ret <- sapply(1:nrow(mydata), function(i) {
        do_something(mydata[i,])
    })
    
    # if you need to keep each row's results rather encapsulated, use one of the following:
    ret <- sapply(1:nrow(mydata), function(i) {
        do_something(mydata[i,])
    }, simplify=FALSE)
    
    ret <- lapply(1:nrow(mydata), function(i) {
        do_something(mydata[i,])
    })
    
  3. Use foreach and iterators.

    library(foreach)
    library(iterators)
    ret <- foreach(df=iter(mydata, by='row'), .combine=rbind) %do% {
        do_something(df) # just one row of mydata this time
    }
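
To make the two apply caveats from item 1 concrete, here is a small sketch on made-up data (the `mixed` data.frame and its column names are assumptions for illustration, not part of the question): every value arrives in the function as character, and each row's result comes back as a column.

```r
# Hypothetical mixed-type data, for illustration only.
mixed <- data.frame(word = c("sit", "amet", "sed"), n = c(3L, 4L, 3L),
                    stringsAsFactors = FALSE)

# apply() converts the data.frame with as.matrix() first, so the integer
# column is passed to the function as character strings too.
classes <- apply(mixed, 1, function(r) class(r))

# Each row's result is stored as a *column* of the returned matrix,
# which is why the result usually gets wrapped in t(...).
res <- apply(mixed, 1, function(r) rev(r))
dim(res)     # one column per original row
dim(t(res))  # transposed back to the shape of `mixed`
```

If the function needs the numbers as numbers, indexing rows with sapply/lapply (item 2) keeps each column's type.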
    

In the case of my (contrived) question, here are several ways to do it:

as.data.frame(t(apply(mydata, 1, function(r) sort(r, decreasing=TRUE))))
##       V1    V2         V3      V4     V5          V6
## 1    sit Lorem incididunt      et     do  adipiscing
## 2     ut ipsum      elit, eiusmod dolore       amet,
## 3 tempor   sed      magna  labore  dolor consectetur

as.data.frame(t(sapply(1:nrow(mydata), function(i) sort(mydata[i,], decreasing=TRUE))))
## same output

library(foreach)
library(iterators)
## notice the use of as.character(...), perhaps still a blasphemy
## to the structure of a data.frame
ret <- foreach(df=iter(mydata, by='row'), .combine=rbind) %do% {
    sort(as.character(df), decreasing=TRUE)
}
ret
##          [,1]     [,2]    [,3]         [,4]      [,5]     [,6]         
## result.1 "sit"    "Lorem" "incididunt" "et"      "do"     "adipiscing" 
## result.2 "ut"     "ipsum" "elit,"      "eiusmod" "dolore" "amet,"      
## result.3 "tempor" "sed"   "magna"      "labore"  "dolor"  "consectetur"
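
Closer to the original question, here is a hedged sketch of getting one character vector of sentences per row without writing an explicit for loop. The data below is a made-up stand-in (`reviews_df`, `m` and `rows` are illustrative names), assuming factor columns and NA padding as the question's output suggests. Note that inside apply(mydata, 1, function(x) ...) the argument x is the row's values, not a row number, which is why using it as a subscript fails.

```r
# Hypothetical stand-in for the question's data: one paragraph per row,
# one sentence per column, NA where a paragraph has fewer sentences.
reviews_df <- data.frame(
  V1 = c("I was in transit", "First time"),
  V2 = c("My initial flight", NA),
  V3 = c("I arrived", NA),
  stringsAsFactors = TRUE  # factors, as the question's output suggests
)

# as.matrix() turns the factor columns into plain character strings.
m <- as.matrix(reviews_df)

# One character vector per row, with the NA padding dropped.
rows <- lapply(seq_len(nrow(m)), function(i) {
  r <- m[i, ]
  unname(r[!is.na(r)])
})

rows[[1]]  # the sentences of the first paragraph
```

A list is used rather than a matrix because the rows hold different numbers of sentences; each element can then be passed to whatever per-row function you need.
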
r2evans
  • `ret <- sapply(1:nrow(mydata), function(i) { do_something(mydata[i,]) })` ... this is what I needed. Thank you very much, I appreciate it – Uther Pendragon Jun 15 '15 at 08:06