I have a data frame that I want to subset by the values of one column, and then I want to run a chi-squared test on each of the resulting subsets.

I read the question "Subsetting a data frame into multiple data frames based on multiple column values", which showed me how to split a data frame. I used a variant of the code suggested there:

split(SpellingVars, with(SpellingVars, interaction(Headword)), drop = TRUE)

That worked with my data, but now I want to know how to reuse those subsets:

  • how do I run a function over each new subset?

The data I have looks like this:

          SPELLING VARS DATA SET    
   Headword   Variant   Freq1   Freq2
    Knight      Kniht     17      22 
    Knight      Knyhht    28      12 
    Knight      Knyt       6       7
    Sword       Sword      7       8
    Sword       Swerd     14      44

So I'd like one subset for Sword and one for Knight, and then I'd like to run a chi-squared test on each subset, but I'm not sure how to do it.

I've tried to do this myself, but with no success. The code I've been attempting to use is a variant on the answer to the Subsetting question I linked to above:

chisq.test(split(SpellingVars, with(SpellingVars, interaction(Headword)), drop = TRUE))

However, this gives the error `(list) object cannot be coerced to type 'double'`. I'm at a bit of a loss and would appreciate any advice!
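The error arises because `chisq.test()` expects a numeric vector or matrix (for example a contingency table of counts), while `split()` returns a list of data frames. Applying the test to one subset at a time works. A minimal sketch, assuming the intended test is on the variant-by-count table formed by the `Freq1` and `Freq2` columns (the question does not state the hypothesis explicitly):

```r
# Example data from the question
SpellingVars <- data.frame(
  Headword = c("Knight", "Knight", "Knight", "Sword", "Sword"),
  Variant  = c("Kniht", "Knyhht", "Knyt", "Sword", "Swerd"),
  Freq1    = c(17, 28, 6, 7, 14),
  Freq2    = c(22, 12, 7, 8, 44)
)

# With a single grouping column, interaction() is not needed
sp <- split(SpellingVars, SpellingVars$Headword)

# Passing the whole list fails: a list of data frames is not a numeric
# vector or matrix, so chisq.test() cannot use it
res_list <- tryCatch(chisq.test(sp), error = function(e) e)
inherits(res_list, "error")   # TRUE

# Passing one subset's count columns works: the data frame is converted to
# a matrix and treated as a contingency table (variants x Freq1/Freq2)
res_knight <- chisq.test(sp[["Knight"]][, c("Freq1", "Freq2")])
res_knight$parameter          # df = 2, i.e. (3 - 1) * (2 - 1)
```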

Rose
    Use `lapply` on the list of data frames: https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html – Wietze314 Sep 29 '16 at 09:05
  • I think the package `dplyr` will help. If you make an example dataset, I'll do an answer showing how. – Dan Lewer Sep 29 '16 at 10:50
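If you'd rather not add a package dependency, base R's `by()` combines the split and the apply in a single call. A minimal sketch (a base-R substitute, not the `dplyr` approach mentioned in the comment), again assuming the test of interest is on each headword's `Freq1`/`Freq2` table:

```r
SpellingVars <- data.frame(
  Headword = c("Knight", "Knight", "Knight", "Sword", "Sword"),
  Variant  = c("Kniht", "Knyhht", "Knyt", "Sword", "Swerd"),
  Freq1    = c(17, 28, 6, 7, 14),
  Freq2    = c(22, 12, 7, 8, 44)
)

# by() splits the data frame by Headword and calls the function on each piece
tests <- by(SpellingVars, SpellingVars$Headword,
            function(d) chisq.test(d[, c("Freq1", "Freq2")]))

# Extract the p-value from each test result
p_values <- sapply(tests, function(res) res$p.value)
p_values
```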

1 Answer

Use `lapply` to apply a function to each data frame in the list:

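SpellingVars <- data.frame(Headword = c('Knight', 'Knight', 'Knight', 'Sword', 'Sword'),
                           Variant  = c('Kniht', 'Knyhht', 'Knyt', 'Sword', 'Swerd'),
                           Freq1    = c(17, 28, 6, 7, 14),
                           Freq2    = c(22, 12, 7, 8, 44))

# One data frame per headword
sp <- split(SpellingVars, with(SpellingVars, interaction(Headword)), drop = TRUE)

# Run the test on each subset's Freq1/Freq2 columns, treated as a
# variant-by-count contingency table. (chisq.test(x$Freq1, x$Freq2) would
# instead cross-tabulate the count values as if they were categories.)
lapply(sp, function(x) chisq.test(x[, c("Freq1", "Freq2")]))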
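`lapply()` returns a list of `htest` objects, one per headword. To compare the subsets side by side, `sapply()` can pull a single component out of each result. A minimal sketch, assuming the test is on each headword's `Freq1`/`Freq2` count table; `tests` and `results` are illustrative names:

```r
SpellingVars <- data.frame(
  Headword = c("Knight", "Knight", "Knight", "Sword", "Sword"),
  Variant  = c("Kniht", "Knyhht", "Knyt", "Sword", "Swerd"),
  Freq1    = c(17, 28, 6, 7, 14),
  Freq2    = c(22, 12, 7, 8, 44)
)
sp <- split(SpellingVars, SpellingVars$Headword)

# Run the test on each subset's count columns
tests <- lapply(sp, function(x) chisq.test(x[, c("Freq1", "Freq2")]))

# Summarise: one row per headword with the statistic, df and p-value
results <- data.frame(
  Headword  = names(tests),
  statistic = sapply(tests, function(res) unname(res$statistic)),
  df        = sapply(tests, function(res) unname(res$parameter)),
  p.value   = sapply(tests, function(res) res$p.value),
  row.names = NULL
)
results
```

For `Sword` the expected count in one cell is below 5, so `chisq.test()` warns that the approximation may be incorrect; `chisq.test(..., simulate.p.value = TRUE)` or `fisher.test()` are options worth considering for such small tables.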
Wietze314