
I want to write a function that takes two categorical variables and displays a mosaic plot, a table of counts, a table of row percentages, and finally a chi-square test. Suppose I have data on a person's marital status (married/unmarried) and whether they smoke (yes/no). I want the function to create a mosaic plot of those two variables, then show the counts and row percentages, and finally carry out a chi-square test.

I have attempted the following:

Fnct <- function(x, y) {
    # Will create mosaic plot, but the labels show up incorrectly
    plot <- mosaicplot(~x + y, color=TRUE, main = "Mosaic Plot", xlab = x, ylab = y)

    #creates a 2by2 table and stores it in my table
    mytable <- table(x, y)
    mytable2 <- prop.table(mytable, 1)

    chi <- chisq.test(mytable)

    return(c(plot, mytable2, chi))
}

Fnct(Data$Marital, Data$Smoker)

When I run it, it does produce the mosaic plot, but the labels are incorrect: they repeat the levels of the variable over and over instead of showing just the column name. It does not output the counts or the chi-square test properly either. What am I doing incorrectly?

  • We would be happy to help once you add a [minimal reproducible example](http://stackoverflow.com/a/5963610/2437479). – shrgm May 05 '16 at 17:19
  • This is a messy function. Functions should really return one thing, either as a side-effect (e.g. plotting) or as an object. Returning a list of different types of objects will usually be a pain, as you'll have to pull them apart to do anything useful with them. Better to make more than one function unless you're effectively making a whole class. – alistaire May 05 '16 at 17:33
  • @alistaire I would mostly agree with that, but I think if you're performing some QnD exploration on a dataset, a convenience function like what the OP is trying to create could return very useful results. I'm not sure what the OP's purpose is, but I've used fxns like these when I'm just starting to get my hands dirty w/ data (which is what I thought OP's use-case was). – bouncyball May 05 '16 at 17:44

1 Answer


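You shouldn't make it return a vector: c() flattens the table and the test result into one long object, and mosaicplot() draws as a side effect, so there is nothing useful to store from it. Instead, just have it return a list like so: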

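Fnct <- function(x, y) {

    # just plot it, don't return it;
    # deparse(substitute(x)) turns the caller's argument expression into a label string
    mosaicplot(~x + y, color = TRUE, main = "Mosaic Plot",
               xlab = deparse(substitute(x)),
               ylab = deparse(substitute(y)))

    # row percentages: each row of the table sums to 1
    mytable2 <- prop.table(table(x, y), 1)

    chi <- chisq.test(table(x, y))

    # return a named list
    return(list(
                'Row Percentages' = mytable2,
                'Chi-squared test' = chi))
}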
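
To see why returning c(...) causes trouble, here is a small sketch with made-up data (the vectors `status` and `smoker` are hypothetical and only for illustration). It compares what c() and list() do with a table and a test result:

```r
# Hypothetical data, for illustration only
status <- c("married", "unmarried", "married", "unmarried", "married", "unmarried")
smoker <- c("yes", "no", "no", "yes", "no", "no")

tab <- table(status, smoker)
chi <- suppressWarnings(chisq.test(tab))   # tiny counts trigger an approximation warning

# c() splices everything together: every table cell and every component
# of the test result becomes a separate element of one long list
flattened <- c(prop.table(tab, 1), chi)

# list() keeps each object whole
kept <- list(rows = prop.table(tab, 1), chi = chi)

length(flattened)   # number of table cells plus number of test components
class(kept$rows)    # still a table
class(kept$chi)     # still an htest object, so it prints as a test
```

Because the objects in `kept` retain their classes, they print the way a table and a test normally do.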

And then you'll want to call it in the following fashion for the labels to show up properly:

with(mydata, Fnct(x, y))
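
The call style matters because substitute() captures the expression the caller typed for each argument. A minimal sketch of that behaviour (the helper `arg_label` and the two-row `Data` frame are hypothetical, not part of the answer):

```r
# Hypothetical helper: returns the caller's argument expression as text
arg_label <- function(x) deparse(substitute(x))

# Hypothetical two-row data frame using the question's column names
Data <- data.frame(Marital = c("married", "unmarried"),
                   Smoker  = c("yes", "no"))

arg_label(Data$Marital)          # "Data$Marital"
with(Data, arg_label(Marital))   # "Marital"
```

Inside with(), the argument is just the bare column name, so that is what ends up on the axis.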

Here's how it would work:

set.seed(1)
df <- data.frame(A = sample(c('a','b'), 100, replace = TRUE),
                 B = sample(c('foo','bar','haz'), 100, replace = TRUE))
with(df, Fnct(A,B))

$`Row Percentages`
   y
x         bar       foo       haz
  a 0.5000000 0.2307692 0.2692308
  b 0.2916667 0.3541667 0.3541667

$`Chi-squared test`

    Pearson's Chi-squared test

data:  table(x, y)
X-squared = 4.5998, df = 2, p-value = 0.1003
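
The question also asked for the table of counts, which the function above does not return. One possible variant is sketched below; the name `Fnct2` and the element name `Counts` are assumptions made for illustration. It builds the table once, labels its dimensions with the argument names via the `dnn` argument of table(), and lets mosaicplot() pick its axis labels up from the table:

```r
Fnct2 <- function(x, y) {
    # Capture the expressions the caller passed, to use as dimension labels
    xname <- deparse(substitute(x))
    yname <- deparse(substitute(y))

    # Build the contingency table once and reuse it everywhere
    counts <- table(x, y, dnn = c(xname, yname))

    # Plot as a side effect; axis labels default to the table's dimension names
    mosaicplot(counts, color = TRUE, main = "Mosaic Plot")

    list(Counts = counts,
         'Row Percentages' = prop.table(counts, 1),
         'Chi-squared test' = chisq.test(counts))
}

set.seed(1)
df <- data.frame(A = sample(c('a','b'), 100, replace = TRUE),
                 B = sample(c('foo','bar','haz'), 100, replace = TRUE))
res <- with(df, Fnct2(A, B))

res$Counts                          # the raw counts
res[['Chi-squared test']]$p.value   # pull a single number out of the test
```

Since the result is a named list, each piece can be pulled out by name without re-running the function.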


bouncyball