I've been limping my way around R for a few months now, so sorry if any of this seems basic. I've found plenty of similar problems and solutions, but somehow I can't adapt them to my situation. I'm starting to wonder whether this is something I should be trying to do at all, but I suppose it can't hurt to ask.
I have a data frame with a single scalar (numeric) variable and several binary T/F variables (coded as yes/no, 1/0, or 1/2), like this:
scal var1 var2 var3
25 0 1 0
21 0 1 1
14 1 1 0
30 1 0 1
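To make the question reproducible, the table above can be recreated as a data frame. This is a minimal sketch: the column names come from the table, and the flags are assumed to be stored as numeric 0/1.

```r
# Example data from the table above; flags assumed to be numeric 0/1
df <- data.frame(
  scal = c(25, 21, 14, 30),
  var1 = c(0, 0, 1, 1),
  var2 = c(1, 1, 1, 0),
  var3 = c(0, 1, 0, 1)
)
str(df)  # check the column types
```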
I know I can make a boxplot that splits the scalar column into groups by a single variable with a formula, like so:
boxplot(df$scal~df$var1)
I also know that I can make boxplots for several scalar variables at once. I'd like to combine the two: one boxplot that shows the dependent variable for the "true" subset and the "false" subset of each variable, next to one another. In my head, the solution should look something like "boxplot(df$scal~df$var1, df$scal~df$var2, df$scal~df$var3)", but R doesn't agree with me; it complains about not being able to force a data type.
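One common way to get all the true/false boxes into a single plot is to reshape the flags into "long" format first, so that a single formula can group by both the variable name and its value. A minimal base-R sketch, assuming 0/1 flags; the helper names flag_cols and long are illustrative only:

```r
# Example data (flags assumed to be numeric 0/1)
df <- data.frame(
  scal = c(25, 21, 14, 30),
  var1 = c(0, 0, 1, 1),
  var2 = c(1, 1, 1, 0),
  var3 = c(0, 1, 0, 1)
)

flag_cols <- c("var1", "var2", "var3")

# Stack the flag columns into long format:
# one row per (observation, flag variable) pair
long <- data.frame(
  scal     = rep(df$scal, times = length(flag_cols)),
  variable = rep(flag_cols, each = nrow(df)),
  value    = unlist(df[flag_cols], use.names = FALSE)
)

# One box per value/variable combination. The first grouping term
# (value) varies fastest, so the 0 and 1 boxes of each variable sit together.
bp <- boxplot(scal ~ value + variable, data = long, las = 2,
              xlab = "", ylab = "scal")
```

The boxes come out labelled "0.var1", "1.var1", "0.var2", and so on; the names= argument of boxplot() can replace those labels if needed.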
I could also write a rough loop that goes through each variable and generates the plots separately, but I'd like to compare them side by side.
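The loop approach can still give a side-by-side comparison if the plots are drawn as panels on one device. A minimal sketch using par(mfrow = ...), assuming 0/1 flags; the shared ylim is a design choice so that the panels are directly comparable:

```r
# Example data (flags assumed to be numeric 0/1)
df <- data.frame(
  scal = c(25, 21, 14, 30),
  var1 = c(0, 0, 1, 1),
  var2 = c(1, 1, 1, 0),
  var3 = c(0, 1, 0, 1)
)
flag_cols <- c("var1", "var2", "var3")

# One row of panels; keep the old settings so they can be restored
old_par <- par(mfrow = c(1, length(flag_cols)))

counts <- list()
for (v in flag_cols) {
  # split() groups scal by the 0/1 value of the current flag;
  # a common ylim keeps all panels on the same scale
  bp <- boxplot(split(df$scal, df[[v]]), main = v,
                ylim = range(df$scal), ylab = "scal")
  counts[[v]] <- bp$n  # number of observations in each box
}

par(old_par)  # restore the previous layout
```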
I've also thought about rearranging the dataset so that the "true" and "false" subsets are in separate columns (using subset(df$scal, df$var1==1), etc.) and then making multiple boxplots as described above, though this is quite tedious:
var1t var1f var2t var2f var3t var3f
14    25    25    30    21    25
30    21    21          30    14
            14
boxplot(df2$var1t, df2$var1f, df2$var2t, df2$var2f, df2$var3t, df2$var3f)
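The column-splitting step above can be automated with a loop. A minimal sketch, assuming the flags are coded 1/0 (a 1/2 coding would need the comparisons changed): storing the subsets in a named list instead of a data frame sidesteps the unequal-lengths problem, because boxplot() accepts a list of vectors of any lengths.

```r
# Example data (flags assumed to be numeric 0/1)
df <- data.frame(
  scal = c(25, 21, 14, 30),
  var1 = c(0, 0, 1, 1),
  var2 = c(1, 1, 1, 0),
  var3 = c(0, 1, 0, 1)
)
flag_cols <- c("var1", "var2", "var3")

groups <- list()
for (v in flag_cols) {
  groups[[paste0(v, "t")]] <- df$scal[df[[v]] == 1]  # "true" subset
  groups[[paste0(v, "f")]] <- df$scal[df[[v]] == 0]  # "false" subset
}

# Each list element becomes one box, labelled with its name
boxplot(groups, ylab = "scal")
```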
However, the different lengths (numbers of rows) of the columns are giving me fits when I create the new dataset. I know I could build the dataset in another program, save it as .csv, .xls, etc., and then import it, and the null values would remain intact. But I'd really rather not do this manually: as one might imagine, it becomes quite tedious and error-prone on larger datasets.
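If a rectangular data frame like df2 is still wanted, the shorter columns can be padded with NA inside R rather than in an external program. A minimal sketch using the subsets from the table above; assigning a larger length with `length<-` fills the new positions with NA, and boxplot() ignores NA values when computing each box:

```r
# Subsets from the table above, as a named list of unequal-length vectors
groups <- list(
  var1t = c(14, 30), var1f = c(25, 21),
  var2t = c(25, 21, 14), var2f = 30,
  var3t = c(21, 30), var3f = c(25, 14)
)

# Pad every vector with NA up to the length of the longest one
max_len <- max(lengths(groups))
df2 <- as.data.frame(lapply(groups, function(x) {
  length(x) <- max_len
  x
}))

# NA padding is dropped when each box is computed
bp <- boxplot(df2, ylab = "scal")
```

If a file is still needed, write.csv(df2, "subsets.csv", na = "", row.names = FALSE) would write the padding as empty cells; the file name here is only an example.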
Help with either approach would be most welcome.