I have a question about coding interaction effects with dummy variables, and I'd be really grateful for your advice.
Imagine I want to design an experiment to measure the effect of the amount of food eaten in grams (continuous variable) on happiness scores (continuous variable) in three species: zebras, lions and giraffes. My variables would be i) happiness, ii) food and iii) species. As I understand it, I could set up a regression model in three different ways:
1. Using dummy coding (a 1/0 indicator for zebra and another for lion), with giraffe as the reference category:
   Happiness ~ food + food x zebra + food x lion
2. Including interaction terms for all three species:
   Happiness ~ food + food x zebra + food x lion + food x giraffe
3. Including interaction terms for all three species, without a main effect of food:
   Happiness ~ food x zebra + food x lion + food x giraffe
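To make the three specifications concrete, here is a minimal plain-Python sketch that builds each model's design-matrix columns from a small hypothetical dataset and computes their rank. It assumes, as R-style formulas do by default, that every model also includes a common intercept; the data and the helper names (`dummy`, `interaction`, `rank`) are invented for illustration. Because every animal belongs to exactly one species, food x zebra + food x lion + food x giraffe adds up to the food column itself, so the second model has more columns than its rank.

```python
# Hypothetical toy data: each animal belongs to exactly one species.
species = ["giraffe", "zebra", "lion", "giraffe", "zebra", "lion"]
food = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # grams eaten (made-up values)


def dummy(level):
    """1/0 indicator column for one species."""
    return [1.0 if s == level else 0.0 for s in species]


def interaction(level):
    """food x species-indicator column."""
    return [f * d for f, d in zip(food, dummy(level))]


def rank(columns, tol=1e-9):
    """Rank of the matrix whose columns are given, via Gauss-Jordan elimination."""
    rows = [list(r) for r in zip(*columns)]  # n rows x p columns
    r = 0
    for c in range(len(columns)):
        if r == len(rows):
            break
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


ones = [1.0] * len(food)  # assumed common intercept
models = {
    "model 1": [ones, food, interaction("zebra"), interaction("lion")],
    "model 2": [ones, food, interaction("zebra"), interaction("lion"),
                interaction("giraffe")],
    "model 3": [ones, interaction("zebra"), interaction("lion"),
                interaction("giraffe")],
}

# The three interaction columns always sum to the food column.
summed = [a + b + c for a, b, c in zip(interaction("zebra"),
                                       interaction("lion"),
                                       interaction("giraffe"))]
print("food == sum of interactions:", summed == food)
for name, cols in models.items():
    print(name, "columns:", len(cols), "rank:", rank(cols))
```

Any least-squares routine has to drop or alias one of the second model's columns; R's `lm`, for example, reports `NA` for an aliased coefficient.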
The second model makes the most sense to me, because it seems to isolate the trans-species effect of food eaten in the “food” term and then capture a separate interaction effect for each species. However, most guides I've read recommend the first approach without explaining why. Could someone explain whether one model is preferable, and why?
NB: My concern with the first approach is that the “food” coefficient neither reflects a trans-species effect (because it is skewed toward the effect for giraffes, which have no interaction term of their own) nor is equivalent to the food x giraffe term (because it includes some trans-species effect). Have I misunderstood something?
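This concern can be checked numerically. The sketch below, in plain Python with no third-party libraries, fits the first and third models by ordinary least squares to hypothetical data and assumes each model has a common intercept; the slopes, noise values and helper names (`col`, `ols`) are all invented for illustration. Because food equals the sum of the three interaction columns, the two models span the same column space and are reparameterisations of each other. The first model's “food” coefficient should therefore match the giraffe slope from the third model, and food + food x zebra should match the zebra slope.

```python
# Hypothetical data: common intercept plus a species-specific food slope,
# with a fixed perturbation so the fit is not exact.
species = ["giraffe", "zebra", "lion"] * 4
food = [float(g) for g in range(10, 130, 10)]  # grams, 12 animals
true_slope = {"giraffe": 0.5, "zebra": 1.5, "lion": -0.2}
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1, -0.2, 0.3]
happiness = [10.0 + true_slope[s] * f + e
             for s, f, e in zip(species, food, noise)]


def col(level):
    """food x (species == level) interaction column."""
    return [f if s == level else 0.0 for s, f in zip(species, food)]


def ols(columns, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(columns)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
           for ci in columns]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'y].
    m = [row + [rhs] for row, rhs in zip(xtx, xty)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda i: abs(m[i][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for i in range(p):
            if i != c:
                factor = m[i][c] / m[c][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


ones = [1.0] * len(food)
# Model 1: intercept, food, food x zebra, food x lion (giraffe = reference).
m1 = ols([ones, food, col("zebra"), col("lion")], happiness)
# Model 3: intercept plus one food slope per species.
m3 = ols([ones, col("giraffe"), col("zebra"), col("lion")], happiness)

print("giraffe slope:", round(m1[1], 4), "vs", round(m3[1], 4))
print("zebra slope:  ", round(m1[1] + m1[2], 4), "vs", round(m3[2], 4))
print("lion slope:   ", round(m1[1] + m1[3], 4), "vs", round(m3[3], 4))
```

Solving the normal equations directly is adequate for a toy example like this; a real analysis would use a QR-based routine such as R's `lm` or statsmodels' `OLS`.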