
In R there is nice functionality for running a regression with dummy variables for each level of a categorical variable, e.g. automatically expanding an R factor into a collection of 1/0 indicator variables, one for every factor level.

Is there an equivalent way to do this in Julia?

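using DataFrames, GLM

x = randn(1000)
group = repeat(1:25, 40)          # 25 groups, 40 observations each
groupMeans = randn(25)
y = 3 .* x .+ groupMeans[group]

data = DataFrame(x=x, y=y, g=group)
# Manually add one 0/1 indicator column I1, ..., I25 per group level
for i in sort(unique(group))
    data[!, Symbol("I$i")] = data.g .== i
end
# I25 is omitted: with an intercept, all 25 indicators would be collinear
lm(@formula(y ~ x + I1 + I2 + I3 + I4 + I5 + I6 + I7 + I8 + I9 + I10 +
    I11 + I12 + I13 + I14 + I15 + I16 + I17 + I18 + I19 + I20 +
    I21 + I22 + I23 + I24), data)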
Rob Donnelly

1 Answer


If you are using the DataFrames package, you only need to pool the grouping column; the package takes care of the rest. From the documentation:

Pooling columns is important for working with the GLM package. When fitting regression models, PooledDataArray columns in the input are translated into 0/1 indicator columns in the ModelMatrix, with one column for each of the levels of the PooledDataArray.
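To make that translation concrete, here is a minimal base-Julia sketch (no DataFrames or GLM) of the indicator expansion, reusing the simulated data from the question. `dummy_columns` is a hypothetical helper written for illustration, not a library function. It drops the first level as the baseline, which is what StatsModels' default `DummyCoding` does when the model has an intercept, so 25 groups give 24 indicator columns.

```julia
using LinearAlgebra

# Hypothetical helper: expand a group vector into 0/1 indicator columns,
# using the first (smallest) level as the baseline that gets no column.
function dummy_columns(g::AbstractVector)
    lvls = sort(unique(g))
    # rows = observations, columns = non-baseline levels; 1.0 marks membership
    D = [Float64(gi == l) for gi in g, l in lvls[2:end]]
    return D, lvls
end

x = randn(1000)
group = repeat(1:25, 40)
groupMeans = randn(25)
y = 3 .* x .+ groupMeans[group]

D, lvls = dummy_columns(group)
X = hcat(ones(length(y)), x, D)   # intercept, slope on x, 24 indicators
β = X \ y                         # least-squares fit (QR-based for a tall matrix)
```

Because the question's `y` has no noise, `β[2]` recovers the slope 3, `β[1]` recovers `groupMeans[1]`, and `β[3:end]` recovers each remaining group's mean relative to group 1.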

You can see the rest of the documentation on pooled data here.

ntdef
  • Btw, with a pooled data frame this is all that is needed: `pool!(data, [:g]); lm(y~x+g, data)` – LmW. May 30 '17 at 02:56
  • I think the answer above may be out of date. The current way to do it, using contrasts, is shown [here](https://juliastats.org/GLM.jl/stable/manual/#Categorical-variables-1). For this question's data: `lm(@formula(y ~ x + g), data, contrasts = Dict(:g => DummyCoding()))` – Smithey Mar 08 '21 at 18:52
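For current package versions, a minimal sketch of both routes, assuming DataFrames.jl, GLM.jl and CategoricalArrays.jl are installed (the column name `g_cat` is made up for illustration): store the group as a `CategoricalVector`, which nowadays plays the role `PooledDataArray` played in the older answer, or keep it as integers and pass a `contrasts` dictionary as in the last comment.

```julia
using DataFrames, GLM, CategoricalArrays

x = randn(1000)
group = repeat(1:25, 40)
groupMeans = randn(25)
y = 3 .* x .+ groupMeans[group]
data = DataFrame(x = x, y = y, g = group)

# Route 1: make the column categorical; the formula expands it into
# indicator columns, with the first level (1) as the baseline.
data.g_cat = categorical(data.g)
model1 = lm(@formula(y ~ x + g_cat), data)

# Route 2: keep g as integers; supplying a contrast for it tells the
# formula machinery to treat it as categorical with dummy coding.
model2 = lm(@formula(y ~ x + g), data; contrasts = Dict(:g => DummyCoding()))

coef(model2)   # intercept, x, then one coefficient per level 2 through 25
```

Both fits use group 1 as the baseline, so neither needs the 24 hand-built `I` columns from the question.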