Concatenate 3 columns as Linear Regression Model

Question

This is my data.frame:

df.index= dput(df.index)
    structure(list(Var1 = structure(c(43L, 42L, 46L, 33L, 29L), .Label = c("ABEV3", 
    "AEDU3", "ALLL3", "BBAS3", "BBDC3", "BBDC4", "BISA3", "BRAP4", 
    "BRFS3", "BRKM5", "BRML3", "BRPR3", "BVMF3", "CCRO3", "CESP6", 
    "CIEL3", "CMIG4", "CPFE3", "CPLE6", "CRUZ3", "CSAN3", "CSNA3", 
    "CTIP3", "CYRE3", "DASA3", "DTEX3", "ECOR3", "ELET3", "ELET6", 
    "ELPL4", "EMBR3", "ENBR3", "ESTC3", "EVEN3", "FIBR3", "GFSA3", 
    "GGBR4", "GOAU4", "GOLL4", "HGTX3", "HYPE3", "ITSA4", "ITUB4", 
    "JBSS3", "KLBN4", "KROT3", "LAME4", "LIGT3", "LREN3", "MRFG3", 
    "MRVE3", "NATU3", "OIBR4", "PCAR4", "PDGR3", "PETR3", "PETR4", 
    "QUAL3", "RENT3", "RSID3", "SANB11", "SBSP3", "SUZB5", "TBLE3", 
    "TIMP3", "UGPA3", "USIM5", "VALE3", "VALE5", "VIVT4"), class = "factor"), 
        Var2 = structure(c(42L, 43L, 33L, 46L, 28L), .Label = c("ABEV3", 
        "AEDU3", "ALLL3", "BBAS3", "BBDC3", "BBDC4", "BISA3", "BRAP4", 
        "BRFS3", "BRKM5", "BRML3", "BRPR3", "BVMF3", "CCRO3", "CESP6", 
        "CIEL3", "CMIG4", "CPFE3", "CPLE6", "CRUZ3", "CSAN3", "CSNA3", 
        "CTIP3", "CYRE3", "DASA3", "DTEX3", "ECOR3", "ELET3", "ELET6", 
        "ELPL4", "EMBR3", "ENBR3", "ESTC3", "EVEN3", "FIBR3", "GFSA3", 
        "GGBR4", "GOAU4", "GOLL4", "HGTX3", "HYPE3", "ITSA4", "ITUB4", 
        "JBSS3", "KLBN4", "KROT3", "LAME4", "LIGT3", "LREN3", "MRFG3", 
        "MRVE3", "NATU3", "OIBR4", "PCAR4", "PDGR3", "PETR3", "PETR4", 
        "QUAL3", "RENT3", "RSID3", "SANB11", "SBSP3", "SUZB5", "TBLE3", 
        "TIMP3", "UGPA3", "USIM5", "VALE3", "VALE5", "VIVT4"), class = "factor"), 
        time = structure(c(1L, 1L, 1L, 1L, 1L), class = "factor", .Label = "t")), class = "data.frame", row.names = c(NA, 
    -5L))

It goes like this:

   Var1  Var2 time
1 ITUB4 ITSA4    t
2 ITSA4 ITUB4    t
3 KROT3 ESTC3    t
4 ESTC3 KROT3    t
5 ELET6 ELET3    t

I want to concatenate these 3 columns into text like this:

"ITUB4~ITSA4+t" "ITSA4~ITUB4+t" "KROT3~ESTC3+t" "ESTC3~KROT3+t" "ELET6~ELET3+t"

I am using the apply function:

df.index=apply(df.index,1,paste,collapse="~+")

But the result is wrong. The problem is that I am not able to separate the second column from the third column using the "+" symbol. How can I separate the second variable from the "t" variable with the "+" symbol?

The result that I want is:

"ITUB4~ITSA4+t" "ITSA4~ITUB4+t" "KROT3~ESTC3+t" "ESTC3~KROT3+t" "ELET6~ELET3+t"

as I mentioned above.

related: [How to match a data frame of variable names and another with data for a regression?](https://stackoverflow.com/q/51914163/4891738) — Zheyuan Li, Aug 20 '18 at 13:23

akrun · Accepted Answer · 2018-08-20T13:27:20.553

2

We can use paste0, which is vectorized over the columns:

with(df.index, paste0(Var1, "~", Var2, "+", time))
#[1] "ITUB4~ITSA4+t" "ITSA4~ITUB4+t" "KROT3~ESTC3+t" "ESTC3~KROT3+t" "ELET6~ELET3+t"
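For context on why the attempt in the question did not work: `collapse` takes a single string and places it between every pair of elements, so it cannot use "~" in one position and "+" in another. The sketch below shows this on one row and then an equivalent vectorized alternative with `sprintf`. The data frame is rebuilt here with plain character columns (an assumption made only to keep the example self-contained; the values are the ones from the question).

```r
# Rebuild the question's data with character columns (same values as above).
df.index <- data.frame(
  Var1 = c("ITUB4", "ITSA4", "KROT3", "ESTC3", "ELET6"),
  Var2 = c("ITSA4", "ITUB4", "ESTC3", "KROT3", "ELET3"),
  time = "t",
  stringsAsFactors = FALSE
)

# collapse inserts the same string between every pair of elements,
# so "~+" ends up in both positions.
wrong <- paste(c("ITUB4", "ITSA4", "t"), collapse = "~+")
wrong
#> [1] "ITUB4~+ITSA4~+t"

# sprintf is vectorized too; each %s is filled from the matching column.
right <- sprintf("%s~%s+%s", df.index$Var1, df.index$Var2, df.index$time)
right
#> [1] "ITUB4~ITSA4+t" "ITSA4~ITUB4+t" "KROT3~ESTC3+t" "ESTC3~KROT3+t" "ELET6~ELET3+t"
```

If the columns are factors, as in the question, `sprintf` needs them wrapped in `as.character()` first, whereas `paste0` does that conversion itself.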

As the OP asked about getting the result with apply: specify MARGIN = 1 to work row-wise, then call paste0 on each row of the dataset. This is less efficient, because paste0 is already vectorized and apply first has to convert the data frame to a character matrix.

apply(df.index, 1, FUN = function(x) paste0(x[1], "~", x[2], "+", x[3]))
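Since the title mentions a linear regression model, the strings probably still need to become formula objects before they can be passed to `lm`. A minimal sketch of that step follows. The `prices` data frame is hypothetical (random numbers standing in for real series, with columns named after two of the tickers and `t`); it is not part of the question.

```r
# Hypothetical data: random numbers standing in for the real series.
set.seed(1)
prices <- data.frame(
  ITUB4 = rnorm(20),
  ITSA4 = rnorm(20),
  t     = seq_len(20)
)

# Two of the strings produced by paste0 above.
form_strings <- c("ITUB4~ITSA4+t", "ITSA4~ITUB4+t")

# as.formula() parses each string into a formula object.
forms <- lapply(form_strings, as.formula)

# Fit one linear model per formula; the formula is lm's first argument.
fits <- lapply(forms, lm, data = prices)

# Coefficient names of the first model: intercept, ITSA4 and t.
names(coef(fits[[1]]))
```

Every variable named in a formula has to exist as a column in the data passed to `lm`, so in practice `prices` would need one column per ticker.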

edited Aug 20 '18 at 13:27

answered Aug 20 '18 at 13:05



`"Var2"` should probably be `Var2`. – missuse Aug 20 '18 at 13:07
@akrun Is it possible to do this using the apply function? – Aug 20 '18 at 13:20

@DiogoBastos Yes, you can do it, but it will be less efficient, i.e. `apply(df.index, 1, function(x) paste0(x[1], "~", x[2], "+", x[3]))` – akrun Aug 20 '18 at 13:22

score 1 · Answer 2 · answered Aug 20 '18 at 13:21

If you want a formula (class formula) for each row, you could do the following. Note that I first change all of your factors to characters with mutate_if.

library(tidyverse)

df <- df.index %>% mutate_if(is.factor, as.character) %>%
  mutate(forms = map2(Var1, Var2, ~reformulate(c(.y, "t"), .x, TRUE)))
df
#>    Var1  Var2 time             forms
#> 1 ITUB4 ITSA4    t ITUB4 ~ ITSA4 + t
#> 2 ITSA4 ITUB4    t ITSA4 ~ ITUB4 + t
#> 3 KROT3 ESTC3    t KROT3 ~ ESTC3 + t
#> 4 ESTC3 KROT3    t ESTC3 ~ KROT3 + t
#> 5 ELET6 ELET3    t ELET6 ~ ELET3 + t

df$forms
#> [[1]]
#> ITUB4 ~ ITSA4 + t
#> <environment: 0x7fe3b5854c88>
#> 
#> [[2]]
#> ITSA4 ~ ITUB4 + t
#> <environment: 0x7fe3b583d1a8>
#> 
#> [[3]]
#> KROT3 ~ ESTC3 + t
#> <environment: 0x7fe3b58352f8>
#> 
#> [[4]]
#> ESTC3 ~ KROT3 + t
#> <environment: 0x7fe3b58333a8>
#> 
#> [[5]]
#> ELET6 ~ ELET3 + t
#> <environment: 0x7fe3b581c8a8>

Created on 2018-08-20 by the reprex package (v0.2.0).
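If loading the tidyverse is not an option, roughly the same list of formulas can be built in base R with `Map` and `reformulate`. This is a sketch under the assumption that the columns are character (factors would need `as.character()` first); it uses only the first two rows of the question's data to stay short.

```r
# First two rows of the question's data, as character columns.
df.index <- data.frame(
  Var1 = c("ITUB4", "ITSA4"),
  Var2 = c("ITSA4", "ITUB4"),
  time = "t",
  stringsAsFactors = FALSE
)

# Map walks the three columns in parallel; reformulate builds
# response ~ term1 + term2 from character inputs.
forms <- Map(
  function(y, x, tm) reformulate(c(x, tm), response = y),
  df.index$Var1, df.index$Var2, df.index$time
)

deparse(forms[[1]])
#> [1] "ITUB4 ~ ITSA4 + t"
```

Unlike the pasted strings, these are already formula objects, so they can go straight into a modelling function.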
