In my dataset original, the variable size is categorical (it can be converted to a numeric where small = 1, medium = 2, large = 3).

id <- c('1','2','3','4', '5')
size <- c('small', 'large', 'small', 'small', 'medium')
dest1 <- c('1', '0', '1', '0', '1')
dest2 <- c('0', '1', '1', '0', '1')
via1 <- c('1', '1', '0', '0', '0')
via2 <- c('1', '0', '1', '0', '1')
value <- c('4', '561', '310', '106', '8')

original <- data.frame(id, size, dest1, dest2, via1, via2, value)

I want to interact the variable size, in a systematic way, with the variables starting with dest and with those starting with via, separately (in my original dataset I have hundreds of variables starting with these words).

I have tried doing it manually (size x dummy), but repeating this for every possible interaction takes a lot of time.

In the end, the new data frame has to look like interacted below. How would you get this outcome?

size_dest1 <- c('1', '0', '1', '0', '2')
size_dest2 <- c('0', '3', '1', '0', '2')
size_via1 <- c('1', '3', '0', '0', '0')
size_via2 <- c('1', '0', '1', '0', '2')
interacted <- data.frame(id, size, dest1, dest2, via1, via2, value, size_dest1, size_dest2, size_via1, size_via2)

For example, the first interaction is size x dest1 = c(1,3,1,1,2) x c(1,0,1,0,1) = c(1,0,1,0,2) = size_dest1. The same idea applies to size_dest2, ..., size_via1, size_via2, ....
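
The arithmetic is plain element-wise multiplication. A minimal sketch, assuming the mapping small = 1, medium = 2, large = 3 stated in the question (the names `size_map` and `size_num` are only illustrative):

```r
# Hypothetical lookup table for the size codes described in the question
size_map <- c(small = 1, medium = 2, large = 3)

size  <- c('small', 'large', 'small', 'small', 'medium')
dest1 <- c('1', '0', '1', '0', '1')

# Map each label to its code, then multiply element-wise by the dummy
size_num   <- unname(size_map[size])
size_dest1 <- size_num * as.numeric(dest1)
size_dest1
# [1] 1 0 1 0 2
```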

Any clue?

Thanks

vog

1 Answer

  • Convert the `size` column to a factor with the levels specified in the intended order.
  • Create a vector of the column names that you want to multiply with `size`.
  • Convert the factor to integer and multiply it with all of those columns to create the new columns.
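# as.is = TRUE keeps strings as character and avoids the warning raised when it is omitted
original <- type.convert(original, as.is = TRUE)
original$size <- factor(original$size, c('small', 'medium', 'large'))

# Anchor the pattern so only names starting with dest or via are matched
cols <- grep('^(dest|via)', names(original), value = TRUE)
original[paste0('size_', cols)] <- as.integer(original$size) * original[cols]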

original
#  id   size dest1 dest2 via1 via2 value size_dest1 size_dest2 size_via1 size_via2
#1  1  small     1     0    1    1     4          1          0         1         1
#2  2  large     0     1    1    0   561          0          3         3         0
#3  3  small     1     1    0    1   310          1          1         0         1
#4  4  small     0     0    0    0   106          0          0         0         0
#5  5 medium     1     1    0    1     8          2          2         0         2
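
If the input columns should keep their original types, one option is to skip `type.convert` and convert only inside the multiplication. The following is a sketch under that assumption, not a drop-in replacement; the helper name `add_size_interactions` and its arguments `prefixes` and `size_levels` are hypothetical.

```r
# Hypothetical helper: interact `size` with every column whose name starts
# with one of the given prefixes, leaving the input columns untouched
add_size_interactions <- function(df, prefixes = c('dest', 'via'),
                                  size_levels = c('small', 'medium', 'large')) {
  # Integer codes 1, 2, 3 follow the order given in `size_levels`
  size_code <- as.integer(factor(df$size, levels = size_levels))
  for (p in prefixes) {
    cols <- names(df)[startsWith(names(df), p)]
    for (col in cols) {
      # as.numeric(as.character()) handles dummies stored as text or as factors
      df[[paste0('size_', col)]] <- size_code * as.numeric(as.character(df[[col]]))
    }
  }
  df
}

original <- data.frame(
  id    = c('1', '2', '3', '4', '5'),
  size  = c('small', 'large', 'small', 'small', 'medium'),
  dest1 = c('1', '0', '1', '0', '1'),
  dest2 = c('0', '1', '1', '0', '1'),
  via1  = c('1', '1', '0', '0', '0'),
  via2  = c('1', '0', '1', '0', '1'),
  value = c('4', '561', '310', '106', '8'),
  stringsAsFactors = FALSE
)

interacted <- add_size_interactions(original)
interacted[c('size_dest1', 'size_dest2', 'size_via1', 'size_via2')]
```

Here only the new `size_*` columns are numeric; `id`, `dest1`, `value` and the rest stay character, so nothing has to be converted back afterwards.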

To restore the original classes, we can keep a copy of the data before the transformation, apply the transformation above, and then change the classes back using the copy.

copy <- original
#Transformation code from above
#...
#...

#Change the classes
original[names(copy)] <- Map(function(x, y) {class(x) <- class(y);x}, 
                             original[names(copy)], copy)
Ronak Shah
  • Is there a way to recover the variable types I had at the beginning? I mean, to "undo" the `type.convert()`? – vog Dec 29 '20 at 17:37
  • Why do you want to keep numbers as characters? You cannot undo `type.convert`. You can turn `cols` back to character. `original[cols] <- lapply(original[cols], as.character)` – Ronak Shah Dec 29 '20 at 23:20
  • In the original database I have numbers that are factors (character is also a good interpretation, given that the numbers in these cases represent neither cardinality nor order); however, they are read as integers by `type.convert`. The question would be something like: "Is it possible to save the type of each old column and apply it to the transformed database?" (I don't want them all to be characters, as I also have vectors that should stay numeric.) – vog Dec 30 '20 at 10:07
  • You can keep a copy of the original object, apply the transformations and change the classes back from the copy. See updated answer. – Ronak Shah Dec 30 '20 at 10:13
  • "Error in as.character.factor(x) : malformed factor In addition: Warning messages: 1: In str.default(obj, ...) : 'object' does not have valid levels() 2: In str.default(obj, ...) : 'object' does not have valid levels() 3: In str.default(obj, ...) : 'object' does not have valid levels() 4: In str.default(obj, ...) : 'object' does not have valid levels() 5: In str.default(obj, ...) : 'object' does not have valid levels() 6: In str.default(obj, ...) : 'object' does not have valid levels() " Any idea? – vog Dec 30 '20 at 13:57
  • I cannot reproduce this with the data you have shared in the post. Does it work for you? – Ronak Shah Dec 30 '20 at 14:01
  • No, it does not work for me. I am just following the `original` data and your code. Then, the last error arose. Do I need to install a particular package? – vog Dec 30 '20 at 14:15
  • I think you are on R < 4.0.0 which is converting characters to factors. Try using the data with `original <- data.frame(id, size, dest1, dest2, via1, via2, value, stringsAsFactors = FALSE)` – Ronak Shah Dec 30 '20 at 14:16
  • Indeed it works! Is it possible to apply `stringsAsFactors = FALSE` to an existing database? – vog Dec 30 '20 at 14:21
  • You can convert factor columns to character. Check this answer https://stackoverflow.com/a/2853231 – Ronak Shah Dec 30 '20 at 14:31
  • I have updated R to version 4.0.3, and the "Change the classes" code, `original[names(copy)] <- Map(function(x, y) {class(x) <- class(y);x}, original[names(copy)], copy)`, works for the sample data I shared. However, it does not work (it converts the data into an empty matrix) when I apply the code to my original data. Any clue? – vog Jan 04 '21 at 16:53
  • Better said: when I check the dimensions of the transformed data, `dim(original)` returns `[1] 21714 1045`. However, printing `original` fails with `Error in as.character.factor(x) : malformed factor`. Moreover, with `View(original)` the outcome is an empty matrix with only one empty column, `id`. – vog Jan 04 '21 at 17:42
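
A possible explanation for the "malformed factor" error, offered as an assumption since the full data is not shown: assigning `class(x) <- "factor"` to an integer vector attaches the class label without any levels, so R cannot print the column. If some columns in `copy` are factors, rebuilding them with `factor()` avoids this. The sketch below uses a small stand-in data frame, and the helper name `restore_type` is hypothetical.

```r
# Hypothetical helper: give `x` the type of the template column `y`
restore_type <- function(x, y) {
  if (is.factor(y)) {
    factor(as.character(x))   # rebuild the levels from the current values
  } else if (is.character(y)) {
    as.character(x)
  } else {
    x                         # leave other types as they are
  }
}

# Small stand-in for data whose string columns were read as factors
copy <- data.frame(id = c('1', '2', '3'), dest1 = c('1', '0', '1'),
                   stringsAsFactors = TRUE)

# Stand-in for the transformation step: dest1 becomes integer
original <- copy
original$dest1 <- as.integer(as.character(copy$dest1))

# Restore each column using the matching column of `copy` as the template
original[names(copy)] <- Map(restore_type, original[names(copy)], copy)

str(original)
```

Note that the rebuilt factors take their levels from the values present after the transformation, so the level set or order may differ from the one in `copy`.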