I have five variables stored in separate columns, each coded 0/1, where 1 means the condition is TRUE. I need to label each 1, collect all of them in a single column, and make a histogram that shows the data with the labels.

I've been working with the "Ethnic identification" variable. There are five possible options: "MAYA", "LADINO", "GARIFUNA", "XINKA", "EXTRANJERO". In my data set each option is in a different column coded 0/1. I've tried to change those 1s to different values ("MAYA=1", "LADINO=2", "GARIFUNA=3", etc.) to differentiate each value, but I got lost on what to do next.

    # ID_ETNICO <- CAPITAL
    str(ID_ETNICO)


    $ IEE_MAYA                               : int  1 0 0 0 1 1 0 1 
    $ IEE_LADINO                             : int  0 1 0 0 0 0 1 0 
    $ IEE_GARIFUNA                           : int  0 0 1 0 0 0 0 0 
    $ IEE_XINKA                              : int  0 0 0 1 0 0 0 0 
    $ IEE_EXTRANJERO                         : int  0 0 0 0 0 0 0 0 



    ID_ETNICO$IEE_LADINO[ID_ETNICO$IEE_LADINO == 1] <- 2
    ID_ETNICO$IEE_GARIFUNA[ID_ETNICO$IEE_GARIFUNA == 1] <- 3
    ID_ETNICO$IEE_XINKA[ID_ETNICO$IEE_XINKA == 1] <- 4
    ID_ETNICO$IEE_EXTRANJERO[ID_ETNICO$IEE_EXTRANJERO == 1] <- 5


    $ IEE_MAYA                               : int  1 0 0 0 1 1 0
    $ IEE_LADINO                             : num  0 2 0 0 0 0 2
    $ IEE_GARIFUNA                           : num  0 0 3 0 0 0 0
    $ IEE_XINKA                              : num  0 0 0 4 0 0 0
    $ IEE_EXTRANJERO                         : num  0 0 0 0 0 0 0


    table(ID_ETNICO$IEE_MAYA)
        0     1
    27533  5263

    table(ID_ETNICO$IEE_LADINO)
        0     2
     6354 26442

    table(ID_ETNICO$IEE_GARIFUNA)
        0     3
    32593   203

    table(ID_ETNICO$IEE_XINKA)
        0     4
    32649   147

    table(ID_ETNICO$IEE_EXTRANJERO)
        0     5
    32576   220

Now I need to label the values ("1=MAYA", "2=LADINO", "3=GARIFUNA", "4=XINKA", "5=EXTRANJERO"), merge them into a single column, obtain the frequency of each label, and make a histogram.

Javier
  • hi, javier - i'd like to help. sounds like you are trying to convert multiple dummy-coded variables into a single factor variable (assuming someone cannot have more than one identity, correct?). there are other SO questions like this: https://stackoverflow.com/questions/29870994/dummy-variables-to-single-categorical-variable-factor-in-r and https://stackoverflow.com/questions/14161202/recoding-dummy-variable-to-ordered-factor - please let me know if this is helpful – Ben Jun 10 '19 at 21:50

1 Answer

Assuming each row is coded with exactly one ethnic identification, you can convert the multiple dummy-coded variables into a single factor. Let me know if this is what you had in mind.

ID_ETNICO <- data.frame(
  IEE_MAYA = c(1,0,0,0,1,1,0,1),
  IEE_LADINO = c(0,1,0,0,0,0,1,0),
  IEE_GARIFUNA = c(0,0,1,0,0,0,0,0),
  IEE_XINKA = c(0,0,0,1,0,0,0,0),
  IEE_EXTRANJERO = c(0,0,0,0,0,0,0,0)
)

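# Remove the "IEE_" prefix from the column names
names(ID_ETNICO) <- substring(names(ID_ETNICO), 5)

# Collapse the dummy variables into one factor: for each row, max.col()
# returns the index of the column holding the largest value (the 1).
# ties.method = "first" keeps the result reproducible (the default is "random");
# levels = names(...) keeps categories with zero counts, e.g. EXTRANJERO here.
TIPO_ETNICO <- factor(
  names(ID_ETNICO)[max.col(ID_ETNICO, ties.method = "first")],
  levels = names(ID_ETNICO)
)

# Show the frequency table and a bar plot
table(TIPO_ETNICO)
barplot(table(TIPO_ETNICO))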
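
The conversion above only gives sensible labels if every row has exactly one 1 across the five columns, because `max.col()` still has to pick some column for rows with no 1 or with several. If I add up the counts posted in the question, the 1s total 32,275 while each column has 32,796 rows, so at least 521 rows seem to have no identity marked. Here is a small sketch of how you could check this before collapsing the columns. It reuses the toy data from this answer, and `n_marked` is just an illustrative name, not something from your data.

```r
# Toy data from the answer (same values as above)
ID_ETNICO <- data.frame(
  IEE_MAYA = c(1,0,0,0,1,1,0,1),
  IEE_LADINO = c(0,1,0,0,0,0,1,0),
  IEE_GARIFUNA = c(0,0,1,0,0,0,0,0),
  IEE_XINKA = c(0,0,0,1,0,0,0,0),
  IEE_EXTRANJERO = c(0,0,0,0,0,0,0,0)
)

# Number of identities marked in each row (hypothetical helper variable)
n_marked <- rowSums(ID_ETNICO)

# How many rows have 0, 1, 2, ... identities marked
table(n_marked)

# Row numbers that break the "exactly one identity" assumption
which(n_marked != 1)
```

On the toy data every row has exactly one 1, so `which()` returns an empty vector. On the real data, any rows it lists would need a decision (for example, treating them as missing) before plotting.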
Ben
  • It is great! But the problem is the length of the data: each column has 32,000 entries, so I need R to recognize the values by itself instead of me typing in the data. How do I do that? – Javier Jun 10 '19 at 23:36
  • i'm not sure i understand, can you clarify what is your source of data? use dput() or give example of what you are starting from. see: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Ben Jun 10 '19 at 23:42
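
For the case raised in the comments, where the data frame is already loaded and nothing should be retyped, a sketch along these lines might work. It assumes the object is called `ID_ETNICO` and contains the five `IEE_` columns shown in the question. The small data frame below only stands in for the real data so the example runs on its own. `OTHER_VAR`, `dummy_cols`, `level_names` and `idx` are hypothetical names introduced for illustration. Rows without exactly one 1 are set to `NA` here, which is one possible choice and not the only reasonable one.

```r
# Stand-in for the real data; with real data, skip this and use the loaded object
ID_ETNICO <- data.frame(
  OTHER_VAR = c(10, 20, 30, 40, 50),   # hypothetical unrelated column
  IEE_MAYA = c(1,0,0,0,0),
  IEE_LADINO = c(0,1,0,0,0),
  IEE_GARIFUNA = c(0,0,1,0,0),
  IEE_XINKA = c(0,0,0,1,0),
  IEE_EXTRANJERO = c(0,0,0,0,0)
)

# The five dummy columns named in the question
dummy_cols <- c("IEE_MAYA", "IEE_LADINO", "IEE_GARIFUNA",
                "IEE_XINKA", "IEE_EXTRANJERO")
dummies <- as.matrix(ID_ETNICO[dummy_cols])

# Labels without the "IEE_" prefix
level_names <- sub("^IEE_", "", dummy_cols)

# Column index of the 1 in each row; rows without exactly one 1 become NA
idx <- max.col(dummies, ties.method = "first")
idx[rowSums(dummies) != 1] <- NA

# Store the result as a new column next to the original data
ID_ETNICO$TIPO_ETNICO <- factor(level_names[idx], levels = level_names)

# Frequencies (including NA, if any) and a bar plot of the labelled categories
table(ID_ETNICO$TIPO_ETNICO, useNA = "ifany")
barplot(table(ID_ETNICO$TIPO_ETNICO))
```

Since the variable is categorical, `barplot()` on the frequency table gives the labelled chart described in the question; `hist()` is meant for numeric data.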