
I am trying to process municipal information in R, and it seems that factors (to be exact, factor()) are the best way to achieve my goal. I am only starting to get the hang of R, so I imagine my problem is quite simple.

I have the following example dataframe to share (a tiny portion of Finnish municipalities):

municipality<-c("Espoo", "Oulu", "Tampere", "Joensuu", "Seinäjoki", 
"Kerava")
region<-c("Uusimaa","Pohjois-Pohjanmaa","Pirkanmaa","Pohjois-Karjala","Etelä-Pohjanmaa","Uusimaa")

myData<-cbind(municipality,region)
myData<-as.data.frame(myData)

By default R converts my character columns into factors (this is the default in R versions before 4.0.0), which can be checked with str(myData). Now to the part where my beginner-to-novice level R skills end: I can't seem to find a way to apply the factor codes from column region to column municipality.

Let me demonstrate. Instead of having the original result

as.numeric(factor(myData$municipality))

[1] 1 4 6 2 5 3

I would like to get this instead: the factor codes from myData$region applied to myData$municipality.

as.numeric(factor(myData$municipality))

[1] 5 4 2 3 1 5

I welcome any help with open arms. Thank you.

Vesanen
  • https://stackoverflow.com/a/5800785/5414452 – jogo Mar 16 '18 at 08:29
  • Do you want to reorder `myData$municipality` according to `as.numeric(myData$region)`? If so, you could do `myData$municipality[myData$region]`. As @jogo pointed out, you could / should use `with()` instead of `attach()`, e.g. `with(myData, municipality[region])`. – markus Mar 16 '18 at 08:44
  • Thank you for the comments jogo and markus. First I'd like to address my use of attach(). Consider it gone. I don't really ever use it, but now did for some reason. In this problem I would like to get the `region` factor level in the `municipality` factor level, replacing the original. For example, `levels(myData$municipality)<-c(levels(myData$region))` does not work. Pardon me, part of the problem here seems to be that I am not that familiar with R vernacular or any programming language vernacular for that matter. – Vesanen Mar 16 '18 at 08:59

1 Answer


To better understand the use of factor in R have a look here.

If you want to add factor levels, you have to do something like this in your dataframe:

> levels(myData$region)
[1] "Etelä-Pohjanmaa"   "Pirkanmaa"         "Pohjois-Karjala"   "Pohjois-Pohjanmaa" "Uusimaa"          
> levels(myData$municipality)
[1] "Espoo"     "Joensuu"   "Kerava"    "Oulu"      "Seinäjoki" "Tampere"  
> levels(myData$municipality)<-c(levels(myData$municipality),levels(myData$region))
> levels(myData$municipality)
 [1] "Espoo"             "Joensuu"           "Kerava"            "Oulu"              "Seinäjoki"        
 [6] "Tampere"           "Etelä-Pohjanmaa"   "Pirkanmaa"         "Pohjois-Karjala"   "Pohjois-Pohjanmaa"
[11] "Uusimaa"
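
If the aim is instead to keep, for each municipality, the numeric code of the region it belongs to, one possible approach is to read the integer codes straight from the region factor. The following is only a minimal sketch under some assumptions: `stringsAsFactors = TRUE` is passed explicitly so that the columns are factors on any R version, and `region_code` and `region_code_of` are hypothetical names introduced here for illustration.

```r
# Rebuild the example data. stringsAsFactors = TRUE is set explicitly because
# R >= 4.0.0 no longer converts character columns to factors by default.
municipality <- c("Espoo", "Oulu", "Tampere", "Joensuu", "Seinäjoki", "Kerava")
region <- c("Uusimaa", "Pohjois-Pohjanmaa", "Pirkanmaa", "Pohjois-Karjala",
            "Etelä-Pohjanmaa", "Uusimaa")
myData <- data.frame(municipality, region, stringsAsFactors = TRUE)

# Integer code of each municipality's region, i.e. the position of the
# region in levels(myData$region).
region_code <- as.integer(myData$region)

# Named lookup from municipality to region code (hypothetical helper name).
region_code_of <- setNames(region_code, as.character(myData$municipality))
region_code_of[["Kerava"]]

# Translate a code back into the region name.
levels(myData$region)[region_code_of[["Joensuu"]]]
```

With the region levels shown above, `region_code` reproduces the vector 5 4 2 3 1 5 requested in the question, while the municipality factor itself is left unchanged.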
Terru_theTerror
  • Thank you for your answer, but this is not what I am trying to achieve. Using factors, I am trying to have my dataframe store in numeric form the information that, for example, "Espoo" and "Kerava" belong to "Uusimaa" and "Joensuu" belongs to "Pohjois-Karjala". – Vesanen Mar 16 '18 at 09:02