0

I'm fairly new to programming in R, and I have a pretty basic question.

I have loaded the following XML document into a data frame using the XML library: http://www.xmldatasets.net/temp/179681356453762.xml. How would I go about creating a function that returns the senators' names given their state?

i.e. something like senatorName(state), where the return value would be a vector of the senator(s) of that state.

Zheyuan Li
Kyle Weise
    XML is kind of a pain, but [the Sunlight Foundation publishes that same information as a CSV](https://sunlightlabs.github.io/congress/#legislator-spreadsheet) (or API, if you like). You can grab it directly with just `congress <- read.csv('http://unitedstates.sunlightfoundation.com/legislators/legislators.csv')` and then subset normally, e.g. `congress[congress$title == 'Sen' & congress$state == 'CA', 1:5]` – alistaire Oct 06 '16 at 22:06

3 Answers

2

XML is kind of a pain, but the Sunlight Foundation publishes that same information as a CSV (or API, if you like). You can grab it directly with just

congress <- read.csv('http://unitedstates.sunlightfoundation.com/legislators/legislators.csv')

and then subset it normally, or make a function to do so:

find_senators <- function(state){
    sens <- congress[congress$title == 'Sen' & 
                     congress$state == state & 
                     congress$in_office == 1, 1:5]
    unname(apply(sens, 1, function(x){paste(x[x != ''], collapse = ' ')}))
}

find_senators("CA")
## [1] "Sen Barbara Boxer"    "Sen Dianne Feinstein"

find_senators("IL")
## [1] "Sen Richard J. Durbin" "Sen Mark Steven Kirk" 
alistaire
1

Maybe something like this?

library(XML)
tg<-xmlToDataFrame("http://www.xmldatasets.net/temp/179681356453762.xml")

mt<-data.frame(fname=cbind(apply(tg[,2:3],1,function(x) paste0(x,collapse=", "))),state=tg$state)

mt[mt$state=="TX",]
                   fname state
28          Cornyn, John    TX
43 Hutchison, Kay Bailey    TX

As commented below, if you do not want to collapse the family name and first name into one column, you can just take the converted XML table, tg, and type:

tg[tg$state=="TX",]

This returns all the information about the senators from Texas. If you need only the name and the state, subset the columns as well:

tg[tg$state=="TX",c(2:3,5)]

If you want to be able to type the state name in the console and get the name and states:

for (j in unique(tg$state)){
  assign(j,tg[tg$state==j,c(2,3,5)])
}

And then type the state name, e.g. MT, and get the output:

> MT
   last_name first_name state
5     Baucus        Max    MT
89    Tester        Jon    MT

And you can also make a function of this:

senatorName <- function(x) tg[which(tg$state == x), c(2:3, 5)]

> senatorName("TX")
   last_name first_name state
28    Cornyn       John    TX
43 Hutchison Kay Bailey    TX
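
Since the question asks for a vector of names rather than a data frame, the function above could be adapted roughly as follows. This is only a sketch: `senators_df` is a small hand-made stand-in for `tg` (rows copied from the outputs above, assuming the same column names), and `senator_names` is a hypothetical helper, not part of the XML package.

```r
# Stand-in for `tg`: the rows shown in the outputs above, same column names assumed.
senators_df <- data.frame(
  last_name  = c("Baucus", "Cornyn", "Hutchison", "Tester"),
  first_name = c("Max", "John", "Kay Bailey", "Jon"),
  state      = c("MT", "TX", "TX", "MT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return a character vector of "First Last" names
# for one state abbreviation. The data frame is passed in explicitly.
senator_names <- function(df, state) {
  hits <- df[df$state == state, , drop = FALSE]
  paste(hits$first_name, hits$last_name)
}

senator_names(senators_df, "TX")
## [1] "John Cornyn"          "Kay Bailey Hutchison"

senator_names(senators_df, "HI")   # no matching rows in the stand-in data
## character(0)
```

With the real data, calling `senator_names(tg, "TX")` should behave the same way, provided `tg` has `first_name`, `last_name` and `state` columns as the outputs above suggest.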
nadizan
  • This is close to what I need. Although I do not understand what that long middle line of code is doing, I need this as a function, where I can input the abbrev. of a state. As opposed to hard-coded as you've done. – Kyle Weise Oct 06 '16 at 22:25
  • Please clarify. What you are saying is rather ambiguous. The long line of code is just to collapse the family name and first name to one column. However, you could just do this and be fine: tg[tg$state=="TX",] @KyleWeise – nadizan Oct 06 '16 at 22:40
  • In your example, you've hard-coded "TX" as the state. I need this in terms of a function, where I can enter the state abbrev. (i.e. TX or HI or CT) and have it return the corresponding senators – Kyle Weise Oct 06 '16 at 22:45
  • In the example above you can write whatever state you want. Change "TX" into "HI" or "CT" and it will work. If you want to be able to type the state name in the console and get the names and states, see the updated answer above. @KyleWeise – nadizan Oct 06 '16 at 23:02
  • I think you're misunderstanding my question. I need to write a **function**. I need to be able to write: senatorName(TX) and have it return the names of the senators. – Kyle Weise Oct 06 '16 at 23:18
  • Alright, this is just getting ridiculous. I suggest reading up on the basics of R; maybe have a look here (http://stackoverflow.com/questions/1744861/how-to-learn-r-as-a-programming-language). I think that you should be able to figure it out by now. Other than that, here is your function: senatorName<-function (x) tg[which(tg$state==paste0(x)),c(2:3,5)]; senatorName("TX") – nadizan Oct 06 '16 at 23:55
0

It's a good time to start developing good habits:

library(xml2)
library(purrr)
library(dplyr)

doc <- read_xml("http://www.senate.gov/general/contact_information/senators_cfm.xml")

xml_find_all(doc, ".//member") %>% 
  map_df(function(x) {
    set_names(xml_text(xml_children(x)), xml_name(xml_children(x))) %>% 
      as.list()
  }) -> senators

senator_name <- function(df, x) {
  filter(df, state==x) %>% 
    mutate(senator=sprintf("%s %s", first_name, last_name)) %>% 
    select(senator) %>% 
    flatten_chr()
}

senator_name(senators, "TX")

XML really isn't that painful and you'll unfortunately come across it quite a bit, so getting some practice in is worth the effort, IMO.

Using dplyr idioms will help you think in terms of the operations you want to perform and rely less on subsetting syntax. Such syntax is fine, but when you start doing more analysis work in R, you'll be glad you learned dplyr.

You should also get used to passing in the data and parameters to operate on/with. No copy is made until there's a modification, and this way you're not relying on global objects.

For larger data sets or repeated operations, I'd assign xml_children(x) to a temporary variable inside the anonymous function rather than computing it twice.
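
A rough sketch of that last suggestion, using a made-up inline XML string in place of the Senate feed so it runs without network access (the `contact_information` root element and the two sample members are assumptions for illustration; only the `<member>` layout matters). Note that `map_df()` needs dplyr installed.

```r
library(xml2)
library(purrr)

# Made-up inline XML standing in for the real feed.
xml_txt <- '<contact_information>
  <member><last_name>Cornyn</last_name><first_name>John</first_name><state>TX</state></member>
  <member><last_name>Tester</last_name><first_name>Jon</first_name><state>MT</state></member>
</contact_information>'

doc <- read_xml(xml_txt)

senators <- xml_find_all(doc, ".//member") %>%
  map_df(function(x) {
    kids <- xml_children(x)   # computed once, reused for both values and names
    as.list(set_names(xml_text(kids), xml_name(kids)))
  })

senators$last_name
## [1] "Cornyn" "Tester"
```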

hrbrmstr