how to get value of a specified node in XML tree using R language

Question

i have a xml file that includes rootNode and child node with attributes that handle values.

i am using the R language to work on the xml file.

what i need is to display the result of the employees that are in the department IT

how to display the ID or the name of the employees that are in the IT department?

i used this code:

print(getNodeSet(rootnode,"//EMPLOYEE/DEPT[@DEPT='IT']"))

where rootnode is the variable that handle the value : RECORDS

IT DID NOT WORK

xml file :

<RECORDS>
   <EMPLOYEE>
      <ID>1</ID>
      <NAME>Rick</NAME>
      <SALARY>623.3</SALARY>
      <STARTDATE>1/1/2012</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>2</ID>
      <NAME>Dan</NAME>
      <SALARY>515.2</SALARY>
      <STARTDATE>9/23/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>3</ID>
      <NAME>Michelle</NAME>
      <SALARY>611</SALARY>
      <STARTDATE>11/15/2014</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>4</ID>
      <NAME>Ryan</NAME>
      <SALARY>729</SALARY>
      <STARTDATE>5/11/2014</STARTDATE>
      <DEPT>HR</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>5</ID>
      <NAME>Gary</NAME>
      <SALARY>843.25</SALARY>
      <STARTDATE>3/27/2015</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>6</ID>
      <NAME>Nina</NAME>
      <SALARY>578</SALARY>
      <STARTDATE>5/21/2013</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>7</ID>
      <NAME>Simon</NAME>
      <SALARY>632.8</SALARY>
      <STARTDATE>7/30/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>8</ID>
      <NAME>Guru</NAME>
      <SALARY>722.5</SALARY>
      <STARTDATE>6/17/2014</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>

</RECORDS>

Prem · Accepted Answer · 2018-02-12T11:44:06.177

1

Seems you need to modify getNodeSet as below.

getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']/NAME")

In case you want to have more than one column in the Output:

library(XML)
library(dplyr)

#sample data
xml_data <- xmlParse("<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY><DEPT>IT</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY><DEPT>Operations</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><SALARY>611</SALARY><DEPT>IT</DEPT></EMPLOYEE>
  </RECORDS>")


df <- xmlToDataFrame(nodes=getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']")) %>%
  select(NAME, SALARY)
df

Output is:

      NAME SALARY
1     Rick  623.3
2 Michelle    611

(Edit - modified code to have more than one column in the output)

edited Feb 12 '18 at 11:44

answered Feb 12 '18 at 11:14

Prem

11,775
1
19
33

@perm if i want to display more than one attributes let say NAME and SALARY this will work ??? getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']/NAME/SALARY") – Ghgh Lhlh Feb 12 '18 at 11:23
what mean the **%>%** – Ghgh Lhlh Feb 12 '18 at 12:12
This is `dplyr` pipe operator. I used it to filter desired columns only. Otherwise you will have `ID` & `DEPT` columns as well (with the sample input which I used in my code) – Prem Feb 12 '18 at 12:15

how to get value of a specified node in XML tree using R language

xml file :

1 Answers1