2

i have a xml file that includes rootNode and child node with attributes that handle values.

i am using the R language to work on the xml file.

what i need is to display the result of the employees that are in the department IT

how to display the ID or the name of the employees that are in the IT department?

i used this code:

print(getNodeSet(rootnode,"//EMPLOYEE/DEPT[@DEPT='IT']"))

where rootnode is the variable that handle the value : RECORDS

IT DID NOT WORK

xml file :

<RECORDS>
   <EMPLOYEE>
      <ID>1</ID>
      <NAME>Rick</NAME>
      <SALARY>623.3</SALARY>
      <STARTDATE>1/1/2012</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>2</ID>
      <NAME>Dan</NAME>
      <SALARY>515.2</SALARY>
      <STARTDATE>9/23/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>3</ID>
      <NAME>Michelle</NAME>
      <SALARY>611</SALARY>
      <STARTDATE>11/15/2014</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>4</ID>
      <NAME>Ryan</NAME>
      <SALARY>729</SALARY>
      <STARTDATE>5/11/2014</STARTDATE>
      <DEPT>HR</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>5</ID>
      <NAME>Gary</NAME>
      <SALARY>843.25</SALARY>
      <STARTDATE>3/27/2015</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>6</ID>
      <NAME>Nina</NAME>
      <SALARY>578</SALARY>
      <STARTDATE>5/21/2013</STARTDATE>
      <DEPT>IT</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>7</ID>
      <NAME>Simon</NAME>
      <SALARY>632.8</SALARY>
      <STARTDATE>7/30/2013</STARTDATE>
      <DEPT>Operations</DEPT>
   </EMPLOYEE>

   <EMPLOYEE>
      <ID>8</ID>
      <NAME>Guru</NAME>
      <SALARY>722.5</SALARY>
      <STARTDATE>6/17/2014</STARTDATE>
      <DEPT>Finance</DEPT>
   </EMPLOYEE>

</RECORDS>
Ghgh Lhlh
  • 155
  • 1
  • 3
  • 14

1 Answers1

1

Seems you need to modify getNodeSet as below.

getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']/NAME")


In case you want to have more than one column in the Output:

library(XML)
library(dplyr)

#sample data
xml_data <- xmlParse("<RECORDS>
  <EMPLOYEE><ID>1</ID><NAME>Rick</NAME><SALARY>623.3</SALARY><DEPT>IT</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>2</ID><NAME>Dan</NAME><SALARY>515.2</SALARY><DEPT>Operations</DEPT></EMPLOYEE>
  <EMPLOYEE><ID>3</ID><NAME>Michelle</NAME><SALARY>611</SALARY><DEPT>IT</DEPT></EMPLOYEE>
  </RECORDS>")


df <- xmlToDataFrame(nodes=getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']")) %>%
  select(NAME, SALARY)
df

Output is:

      NAME SALARY
1     Rick  623.3
2 Michelle    611

(Edit - modified code to have more than one column in the output)

Prem
  • 11,775
  • 1
  • 19
  • 33
  • @perm if i want to display more than one attributes let say NAME and SALARY this will work ??? getNodeSet(xml_data, "//EMPLOYEE[DEPT='IT']/NAME/SALARY") – Ghgh Lhlh Feb 12 '18 at 11:23
  • what mean the **%>%** – Ghgh Lhlh Feb 12 '18 at 12:12
  • This is `dplyr` pipe operator. I used it to filter desired columns only. Otherwise you will have `ID` & `DEPT` columns as well (with the sample input which I used in my code) – Prem Feb 12 '18 at 12:15