
I am trying to parse a POM file and have run into an issue I can't solve. My current code successfully reads and parses the POM file and outputs its contents. The problem arises when the dependencies do not list artifactId, groupId, and version in the same order.

What condition should I place in my for loop so that it ignores other tags (such as type) and retrieves only artifactId, groupId, and version?

Code:

for dep in depend:
    infoList = []
    counter += 1
    for child in dep:  # iterate children directly; getchildren() is deprecated (removed from xml.etree in Python 3.9)
        infoList.append(child.tag.split('}')[1])
        infoList.append(child.text)

    #list where data is being stored
    dependencyInfo[infoList[1]].update({infoList[2] : infoList[3],infoList[4] : infoList[5]})

POM file example:

<dependency>
    <artifactId>slf4j-api</artifactId>
    <groupId>org.slf4j</groupId>
    <type>jar</type>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>log4j-over-slf4j</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>sample.ProjectA</groupId>
    <artifactId>Project-A</artifactId>
    <scope>compile</scope>
    <version>1.0</version>
    <optional>true</optional>
</dependency>

Actual Output:

defaultdict(<class 'dict'>,{'slf4j-api': {'groupId': 'org.slf4j', 'type': 'jar'}, 'org.slf4j': {'artifactId': 'log4j-over-slf4j', 'version': '1.6.1'}, 'sample.ProjectA': {'artifactId': 'Project-A', 'scope': 'compile'}})

Expected Output:

defaultdict(<class 'dict'>,{'slf4j-api': {'groupId': 'org.slf4j', 'version': '1.6.1'}, 'org.slf4j': {'artifactId': 'log4j-over-slf4j', 'version': '1.6.1'}, 'sample.ProjectA': {'artifactId': 'Project-A', 'version': '1.0'}})

Any help would be appreciated.

Kavitha Karunakaran
BigO

1 Answer

Since your example file is XML, I recommend using an XML parser instead of writing one yourself.

There is a bit of a learning curve to extracting exactly the data you want, but it is worth learning: a parser scales to more advanced and complex documents and avoids the logic errors that hand-written parsing is prone to.
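As a rough illustration of that approach, here is a minimal sketch using the standard library's xml.etree.ElementTree. It makes some assumptions that are not in the question: the POM declares the usual http://maven.apache.org/POM/4.0.0 default namespace, the names POM_SAMPLE, WANTED, local_name, and parse_dependencies are made up for this example, and each entry is keyed by its artifactId (the expected output in the question instead keys by whichever tag comes first, so adjust as needed). The point is that children are selected by tag name, so their order and any extra tags such as type or scope no longer matter.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: the POM uses the standard Maven default namespace.
POM_NS = "http://maven.apache.org/POM/4.0.0"
# Only these child tags are kept; everything else (type, scope, ...) is ignored.
WANTED = ("groupId", "artifactId", "version")

# Hypothetical sample, trimmed from the dependencies shown in the question.
POM_SAMPLE = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <artifactId>slf4j-api</artifactId>
      <groupId>org.slf4j</groupId>
      <type>jar</type>
      <version>1.6.1</version>
    </dependency>
    <dependency>
      <groupId>sample.ProjectA</groupId>
      <artifactId>Project-A</artifactId>
      <scope>compile</scope>
      <version>1.0</version>
      <optional>true</optional>
    </dependency>
  </dependencies>
</project>"""


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_dependencies(pom_text):
    """Map each artifactId to its groupId and version, ignoring other tags."""
    root = ET.fromstring(pom_text)
    dependency_info = defaultdict(dict)
    for dep in root.iter(f"{{{POM_NS}}}dependency"):
        # Select children by name, so their order in the file is irrelevant.
        fields = {
            local_name(child.tag): (child.text or "").strip()
            for child in dep
            if local_name(child.tag) in WANTED
        }
        artifact_id = fields.pop("artifactId", None)
        if artifact_id is None:
            continue  # skip entries that have no artifactId
        dependency_info[artifact_id].update(fields)
    return dependency_info


if __name__ == "__main__":
    print(dict(parse_dependencies(POM_SAMPLE)))
```

The condition asked about in the question is the `if local_name(child.tag) in WANTED` filter; the same check could be dropped into the original inner loop if you prefer to keep the rest of your code as it is.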

WyattBlue