Sunday, April 28, 2013

How to parse XML in Python using etree

As in the previous example, How to parse xml in python using minidom, we will parse XML, this time with etree from the lxml library. We will perform the same four operations:

- Read the content of an xml tag
- Read attribute value
- Add a node
- Delete a node

We will use the same XML file, named 'file.xml':

<?xml version="1.0" encoding="UTF-8" ?>
<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
  </food>
</menu>

Read the content of an XML tag

from lxml import etree

# Parse the file and collect every <name> element that sits under a <food>
doc = etree.parse('file.xml')
nodes = doc.findall('food/name')
for node in nodes:
    print(node.text)

Output:
Pesto Chicken Sandwich
Chipotle Chicken Pizza
Burrito
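
When each name needs to stay paired with its price, it can be easier to loop over the food elements and look up the children of each one. The following is only a sketch under a few assumptions: it parses an inline copy of the menu (the MENU string is there so the snippet runs without file.xml), and it falls back to the standard library's xml.etree.ElementTree if lxml is not installed, since findall and findtext exist in both.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library when lxml is unavailable.
    import xml.etree.ElementTree as etree

# Inline copy of the menu so the sketch does not depend on file.xml.
MENU = """<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
  </food>
</menu>"""

root = etree.fromstring(MENU)

# findtext returns the text of the first matching child,
# or the given default when no such child exists.
items = [(food.findtext('name'), food.findtext('price', default='n/a'))
         for food in root.findall('food')]

for name, price in items:
    print('%s: %s' % (name, price))
```

Looping over the parent element keeps the two values together even if some food entry were missing its price.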

Read attribute value

from lxml import etree

# Each <food> element exposes its attributes through the dict-like attrib
doc = etree.parse('file.xml')
nodes = doc.findall('food')
for node in nodes:
    print(node.attrib['id'])

Output:
1
2
3
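
Note that node.attrib['id'] raises a KeyError when the attribute is absent. As a hedged sketch, node.get returns a default instead, and the path syntax accepted by find and findall can also select an element by attribute value. The inline MENU string and the 'category' attribute are assumptions made for illustration. The import falls back to the standard library's xml.etree.ElementTree when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library when lxml is unavailable.
    import xml.etree.ElementTree as etree

# Inline copy of the menu so the sketch does not depend on file.xml.
MENU = """<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
  </food>
</menu>"""

root = etree.fromstring(MENU)

# get() returns None (or the supplied default) instead of raising KeyError.
ids = [food.get('id') for food in root.findall('food')]
missing = root.find('food').get('category', 'unknown')  # hypothetical attribute

# A predicate in the path selects the <food> whose id attribute equals "2".
second_name = root.find("food[@id='2']/name").text

print(ids)
print(missing)
print(second_name)
```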

Add a node

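from lxml import etree
from lxml.etree import SubElement

doc = etree.parse('file.xml')
nodes = doc.findall('food')
for node in nodes:
    # Append <rating value="5">Average</rating> as the last child of each <food>
    rating = SubElement(node, 'rating', value='5')
    rating.text = 'Average'

# etree.tostring returns bytes, so open the output file in binary mode
with open('newfile.xml', 'wb') as ofile:
    ofile.write(etree.tostring(doc))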

This writes the following content to newfile.xml (shown re-indented for readability; by default lxml adds neither indentation nor an XML declaration):
<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
    <rating value="5">Average</rating>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
    <rating value="5">Average</rating>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
    <rating value="5">Average</rating>
  </food>
</menu>
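
SubElement does not add any indentation of its own, so in the serialized text the new rating follows directly after the whitespace that trails price. One way to keep the indentation consistent is to adjust the tail text by hand. This is only a sketch: it assumes every food already has at least one child, it uses an abbreviated inline menu instead of file.xml, and it falls back to the standard library's xml.etree.ElementTree when lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library when lxml is unavailable.
    import xml.etree.ElementTree as etree

# Abbreviated inline menu so the sketch does not depend on file.xml.
MENU = """<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
  </food>
</menu>"""

root = etree.fromstring(MENU)

for food in root.findall('food'):
    last = food[-1]  # current last child (assumes the element is not empty)
    rating = etree.SubElement(food, 'rating', value='5')
    rating.text = 'Average'
    # The whitespace that preceded </food> now belongs after the new element,
    # and the previous last child gets the indentation used between children.
    rating.tail = last.tail
    last.tail = food.text

xml_text = etree.tostring(root, encoding='unicode')
print(xml_text)
```

The tail of an element is the text between its closing tag and the next tag, which is why moving it around is enough to control where the line breaks fall.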

Delete a node
Let us delete all <rating> elements from newfile.xml.

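from lxml import etree

doc = etree.parse('newfile.xml')
nodes = doc.findall('food/rating')
for node in nodes:
    # An element is removed through its parent, which lxml exposes via getparent()
    parent = node.getparent()
    parent.remove(node)

# Write the modified tree back, overwriting file.xml
doc.write('file.xml')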

This restores the menu we started with (by default lxml writes it without the XML declaration).
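
getparent() is specific to lxml; elements from the standard library's xml.etree.ElementTree do not keep a reference to their parent. A sketch that works with either library walks each food element and removes its rating children directly. The abbreviated inline menu is an assumption made so the snippet runs without newfile.xml.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library when lxml is unavailable.
    import xml.etree.ElementTree as etree

# Abbreviated inline menu with ratings, standing in for newfile.xml.
MENU = """<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
    <rating value="5">Average</rating>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
    <rating value="5">Average</rating>
  </food>
</menu>"""

root = etree.fromstring(MENU)

for food in root.findall('food'):
    # findall returns a list, so removing children inside the loop is safe.
    for rating in food.findall('rating'):
        food.remove(rating)

remaining = [child.tag for child in root.find('food')]
print(remaining)
```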
