Friday, April 26, 2013

How to parse XML in Python using minidom

The most common things I find myself doing when working with XML files are the following:

- Read the content of an XML tag
- Read attribute value
- Add a node
- Delete a node

Let's see how to do this in Python using minidom. For the purposes of this post, assume the file is named "file.xml" and has the following content.

<?xml version="1.0" encoding="UTF-8" ?>
<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
  </food>
</menu>

Read the content of an XML tag

from xml.dom import minidom

doc = minidom.parse('file.xml')
nodes = doc.getElementsByTagName('name')
for node in nodes:
    print(node.firstChild.nodeValue)

Output:
Pesto Chicken Sandwich
Chipotle Chicken Pizza
Burrito
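
Note that firstChild is None for an empty tag such as <name></name>, so the loop above would raise an error on it. Here is a minimal sketch of a more defensive variant. The get_text helper and the inline MENU_XML string (with a deliberately empty name) are assumptions made for illustration and are not part of the sample file.

```python
from xml.dom import minidom

# Illustrative data: the third <name> is intentionally empty.
MENU_XML = """<menu>
  <food id="1"><name>Pesto Chicken Sandwich</name><price>$7.50</price></food>
  <food id="2"><name>Chipotle Chicken Pizza</name><price>$12.00</price></food>
  <food id="3"><name></name><price>$6.20</price></food>
</menu>"""


def get_text(element):
    """Hypothetical helper: join the data of the element's direct text children."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    )


doc = minidom.parseString(MENU_XML)
names = [get_text(node) for node in doc.getElementsByTagName('name')]
print(names)
```

An empty tag simply yields an empty string here instead of failing.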

Read attribute value

from xml.dom import minidom

doc = minidom.parse('file.xml')
nodes = doc.getElementsByTagName('food')
for node in nodes:
    if node.hasAttribute('id'):
        print(node.getAttribute('id'))

Output:
1
2
3
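
Attribute values are also handy for picking out one particular element. The sketch below assumes a hypothetical find_food_by_id helper and an inline copy of the menu, so it runs without file.xml; it is one possible approach rather than the only one.

```python
from xml.dom import minidom

# Inline copy of the sample menu so the example is self-contained.
MENU_XML = """<menu>
  <food id="1"><name>Pesto Chicken Sandwich</name><price>$7.50</price></food>
  <food id="2"><name>Chipotle Chicken Pizza</name><price>$12.00</price></food>
  <food id="3"><name>Burrito</name><price>$6.20</price></food>
</menu>"""


def find_food_by_id(doc, food_id):
    """Hypothetical helper: return the first <food> whose id matches, else None."""
    for food in doc.getElementsByTagName('food'):
        # getAttribute returns an empty string when the attribute is missing.
        if food.getAttribute('id') == food_id:
            return food
    return None


doc = minidom.parseString(MENU_XML)
food = find_food_by_id(doc, '2')
name = food.getElementsByTagName('name')[0].firstChild.nodeValue
print(name)

missing = find_food_by_id(doc, '99')
print(missing)
```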

Add a node
Let's add a <rating> node to each food item to record its popularity, with a default value attribute of 5 and the text "Average".

from xml.dom import minidom

doc = minidom.parse('file.xml')
nodes = doc.getElementsByTagName('food')
for node in nodes:
    rating = doc.createElement('rating')
    rating.setAttribute('value','5')
    text = doc.createTextNode('Average')
    rating.appendChild(text)
    node.appendChild(rating)

with open('newfile.xml', 'w') as ofile:
    doc.writexml(ofile)

Output: The resulting XML looks like this (indentation tidied for readability; writexml with its default arguments does not re-indent the document):
<?xml version="1.0" ?>
<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
    <rating value="5">Average</rating>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
    <rating value="5">Average</rating>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
    <rating value="5">Average</rating>
  </food>
</menu>
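
To check the result without opening the output file, the modified tree can be serialized in memory with toxml(). This is a small sketch on a one-item menu; the compact inline XML string is an assumption chosen so the serialized output is easy to read.

```python
from xml.dom import minidom

# A compact one-item menu (illustrative, no whitespace between tags).
doc = minidom.parseString('<menu><food id="1"><name>Burrito</name></food></menu>')

food = doc.getElementsByTagName('food')[0]
rating = doc.createElement('rating')
rating.setAttribute('value', '5')
rating.appendChild(doc.createTextNode('Average'))
food.appendChild(rating)  # appended as the last child of <food>

# Serialize only the root element, skipping the XML declaration.
xml_text = doc.documentElement.toxml()
print(xml_text)
```

The new element always lands at the end of its parent because appendChild adds the node as the last child.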

Delete a node
Let's now delete the <rating> tag from each food item.

from xml.dom import minidom

# Parse the file written in the previous step; file.xml has no <rating> nodes.
doc = minidom.parse('newfile.xml')
nodes = doc.getElementsByTagName('rating')
for node in nodes:
    parent = node.parentNode
    parent.removeChild(node)

with open('newfile.xml', 'w') as ofile:
    doc.writexml(ofile)

The result is an XML file similar to the one we started with.
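
The same removeChild pattern can target a single element instead of every match. The sketch below removes only the food item whose id is 3; the inline two-item menu is an assumption made to keep the example self-contained.

```python
from xml.dom import minidom

# Illustrative two-item menu.
doc = minidom.parseString(
    '<menu>'
    '<food id="1"><name>Pesto Chicken Sandwich</name></food>'
    '<food id="3"><name>Burrito</name></food>'
    '</menu>'
)

# getElementsByTagName returns its own list, so removing nodes from the
# tree while looping over that list is safe.
for food in doc.getElementsByTagName('food'):
    if food.getAttribute('id') == '3':
        food.parentNode.removeChild(food)

remaining = [f.getAttribute('id') for f in doc.getElementsByTagName('food')]
print(remaining)
```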

15 comments:

  1. Just a straightforward xml parsing with minidom.
    Great work!

  2. How do I get the following output?
    Pesto Chicken Sandwich = $7.50
    Chipotle Chicken Pizza = $12.00
    Burrito = $6.20

    Replies
    1. from xml.dom import minidom

      doc = minidom.parse('file.xml')
      nodes = doc.getElementsByTagName('food')
      for node in nodes:
          price = node.getElementsByTagName('price')[0]
          name = node.getElementsByTagName('name')[0]
          print("%s = %s" % (name.firstChild.nodeValue, price.firstChild.nodeValue))

    2. Another implementation, though not a very good idea :D

      from xml.dom import minidom

      doc = minidom.parse('file.xml')
      nodes = doc.getElementsByTagName('name')
      prices = doc.getElementsByTagName('price')
      for node in nodes:
          # Pair each name with the first remaining price, then drop that price.
          for price in prices:
              print("%s = %s" % (node.firstChild.nodeValue, price.firstChild.nodeValue))
              prices.remove(price)
              break

  3. Simple and well-structured presentation.

  4. How can I delete?



    ...

  5. How can I delete the following node?

    <food id="3">

  6. Very simple, elegant and nice flow.

  7. Hi, I am very new to Python and would like to know where to place file.xml in Wing IDE. Don't you need to give a path? I am getting an error at doc = minidom.parse('file.xml'):
    it does not find file.xml. Thanks for your help in advance!

    Replies
    1. It worked now!

    2. Hi,
      I am also new to Python and am getting the same error as above: minidom is not reading the file.
      doc = minidom.parse('lens.xml')
      Please help me!

  8. Thanks! This was a very helpful post!

  9. How can I update the price of "Pesto Chicken Sandwich" from 7.50 to 10?

  10. How do I remove the whitespace characters that are read from an XML document?

  11. Thanks, deleting a node was really difficult to find on Google.
