Sunday, April 28, 2013

How to parse xml in python using etree

As in the previous post, How to parse xml in python using minidom, let us parse XML with etree this time (the lxml implementation). We will perform the same four operations:

- Read the content of an xml tag
- Read attribute value
- Add a node
- Delete a node

We will also use the same XML file, named 'file.xml':

<?xml version="1.0" encoding="UTF-8" ?>
<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
  </food>
</menu>

Read the content of xml tag

from lxml import etree

doc = etree.parse('file.xml')
# The path is relative to the root <menu> element
nodes = doc.findall('food/name')
for node in nodes:
    print(node.text)

Output:
Pesto Chicken Sandwich
Chipotle Chicken Pizza
Burrito
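If lxml is not installed, the standard library's xml.etree.ElementTree offers the same findall() path syntax. The sketch below swaps in that module and inlines a trimmed copy of file.xml as a string (both are assumptions made so the example runs on its own). It also shows iter(), which finds a tag at any depth, and findtext(), which returns a default instead of failing when nothing matches.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of file.xml, inlined so the sketch runs without the file
XML = """<menu>
  <food id="1"><name>Pesto Chicken Sandwich</name><price>$7.50</price></food>
  <food id="2"><name>Chipotle Chicken Pizza</name><price>$12.00</price></food>
  <food id="3"><name>Burrito</name><price>$6.20</price></food>
</menu>"""

root = ET.fromstring(XML)

# Path relative to <menu>, just like doc.findall('food/name') with lxml
names = [node.text for node in root.findall('food/name')]

# iter() walks the whole tree, so it finds <name> at any depth
names_anywhere = [node.text for node in root.iter('name')]

# findtext() returns the text of the first match, or the default if none match
first_name = root.findtext('food/name', default='')

print(names)
```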

Read attribute value

from lxml import etree

doc = etree.parse('file.xml')
nodes = doc.findall('food')
for node in nodes:
    # attrib behaves like a dict of the element's attributes
    print(node.attrib['id'])

Output:
1
2
3
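node.attrib['id'] raises KeyError when an element has no id. A slightly more defensive sketch, again using the standard library's xml.etree.ElementTree in place of lxml, reads attributes with get(), which accepts a default, and uses a [@id] path predicate to select only elements that carry the attribute. The third <food> without an id is a hypothetical item added to show the difference.

```python
import xml.etree.ElementTree as ET

# The id-less "Daily special" item is hypothetical, added for illustration
XML = """<menu>
  <food id="1"><name>Pesto Chicken Sandwich</name></food>
  <food id="2"><name>Chipotle Chicken Pizza</name></food>
  <food><name>Daily special</name></food>
</menu>"""

root = ET.fromstring(XML)

ids = []
for node in root.findall('food'):
    # get() returns the default instead of raising KeyError
    ids.append(node.get('id', 'missing'))

# The [@id] predicate keeps only <food> elements that have an id attribute
with_id = [node.get('id') for node in root.findall('food[@id]')]

print(ids)
print(with_id)
```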

Add a node

from lxml import etree
from lxml.etree import SubElement

# remove_blank_text drops the original indentation so pretty_print can re-indent cleanly
parser = etree.XMLParser(remove_blank_text=True)
doc = etree.parse('file.xml', parser)
nodes = doc.findall('food')
for node in nodes:
    # Append <rating value="5">Average</rating> as the last child of each <food>
    rating = SubElement(node, 'rating', value='5')
    rating.text = 'Average'

# write() encodes the output itself, so there is no need to open the file manually
doc.write('newfile.xml', pretty_print=True, xml_declaration=True, encoding='UTF-8')

This produces the following newfile.xml:
<?xml version='1.0' encoding='UTF-8'?>
<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
    <rating value="5">Average</rating>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
    <rating value="5">Average</rating>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
    <rating value="5">Average</rating>
  </food>
</menu>
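For comparison, here is a minimal sketch of the same step with the standard library's xml.etree.ElementTree (swapped in for lxml; the inlined XML is a trimmed copy of file.xml). Its SubElement takes the attributes as a dict, and tostring(..., encoding='unicode') returns a str rather than bytes, which avoids the bytes-versus-text pitfall when writing to a file opened in text mode.

```python
import xml.etree.ElementTree as ET

XML = """<menu>
  <food id="1"><name>Pesto Chicken Sandwich</name><price>$7.50</price></food>
  <food id="2"><name>Chipotle Chicken Pizza</name><price>$12.00</price></food>
</menu>"""

root = ET.fromstring(XML)
for food in root.findall('food'):
    # SubElement(parent, tag, attrib) appends a new last child to parent
    rating = ET.SubElement(food, 'rating', {'value': '5'})
    rating.text = 'Average'

# encoding='unicode' makes tostring() return str instead of bytes
result = ET.tostring(root, encoding='unicode')
print(result)
```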

Delete a node
Let us delete all <rating> tags from the newfile.xml file.

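from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)
doc = etree.parse('newfile.xml', parser)
nodes = doc.findall('food/rating')
for node in nodes:
    # lxml elements know their parent, so each node can be detached directly
    parent = node.getparent()
    parent.remove(node)

doc.write('file.xml', pretty_print=True, xml_declaration=True, encoding='UTF-8')

This overwrites file.xml with the same menu we started with (only the XML declaration is written slightly differently).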
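getparent() is an lxml extension; the standard library's xml.etree.ElementTree elements do not keep a link to their parent. A common workaround, sketched here under that swap and with a small inlined menu, is to loop over the parents and remove matching children from each one.

```python
import xml.etree.ElementTree as ET

XML = """<menu>
  <food id="1"><name>Pesto Chicken Sandwich</name><rating value="5">Average</rating></food>
  <food id="3"><name>Burrito</name><rating value="5">Average</rating></food>
</menu>"""

root = ET.fromstring(XML)

# No getparent() here, so iterate over the parents and remove their children
for food in root.findall('food'):
    # findall() returns a list, so removing inside the loop is safe
    for rating in food.findall('rating'):
        food.remove(rating)

print(ET.tostring(root, encoding='unicode'))
```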

Change byte ordering in bash shell - Convert from BGR to RGB format

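Say a color code is stored in BGR order as BBGGRR. To convert it to RGB order in the bash shell, take each two-character byte and put them back together in reverse order:

$ color="BBGGRR"
$ echo ${color:4:2}${color:2:2}${color:0:2}
RRGGBB

The syntax is
${variable:offset:length}
where offset is zero-based.

The same trick can reverse the byte order of any hex string, for example to convert between big-endian and little-endian representations.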
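The same byte swap is easy to express in Python as well. This is a small sketch, not part of the original bash tip: bgr_to_rgb mirrors the ${color:offset:length} slices, and swap_byte_order (a hypothetical helper name) reverses every byte of an even-length hex string, which is the big-endian/little-endian conversion mentioned above.

```python
def bgr_to_rgb(color):
    """Mirror ${color:4:2}${color:2:2}${color:0:2} for a 6-digit BGR color."""
    return color[4:6] + color[2:4] + color[0:2]


def swap_byte_order(hex_string):
    """Reverse the byte order of an even-length hex string (big- <-> little-endian)."""
    # bytes.fromhex() parses two hex digits per byte; .hex() returns lowercase digits
    return bytes.fromhex(hex_string)[::-1].hex()


print(bgr_to_rgb("BBGGRR"))
print(swap_byte_order("12345678"))
```

Note that swap_byte_order only accepts real hex digits, so it cannot be called on the placeholder string "BBGGRR".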

Friday, April 26, 2013

How to parse xml in python using minidom

The things I find myself doing most often when working with XML files are the following:

- Read the content of an xml tag
- Read attribute value
- Add a node
- Delete a node

Let's see how to do this in Python using minidom. For the purpose of this post, let's assume the file is named "file.xml" and has the following content:

<?xml version="1.0" encoding="UTF-8" ?>
<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
  </food>
</menu>

Read the content of xml tag

from xml.dom import minidom

doc = minidom.parse('file.xml')
nodes = doc.getElementsByTagName('name')
for node in nodes:
    print(node.firstChild.nodeValue)

Output:
Pesto Chicken Sandwich
Chipotle Chicken Pizza
Burrito
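node.firstChild.nodeValue assumes the element has a text child; for an empty tag such as <name></name>, firstChild is None and the loop fails with AttributeError. A more forgiving sketch joins all text and CDATA children instead. get_text is a hypothetical helper, and the inlined XML with an empty <name> is made up to show the edge case.

```python
from xml.dom import minidom

# The empty <name> in the second item is hypothetical, added to show the edge case
XML = """<menu>
  <food id="1"><name>Pesto Chicken Sandwich</name></food>
  <food id="2"><name></name></food>
</menu>"""


def get_text(element):
    """Concatenate the text and CDATA children of an element ('' if there are none)."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


doc = minidom.parseString(XML)
names = [get_text(node) for node in doc.getElementsByTagName('name')]
print(names)
```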

Read attribute value

from xml.dom import minidom

doc = minidom.parse('file.xml')
nodes = doc.getElementsByTagName('food')
for node in nodes:
    # hasAttribute() works on both Python 2 and 3, unlike attributes.has_key()
    if node.hasAttribute('id'):
        print(node.getAttribute('id'))

Output:
1
2
3
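Two related minidom calls are handy here: getAttribute() returns an empty string for a missing attribute rather than raising, and attributes.items() lists every (name, value) pair on an element. The sketch below uses a small inlined document; the spicy attribute and the id-less <food> are hypothetical, added only to show the behavior.

```python
from xml.dom import minidom

# "spicy" and the id-less <food/> are hypothetical, for illustration only
XML = '<menu><food id="1" spicy="no"/><food id="2"/><food/></menu>'

doc = minidom.parseString(XML)
foods = doc.getElementsByTagName('food')

# getAttribute() returns '' when the attribute is absent
ids = [node.getAttribute('id') for node in foods]

# attributes.items() returns the element's (name, value) pairs
all_attrs = [dict(node.attributes.items()) for node in foods]

print(ids)
print(all_attrs)
```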

Add a node
Let's add a <rating> node with a default value of 5 to each food item to record its popularity.

from xml.dom import minidom

doc = minidom.parse('file.xml')
nodes = doc.getElementsByTagName('food')
for node in nodes:
    rating = doc.createElement('rating')
    rating.setAttribute('value','5')
    text = doc.createTextNode('Average')
    rating.appendChild(text)
    node.appendChild(rating)

# The with block closes the file even if writexml() raises
with open('newfile.xml', 'w') as ofile:
    doc.writexml(ofile)

The resulting newfile.xml looks like this (whitespace tidied for readability, since minidom does not re-indent the new nodes):
<?xml version="1.0" ?>
<menu>
  <food id="1">
    <name>Pesto Chicken Sandwich</name>
    <price>$7.50</price>
    <rating value="5">Average</rating>
  </food>
  <food id="2">
    <name>Chipotle Chicken Pizza</name>
    <price>$12.00</price>
    <rating value="5">Average</rating>
  </food>
  <food id="3">
    <name>Burrito</name>
    <price>$6.20</price>
    <rating value="5">Average</rating>
  </food>
</menu>
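The node-building steps above can be wrapped in a small reusable function. This is a sketch, not the post's own code: add_rating and its parameters are hypothetical names, and a compact inlined document is used so the serialized result is easy to check with toxml().

```python
from xml.dom import minidom


def add_rating(doc, value='5', label='Average'):
    """Append <rating value=...>label</rating> to every <food> element (hypothetical helper)."""
    for food in doc.getElementsByTagName('food'):
        rating = doc.createElement('rating')
        rating.setAttribute('value', value)
        rating.appendChild(doc.createTextNode(label))
        food.appendChild(rating)


doc = minidom.parseString('<menu><food id="1"><name>Burrito</name></food></menu>')
add_rating(doc)

# toxml() on the root element serializes it without the XML declaration
print(doc.documentElement.toxml())
```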

Delete a node
Now let's delete the <rating> tags we just added.

from xml.dom import minidom

# Read the file that contains the <rating> nodes added in the previous step
doc = minidom.parse('newfile.xml')
nodes = doc.getElementsByTagName('rating')
for node in nodes:
    parent = node.parentNode
    parent.removeChild(node)

with open('newfile.xml', 'w') as ofile:
    doc.writexml(ofile)

The result is an XML file with the same content as the one we started with.
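Two details make the delete loop above safe and tidy. getElementsByTagName() in minidom returns a plain list built up front, so removing nodes while iterating does not skip any. The minidom documentation also suggests calling unlink() on nodes you are done with so their internal references are released. A minimal sketch on a small inlined document:

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<menu><food id="1"><name>Burrito</name>'
    '<rating value="5">Average</rating></food></menu>'
)

# getElementsByTagName() returns a list snapshot, so removing inside the loop is safe
for node in doc.getElementsByTagName('rating'):
    removed = node.parentNode.removeChild(node)
    # unlink() drops internal references so the detached subtree can be freed
    removed.unlink()

print(doc.documentElement.toxml())
```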