Application Programming: Using the XML Parser Object Class
This example subclasses the IDLffXMLSAX parser object class to create an object class named xml_to_struct. The xml_to_struct object class is designed to read data from an XML file with the following structure:
<Solar_System>
   <Planet NAME='Mercury'>
      <Orbit UNITS='kilometers' TYPE='ulong64'>57910000</Orbit>
      <Period UNITS='days' TYPE='float'>87.97</Period>
      <Moons TYPE='int'>0</Moons>
   </Planet>
   ...
</Solar_System>
and place those values into an IDL array containing one structure variable for each <Planet> element. We use a structure variable for each <Planet> element so we can capture data of several data types in a single place.
Note: While this example is more complicated than the previous example, it is still rather simple. It is designed to illustrate a method whereby more complex XML data structures can be represented in IDL.
To read the XML file and return a structure variable, we will need to create an object class definition that inherits from the IDLffXMLSAX object class, and override the following superclass methods: Init, Characters, StartElement, and EndElement. Since this example does not retrieve data using any of the other IDLffXMLSAX methods, we do not need to override those methods. In addition, we will create a new method that allows us to retrieve the structure data from the object instance data.
Notice that the elements of the XML data file include attributes. While we will retrieve and use some of the attribute data from the file, we will ignore some of it.
Note: When parsing an XML data file, you can pick and choose the data you wish to pull into IDL. This ability to selectively retrieve data from the XML file is one of the great advantages of an event-based parser over a tree-based parser.
Example Code: This example is included in the file xml_to_struct__define.pro in the examples/doc/file_io subdirectory of the IDL distribution.
The following routine is the definition of the xml_to_struct object class:
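PRO xml_to_struct__define
   ; Named structure that holds the data for one <Planet> element.
   void = {PLANET, Name: "", Orbit: 0ull, Period: 0.0, Moons: 0}
   ; Class structure: inherits from the SAX parser and adds our fields.
   void = {xml_to_struct, $
      INHERITS IDLffXMLSAX, $
      charBuffer: "", $
      planetNum: 0, $
      currentPlanet: {PLANET}, $
      Planets: MAKE_ARRAY(9, VALUE = {PLANET})}
END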
The following items should be considered when defining this class structure:
Before defining the class structure, we define a named structure, PLANET, to hold the data from the <Planet> elements of the XML file.
The class structure definition uses the INHERITS keyword to inherit the object class structure and methods of the IDLffXMLSAX object.
The charBuffer structure field is set equal to a string value. We will use this field to accumulate character data stored in XML elements.
The planetNum structure field is set equal to an integer value. We will use this field to keep track of which array element we are currently populating.
The currentPlanet structure field is set equal to a PLANET structure.
The Planets structure field is set equal to a nine-element array of PLANET structures.
The routine name is created by appending the string "__define" (note the two underscore characters) to the class name.
We have explicitly defined our Planets structure field as a nine-element array of PLANET structures, which we can do because we know exactly how many <Planet> elements will be read from our XML file. Specifying the exact size of the data array in the class structure definition is very efficient (since we create the array only once) and avoids the need for a pointer that would have to be freed in the Cleanup method. However, it has the following consequences:
We must keep track of which element of the array we are populating (see the EndElement method below).
Note: Although we describe this routine here first, the xml_to_struct__define routine must be the last routine in the xml_to_struct__define.pro file.
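For comparison with the fixed-size array discussed above, the following is a minimal sketch of a pointer-based variant that can hold any number of planets. It is not part of the shipped example: the class name xml_to_struct_dyn and the field name pPlanets are hypothetical, and only the pieces that differ are shown (the Init, Characters, and StartElement methods and the handling of the <Orbit>, <Period>, and <Moons> elements are omitted).

```idl
; Hypothetical variant, for illustration only: the planets are stored
; behind a pointer so that the array can grow as elements are read.
PRO xml_to_struct_dyn::EndElement, URI, local, strName
   IF (strName EQ 'Planet') THEN BEGIN
      IF PTR_VALID(self.pPlanets) THEN $
         ; Append the finished structure to the existing array.
         *self.pPlanets = [*self.pPlanets, self.currentPlanet] $
      ELSE $
         ; First planet: allocate the heap variable.
         self.pPlanets = PTR_NEW(self.currentPlanet)
   ENDIF
END

PRO xml_to_struct_dyn::Cleanup
   ; The price of using a pointer: it must be freed explicitly.
   PTR_FREE, self.pPlanets
   self->IDLffXMLSAX::Cleanup
END

; As in the original example, the __define routine comes last.
PRO xml_to_struct_dyn__define
   void = {PLANET, Name: "", Orbit: 0ull, Period: 0.0, Moons: 0}
   void = {xml_to_struct_dyn, $
      INHERITS IDLffXMLSAX, $
      charBuffer: "", $
      currentPlanet: {PLANET}, $
      pPlanets: PTR_NEW()}
END
```

This trades the single allocation of the fixed-size array for repeated concatenation and an extra Cleanup method, which is the cost the fixed-size design avoids.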
The Init method is called when an xml_to_struct parser object is created by a call to OBJ_NEW. The following routine is the definition of the Init method:
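FUNCTION xml_to_struct::Init
   ; Start filling the Planets array at element zero.
   self.planetNum = 0
   ; Let the superclass complete the initialization.
   RETURN, self->IDLffXMLSAX::Init()
END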
We do two things in this method:
Initialize the planetNum field with the value of zero. We will increment this value as we populate the Planets array.
Note: Within a method, we can refer to the class structure variable with the implicit parameter self. Remember that self is actually a reference to the xml_to_struct object instance.
Return the value returned by the superclass's Init method, called on the self object reference.
Note: We perform our own initialization task (setting the value of the planetNum field) before calling the superclass's Init method.
See IDLffXMLSAX::Init for details on the method we are overriding.
The Characters method is called when the xml_to_struct parser encounters character data inside an element. The following routine is the definition of the Characters method:
PRO xml_to_struct::characters, data
   ; Append this piece of character data to the buffer.
   self.charBuffer = self.charBuffer + data
END
As it parses the character data in an element, the parser will read characters until it reaches the end of the text section. Here, we simply add the current characters to the charBuffer field of the object's instance data structure.
See IDLffXMLSAX::Characters for details on the method we are overriding.
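Appending to charBuffer, rather than overwriting it, matters if the text of a single element reaches the Characters method in more than one call. The sketch below imitates that situation outside the parser. The variable names and the way the text is split are assumptions made purely for illustration; they do not describe how the parser actually divides any particular file.

```idl
; Illustration only: simulate character data arriving in several pieces.
buffer = ''                      ; plays the role of self.charBuffer
chunks = ['227', '940', '000']   ; hypothetical split of one text section
FOR i = 0, N_ELEMENTS(chunks) - 1 DO buffer = buffer + chunks[i]
PRINT, buffer                    ; the reassembled text: 227940000
```

Because the buffer is only ever appended to here, it has to be reset somewhere else before the next element's text arrives.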
The StartElement method is called when the xml_to_struct parser encounters the beginning of an XML element. The following routine is the definition of the StartElement method:
PRO xml_to_struct::startElement, URI, local, strName, attrName, $
   attrValue
   CASE strName OF
      "Solar_System": ; Do nothing
      "Planet" : BEGIN
         ; Reset the working structure, then store the planet's name.
         self.currentPlanet = {PLANET, "", 0ull, 0.0, 0}
         self.currentPlanet.Name = attrValue[0]
      END
      "Orbit" : self.charBuffer = ''
      "Period" : self.charBuffer = ''
      "Moons" : self.charBuffer = ''
   ENDCASE
END
Here, we first check the name of the element we have encountered, and use a CASE statement to branch based on the element name:
If the element is a <Solar_System> element, we do nothing.
If the element is a <Planet> element, we do the following things:
Set the currentPlanet field of the self instance data structure equal to a PLANET structure, setting the values of the structure fields to empty or zero values.
Set the Name field of the PLANET structure held in the currentPlanet field equal to the value of the NAME attribute of the element. This field contains the name of the planet whose data we are reading.
If the element is an <Orbit>, <Period>, or <Moons> element, we reinitialize the value of the charBuffer field of the self instance data structure.
See IDLffXMLSAX::StartElement for details on the method we are overriding.
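The StartElement method above reads the planet name from attrValue[0], which works because NAME is the first attribute of each <Planet> element. If the position of the attribute could not be relied on, the name could be looked up instead. The following is a sketch under that assumption; the sample arrays stand in for the attrName and attrValue arguments, and the variable planetName is hypothetical.

```idl
; Illustration only: sample values standing in for the attrName and
; attrValue arguments that StartElement receives.
attrName  = ['UNITS', 'NAME']
attrValue = ['none', 'Mars']

; Find the NAME attribute wherever it appears in the list.
idx = WHERE(STRUPCASE(attrName) EQ 'NAME', count)
IF (count GT 0) THEN planetName = attrValue[idx[0]] ELSE planetName = ''
PRINT, planetName
```

With these sample arrays the lookup selects 'Mars', even though NAME is not the first attribute.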
The EndElement method is called when the xml_to_struct parser encounters the end of an XML element. The following routine is the definition of the EndElement method:
PRO xml_to_struct::EndElement, URI, Local, strName
   CASE strName OF
      "Solar_System": ; Do nothing
      "Planet": BEGIN
         ; Store the completed structure and advance the counter.
         self.Planets[self.planetNum] = self.currentPlanet
         self.planetNum = self.planetNum + 1
      END
      "Orbit" : self.currentPlanet.Orbit = self.charBuffer
      "Period" : self.currentPlanet.Period = self.charBuffer
      "Moons" : self.currentPlanet.Moons = self.charBuffer
   ENDCASE
END
As with the StartElement method, we first check the name of the element we have encountered, and use a CASE statement to branch based on the element name:
If the element is a <Solar_System> element, we do nothing.
If the element is a <Planet> element, we set the element of the Planets array specified by planetNum equal to the PLANET structure contained in currentPlanet. Then, we increment the planetNum counter.
If the element is an <Orbit>, <Period>, or <Moons> element, we place the value in the charBuffer field into the appropriate field within the PLANET structure contained in currentPlanet.
See IDLffXMLSAX::EndElement for details on the method we are overriding.
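Note that charBuffer is a string, while the Orbit, Period, and Moons fields are numeric. The assignments in the EndElement method rely on IDL converting the string to the type of the structure field. The same conversions can be written explicitly, as in the following sketch; the sample strings and variable names are assumptions chosen for illustration.

```idl
; Illustration only: explicit versions of the conversions that the
; structure field assignments perform implicitly.
orbit  = ULONG64('227940000')   ; string to 64-bit unsigned integer
period = FLOAT('686.98')        ; string to single-precision float
moons  = FIX('2')               ; string to 16-bit integer
HELP, orbit, period, moons
```

HELP reports the type of each converted variable, which should match the types declared in the PLANET structure.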
Note: In both the StartElement and EndElement methods, we rely on the validity of the XML data file. Our CASE statements only need to handle the element types described in the XML file's DTD or schema. We do not need an ELSE clause in the CASE statement. If an unknown element is found in the XML file, the parser will report a validation error.
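In IDL, a CASE statement that finds no matching branch and has no ELSE clause stops with an error. If you adapt this example to files whose elements are not all handled, a defensive ELSE clause is one option. The following sketch shows the idea outside the class; the element name 'Comet' and the variable msg are hypothetical.

```idl
; Illustration only: 'Comet' is a hypothetical, unhandled element name.
strName = 'Comet'
CASE strName OF
   'Solar_System': msg = 'root element'
   'Planet': msg = 'planet element'
   ELSE: msg = 'ignored element: ' + strName
ENDCASE
PRINT, msg
END
```

Here the ELSE branch is taken, so the unknown element is reported rather than halting execution.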
The GetArray method allows us to retrieve the array of structures stored in the Planets variable. The following routine is the definition of the GetArray method:
FUNCTION xml_to_struct::GetArray
   IF (self.planetNum EQ 0) THEN $
      RETURN, -1 $
   ELSE RETURN, self.Planets[0:self.planetNum-1]
END
Here, we check whether the planetNum counter has been incremented. If it has, we return the number of array elements specified by the counter. If it has not (indicating that no data has been stored in the array), we return the value -1.
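Because GetArray returns either an array of structures or the integer -1, calling code may want to check which one it received before using structure tags. The following is a minimal sketch of such a check; the function name has_planets is hypothetical and is not part of the example file.

```idl
; Illustration only: has_planets is a hypothetical helper that tests
; whether a GetArray result holds structure data.
FUNCTION has_planets, result
   ; SIZE(..., /TYPE) returns 8 for structure variables.
   RETURN, SIZE(result, /TYPE) EQ 8
END

; Main-level usage with stand-in values.
PRINT, has_planets(-1)                         ; the "no data" result
PRINT, has_planets({NAME: 'Mars', MOONS: 2})   ; structure data
END
```

The first call tests the "no data" value and yields false; the second tests a structure and yields true.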
To see the xml_to_struct parser in action, you can parse the file planets.xml, found in the examples/data subdirectory of the IDL distribution. The planets.xml file contains XML like the fragment shown at the beginning of this section, and includes a <Planet> element for each planet in the solar system. The planets.xml file also includes a DTD describing the structure of the file.
Enter the following statements at the IDL command line:
xmlObj = OBJ_NEW('xml_to_struct')
xmlFile = FILEPATH('planets.xml', $
SUBDIRECTORY = ['examples', 'data'])
xmlObj->ParseFile, xmlFile
planets = xmlObj->GetArray()
OBJ_DESTROY, xmlObj
The variable planets now holds an array of PLANET structures, one for each planet. To print the number of moons for each planet, you could use the following IDL statement:
FOR i = 0, (N_ELEMENTS(planets.Name) - 1) DO $
   PRINT, planets[i].Name, planets[i].Moons, $
   FORMAT = '(A7, " has ", I2, " moons")'
IDL prints:
Mercury has  0 moons
  Venus has  0 moons
  Earth has  1 moons
   Mars has  2 moons
Jupiter has 16 moons
 Saturn has 18 moons
 Uranus has 21 moons
Neptune has  8 moons
  Pluto has  1 moons
To view all the information about the planet Mars, you could use the following IDL statement:
HELP, planets[3], /STRUCTURE
IDL prints:
** Structure PLANET, 4 tags, length=32, data length=26:
   NAME            STRING    'Mars'
   ORBIT           ULONG64                 227940000
   PERIOD          FLOAT           686.980
   MOONS           INT              2
IDL Online Help (March 06, 2007)