Application Programming: Using the XML Parser Object Class
This example subclasses the IDLffXMLSAX parser object class to create an object class named xml_to_array. The xml_to_array object class is designed to read numerical values from an XML file with the following structure:
<array>
   <number>0</number>
   <number>1</number>
   ...
</array>
and place those values into an IDL array variable.
Note: This is a deliberately simple example, designed to illustrate how an event-based XML parser is constructed using the IDLffXMLSAX object class. An application that reads real data from an XML file will most likely be considerably more complex.
To read the XML file and return an array variable, we create an object class definition that inherits from the IDLffXMLSAX object class and overrides the following superclass methods: Init, Cleanup, StartDocument, Characters, StartElement, and EndElement. Because this example does not retrieve data using any of the other IDLffXMLSAX methods, we do not need to override them. In addition, we create a new method, GetArray, that allows us to retrieve the array data from the object's instance data.
Example Code: This example is included in the file xml_to_array__define.pro in the examples/doc/file_io subdirectory of the IDL distribution.
The following routine is the definition of the xml_to_array object class:
PRO xml_to_array__define
void = {xml_to_array, $
INHERITS IDLffXMLSAX, $
charBuffer:'', $
pArray:PTR_NEW()}
END
The following items should be considered when defining this class structure:
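• The structure definition uses the INHERITS keyword to inherit the class structure and methods of the IDLffXMLSAX object class.
• The charBuffer structure field is defined as a string by setting it equal to an empty string.
• The pArray structure field is defined as an IDL pointer. PTR_NEW() with no arguments produces a null pointer; the Init method points it at a heap variable, which we use to store the numerical array data we retrieve.
• The name of the class definition routine is formed by appending "__define" (note the two underscore characters) to the class name.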
Why do we store the array data in a pointer variable? Because each field of a named structure (xml_to_array, in this case) always keeps the type and size it was given when the structure was defined. Since we add a value to the data array for every number we parse, the array must grow as parsing proceeds. If we defined a fixed-size array field in the structure, we could not extend it. By holding the data array in a heap variable referenced by a pointer, we can extend the array without changing the format of the xml_to_array class structure.
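The following command-line sketch, which is not part of xml_to_array__define.pro, shows this pattern in isolation: the pointer itself never changes, while the heap variable it references grows by concatenation. The variable names p, i, and values are illustrative only.

```idl
; Illustrative sketch only: grow an array held in a heap variable.
p = PTR_NEW(/ALLOCATE_HEAP)       ; pointer to a new, undefined heap variable
; The first value defines the array; later values are concatenated onto it.
FOR i = 0, 4 DO IF (N_ELEMENTS(*p) EQ 0) THEN *p = i ELSE *p = [*p, i]
values = *p                       ; copy the data out before freeing the pointer
PTR_FREE, p                       ; release the heap variable
PRINT, values                     ; prints 0 1 2 3 4
```

This is the same test-then-concatenate logic the EndElement method uses on self.pArray.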
Note: Although we describe this routine first here, the xml_to_array__define routine must be the last routine in the xml_to_array__define.pro file.
The Init method is called when an xml_to_array parser object is created by a call to OBJ_NEW. The following routine is the definition of the Init method:
FUNCTION xml_to_array::Init
   self.pArray = PTR_NEW(/ALLOCATE_HEAP)
   RETURN, self->IDLffXMLSAX::Init()
END
We do two things in this method:
• We create a pointer to a new, undefined heap variable (PTR_NEW with the ALLOCATE_HEAP keyword) and store it in the pArray field of the class structure variable.
Note: Within a method, we can refer to the class structure variable with the implicit parameter self. Remember that self is actually a reference to the xml_to_array object instance.
• We call the superclass's (IDLffXMLSAX) Init method on the self object reference and return its result, which tells OBJ_NEW whether initialization succeeded.
Note: The initialization task (setting the value of the pArray field) is performed before calling the superclass's Init method.
See IDLffXMLSAX::Init for details on the method we are overriding.
The Cleanup method is called when the xml_to_array parser object is destroyed by a call to OBJ_DESTROY. The following routine is the definition of the Cleanup method:
PRO xml_to_array::Cleanup
   IF (PTR_VALID(self.pArray)) THEN PTR_FREE, self.pArray
END
All we do in the Cleanup method is free the pArray pointer and the heap variable it references, if the pointer is valid.
See IDLffXMLSAX::Cleanup for details on the method we are overriding.
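The Cleanup method above releases only the pointer. A common convention in IDL object programming is for an overriding Cleanup method to also call the superclass's Cleanup method (which IDL permits from within a subclass's Cleanup), so that resources managed by IDLffXMLSAX are released too. The following is a hedged sketch of that convention, not the version shipped in xml_to_array__define.pro:

```idl
; Hedged variant: free our own heap data, then chain to the superclass.
PRO xml_to_array::Cleanup
   IF (PTR_VALID(self.pArray)) THEN PTR_FREE, self.pArray
   ; Let IDLffXMLSAX release whatever resources it manages itself.
   self->IDLffXMLSAX::Cleanup
END
```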
The Characters method is called when the xml_to_array parser encounters character data inside an element. The following routine is the definition of the Characters method:
PRO xml_to_array::Characters, data
   self.charBuffer = self.charBuffer + data
END
The parser may deliver the character data of a single element in more than one piece, calling Characters once for each piece. For that reason we append the current characters to the charBuffer field of the object's instance data structure rather than overwriting it.
See IDLffXMLSAX::Characters for details on the method we are overriding.
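To see why Characters appends rather than assigns, the following sketch simulates a parser that delivers the text of one <number> element in two pieces. The chunks array and the local variable names are hypothetical; whether and where the real parser splits a text section is up to the parser.

```idl
; Illustrative sketch: simulate two Characters calls for one element.
charBuffer = ''                   ; reset, as StartElement does for <number>
chunks = ['1', '7']               ; hypothetical pieces of the text "17"
FOR i = 0, N_ELEMENTS(chunks) - 1 DO charBuffer = charBuffer + chunks[i]
value = FIX(charBuffer)           ; the conversion EndElement performs
PRINT, value                      ; prints 17
```

Had each piece been assigned instead of appended, only '7' would remain in the buffer.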
The StartDocument method is called when the xml_to_array parser encounters the beginning of the XML document. The following routine is the definition of the StartDocument method:
PRO xml_to_array::StartDocument
   IF (N_ELEMENTS(*self.pArray) GT 0) THEN $
      void = TEMPORARY(*self.pArray)
END
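Here, we check whether the array referenced by the pArray pointer contains any data. Since we are just beginning to parse the XML document, it normally should not. If data is present (for example, left over from a previous parse with the same object), we discard it using the TEMPORARY function: TEMPORARY returns the array's value, which we assign to the unused variable void, and leaves the heap variable undefined.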
Note: Since pArray is a pointer, we must use dereferencing syntax (*self.pArray) to refer to the array.
See IDLffXMLSAX::StartDocument for details on the method we are overriding.
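The effect of TEMPORARY on a heap variable can be checked at the command line. This sketch is illustrative only; p, void, and count are hypothetical names.

```idl
; Illustrative sketch: TEMPORARY empties a heap variable.
p = PTR_NEW([1, 2, 3])            ; heap variable holding an INT array
void = TEMPORARY(*p)              ; void receives the data; *p becomes undefined
count = N_ELEMENTS(*p)            ; 0, so the next value starts a new array
PTR_FREE, p
PRINT, count                      ; prints 0
```

The pointer stays valid after TEMPORARY; only the heap variable it references becomes undefined, which is why EndElement's N_ELEMENTS test then starts a fresh array.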
The StartElement method is called when the xml_to_array parser encounters the beginning of an XML element. The following routine is the definition of the StartElement method:
PRO xml_to_array::StartElement, URI, local, strName, attr, value
   CASE strName OF
      "array": BEGIN
         IF (N_ELEMENTS(*self.pArray) GT 0) THEN $
            void = TEMPORARY(*self.pArray) ; clear out memory
      END
      "number": BEGIN
         self.charBuffer = ''
      END
   ENDCASE
END
Here, we first check the name of the element we have encountered, and use a CASE statement to branch based on the element name:
• If we encounter an <array> element, we check whether the array referenced by the pArray pointer is empty. Since we are just beginning to read the array data, there should be no data. If data already exists, we discard it using the TEMPORARY function.
• If we encounter a <number> element, we reinitialize the charBuffer field. Since we are just beginning to read the number data, nothing should be in the buffer.
See IDLffXMLSAX::StartElement for details on the method we are overriding.
The EndElement method is called when the xml_to_array parser encounters the end of an XML element. The following routine is the definition of the EndElement method:
PRO xml_to_array::EndElement, URI, Local, strName
   CASE strName OF
      "array":
      "number": BEGIN
         iData = FIX(self.charBuffer)
         IF (N_ELEMENTS(*self.pArray) EQ 0) THEN $
            *self.pArray = iData $
         ELSE $
            *self.pArray = [*self.pArray, iData]
      END
   ENDCASE
END
As with the StartElement method, we first check the name of the element we have encountered, and use a CASE statement to branch based on the element name:
• If we encounter the end of an <array> element, we do nothing.
• If we encounter the end of a <number> element, we must get the data stored in the charBuffer field of the instance data structure and place it in the array:
   • We use the FIX function to convert the string stored in charBuffer into an IDL integer.
   • We check whether the array referenced by pArray is empty. If it is, we simply set the array equal to the data value we retrieved from charBuffer.
   • If the array is not empty, we redefine it by concatenating the new value onto the end of the existing array.
See IDLffXMLSAX::EndElement for details on the method we are overriding.
Note: In both the StartElement and EndElement methods, we rely on the validity of the XML data file. Our CASE statements only need to handle the element types described in the XML file's DTD or schema (in this case, the only elements are <array> and <number>), so we do not include an ELSE clause. When the file is validated against its DTD, an unknown element causes the parser to report a validation error.
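If you cannot be sure every file will be validated, keep in mind that an IDL CASE statement with no matching clause and no ELSE clause raises an error. The following hedged variant of EndElement, a sketch rather than the shipped version, adds an empty ELSE clause so that unexpected elements are silently ignored:

```idl
; Hedged variant of EndElement that tolerates elements not in the DTD.
PRO xml_to_array::EndElement, URI, Local, strName
   CASE strName OF
      "array":                    ; nothing to do at the end of <array>
      "number": BEGIN
         iData = FIX(self.charBuffer)
         IF (N_ELEMENTS(*self.pArray) EQ 0) THEN $
            *self.pArray = iData $
         ELSE $
            *self.pArray = [*self.pArray, iData]
      END
      ELSE:                       ; ignore any other element
   ENDCASE
END
```

The same empty ELSE clause could be added to StartElement. Silently ignoring unknown elements hides malformed input, so relying on validation, as the example does, is the stricter choice.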
The GetArray method allows us to retrieve the array data stored in the pArray pointer variable. The following routine is the definition of the GetArray method:
FUNCTION xml_to_array::GetArray
   IF (N_ELEMENTS(*self.pArray) GT 0) THEN $
      RETURN, *self.pArray $
   ELSE RETURN, -1
END
Here, we check to see whether the array pointed at by pArray contains any data. If it does contain data, we return the array. If the array contains no data, we return the value -1.
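Because the callback methods defined above are ordinary IDL procedure and function methods, you can exercise them without an XML file by calling them in the order a parser would for the document <array><number>3</number><number>4</number></array>. This sketch assumes the complete xml_to_array__define.pro has been compiled; it is a testing aid only, not how the parser is normally driven (ParseFile does that).

```idl
; Testing aid: drive the callbacks by hand, in the order a SAX parser
; would call them for <array><number>3</number><number>4</number></array>.
obj = OBJ_NEW('xml_to_array')
obj->StartDocument
obj->StartElement, '', 'array', 'array'
obj->StartElement, '', 'number', 'number'
obj->Characters, '3'
obj->EndElement, '', 'number', 'number'
obj->StartElement, '', 'number', 'number'
obj->Characters, '4'
obj->EndElement, '', 'number', 'number'
obj->EndElement, '', 'array', 'array'
result = obj->GetArray()
OBJ_DESTROY, obj
PRINT, result                     ; prints 3 4
```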
To see the xml_to_array parser in action, you can parse the file num_array.xml, found in the examples/data subdirectory of the IDL distribution. This file contains XML like the fragment shown at the beginning of this section, with 20 <number> elements holding the values 0 through 19. The num_array.xml file also includes a DTD describing the structure of the file.
Enter the following statements at the IDL command line:
xmlObj = OBJ_NEW('xml_to_array')
xmlFile = FILEPATH('num_array.xml', $
SUBDIRECTORY = ['examples', 'data'])
xmlObj->ParseFile, xmlFile
myArray = xmlObj->GetArray()
OBJ_DESTROY, xmlObj
HELP, myArray
PRINT, myArray
IDL prints:
MYARRAY INT = Array[20]
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
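Because StartDocument (and StartElement, for <array>) discards data left over from a previous parse, a single parser object can be reused. The following sketch parses num_array.xml twice with one object; each call to GetArray should return the 20 values from a single parse rather than an accumulated 40. The variable names first and second are illustrative.

```idl
; Sketch: reuse one xml_to_array object for two parses of the same file.
xmlObj = OBJ_NEW('xml_to_array')
xmlFile = FILEPATH('num_array.xml', SUBDIRECTORY = ['examples', 'data'])
xmlObj->ParseFile, xmlFile
first = xmlObj->GetArray()
xmlObj->ParseFile, xmlFile        ; StartDocument clears the old values
second = xmlObj->GetArray()
OBJ_DESTROY, xmlObj
PRINT, N_ELEMENTS(first), N_ELEMENTS(second)   ; both should be 20
```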
IDL Online Help (March 06, 2007)