
Example: Reading Data Into an Array

This example subclasses the IDLffXMLSAX parser object class to create an object class named xml_to_array. The xml_to_array object class is designed to read numerical values from an XML file with the following structure:

<array>    
  <number>0</number>  
  <number>1</number>  
  ...  
</array>  

and place those values into an IDL array variable.


Note
This example is deliberately simple. It is designed to illustrate how an event-based XML parser is constructed using the IDLffXMLSAX object class. An application that reads real data from an XML file will most likely be considerably more complicated.

Creating the xml_to_array Object Class

In order to read the XML file and return an array variable, we will need to create an object class definition that inherits from the IDLffXMLSAX object class, and override the following superclass methods: Init, Cleanup, StartDocument, Characters, StartElement, and EndElement. Since this example does not retrieve data using any of the other IDLffXMLSAX methods, we do not need to override those methods. In addition, we will create a new method that allows us to retrieve the array data from the object instance data.


Example Code
This example is included in the file xml_to_array__define.pro in the examples/doc/file_io subdirectory of the IDL distribution.

Object Class Definition

The following routine is the definition of the xml_to_array object class:

PRO xml_to_array__define  
  
void = {xml_to_array, $  
   INHERITS IDLffXMLSAX, $  
   charBuffer:'', $  
   pArray:PTR_NEW()}   
END  

The following items should be considered when defining this class structure:

Why do we store the array data in a pointer variable? Because each field of a named structure (xml_to_array, in this case) must always keep the data type and size it had when the structure was defined. Since we want to add values to the data array as we parse the XML file, we need to extend the array with each new value. If we defined the array itself, with a fixed size, as a field of the structure, we would not be able to extend it. By holding the data array in a pointer, we can extend the array without changing the definition of the xml_to_array object class structure.
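The pointer technique described above can be tried on its own, outside the parser. The following is a minimal sketch only; the function name grow_by_pointer_sketch is hypothetical and is not part of the example file:

```idl
; Hypothetical helper (not in xml_to_array__define.pro): collect values
; one at a time in a heap variable, the same way pArray is used.
FUNCTION grow_by_pointer_sketch

   ; The pointer refers to a heap variable that is still undefined,
   ; so N_ELEMENTS(*p) is 0 at this point.
   p = PTR_NEW(/ALLOCATE_HEAP)

   FOR i = 0, 2 DO BEGIN
      ; The first value defines the heap variable; later values are
      ; appended by concatenation. The pointer itself never changes.
      IF (N_ELEMENTS(*p) EQ 0) THEN *p = i ELSE *p = [*p, i]
   ENDFOR

   result = *p       ; copy the data out before releasing the heap variable
   PTR_FREE, p
   RETURN, result    ; the three values 0, 1, 2

END
```

The structure field holding the pointer always contains a single pointer, whatever the size of the array it points at, which is why the class structure never needs to change.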


Note
Although we describe this routine first here, the xml_to_array__define routine must be the last routine in the xml_to_array__define.pro file.

Init Method

The Init method is called when an xml_to_array parser object is created by a call to OBJ_NEW. The following routine is the definition of the Init method:

FUNCTION xml_to_array::Init  
  self.pArray = PTR_NEW(/ALLOCATE_HEAP)  
  RETURN, self->IDLffXMLSAX::Init()  
END  

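We do two things in this method. First, we initialize the pArray field with a pointer to an undefined heap variable, using the ALLOCATE_HEAP keyword to PTR_NEW. Second, we call the superclass's Init method and return its result.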


Note
The initialization task (setting the value of the pArray field) is performed before calling the superclass's Init method.

See IDLffXMLSAX::Init for details on the method we are overriding.
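The Init method shown above takes no arguments, so keywords supplied to OBJ_NEW are not forwarded to the superclass. If you want to pass on the keywords documented under IDLffXMLSAX::Init, one possible variation uses IDL's keyword inheritance. This is a sketch only, not the version shipped in the example file; it would replace the Init method shown above:

```idl
; Variation (assumption: you want superclass keywords to be usable
; from OBJ_NEW). Extra keywords are forwarded untouched.
FUNCTION xml_to_array::Init, _REF_EXTRA = extra

   ; Same initialization as before: point at an undefined heap variable.
   self.pArray = PTR_NEW(/ALLOCATE_HEAP)

   ; Hand any keywords we do not handle ourselves to IDLffXMLSAX::Init
   ; and return its success or failure value.
   RETURN, self->IDLffXMLSAX::Init(_EXTRA = extra)

END
```

With no keywords supplied, this variation behaves like the original method.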

Cleanup Method

The Cleanup method is called when the xml_to_array parser object is destroyed by a call to OBJ_DESTROY. The following routine is the definition of the Cleanup method:

PRO xml_to_array::Cleanup  
  
IF (PTR_VALID(self.pArray)) THEN PTR_FREE, self.pArray  
  
END  

All we do in the Cleanup method is release the pArray pointer, if it is valid.

See IDLffXMLSAX::Cleanup for details on the method we are overriding.

Characters Method

The Characters method is called when the xml_to_array parser encounters character data inside an element. The following routine is the definition of the Characters method:

PRO xml_to_array::characters, data  
  
self.charBuffer = self.charBuffer + data  
  
END  

As it parses the character data in an element, the parser will read characters until it reaches the end of the text section. Here, we simply add the current characters to the charBuffer field of the object's instance data structure.

See IDLffXMLSAX::Characters for details on the method we are overriding.
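The reason for accumulating text in charBuffer, rather than converting it immediately, can be illustrated without the parser. The sketch below assumes, for illustration only, that the text of one element arrives in two pieces; the function name and the pieces themselves are hypothetical:

```idl
; Hypothetical illustration: the text "1234" delivered as two chunks.
FUNCTION char_buffer_sketch

   chunks = ['12', '34']     ; made-up pieces of a single text section
   charBuffer = ''           ; what StartElement does for <number>

   ; What the Characters method does on each call: append the new text.
   FOR i = 0, N_ELEMENTS(chunks) - 1 DO $
      charBuffer = charBuffer + chunks[i]

   ; What EndElement does once the element is complete.
   RETURN, FIX(charBuffer)   ; the integer 1234

END
```

Converting each chunk separately would have produced two values, 12 and 34, instead of the single value the element contains.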

StartDocument Method

The StartDocument method is called when the xml_to_array parser encounters the beginning of the XML document. The following routine is the definition of the StartDocument method:

PRO xml_to_array::StartDocument  
  
IF (N_ELEMENTS(*self.pArray) GT 0) THEN $  
   void = TEMPORARY(*self.pArray)  
  
END  

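Here, we check to see if the array pointed at by the pArray pointer contains any data. Since we are just beginning to parse the XML document at this point, it should not contain any data. If data is present (for example, from an earlier parse with the same object), we reinitialize the array using the TEMPORARY function, which returns the array's contents and leaves the heap variable undefined.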


Note
Since pArray is a pointer, we must use dereferencing syntax to refer to the array.

See IDLffXMLSAX::StartDocument for details on the method we are overriding.
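The effect of TEMPORARY on a dereferenced pointer can be checked in isolation. The following is a minimal sketch with a hypothetical function name; it is not part of the example file:

```idl
; Hypothetical illustration of clearing a heap variable with TEMPORARY.
FUNCTION temporary_sketch

   p = PTR_NEW(INDGEN(5))        ; heap variable holding a 5-element array
   nBefore = N_ELEMENTS(*p)      ; 5

   ; The array moves into void; the heap variable becomes undefined,
   ; but the pointer p itself remains valid.
   void = TEMPORARY(*p)
   nAfter = N_ELEMENTS(*p)       ; 0

   PTR_FREE, p
   RETURN, [nBefore, nAfter]     ; the values 5 and 0

END
```

Because the pointer stays valid, the same pArray pointer can be reused for the next document without calling PTR_NEW again.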

StartElement Method

The StartElement method is called when the xml_to_array parser encounters the beginning of an XML element. The following routine is the definition of the StartElement method:

PRO xml_to_array::startElement, URI, local, strName, attr, value  
  
CASE strName OF  
   "array": BEGIN  
      IF (N_ELEMENTS(*self.pArray) GT 0) THEN $  
         void = TEMPORARY(*self.pArray) ; clear out memory  
   END  
   "number" : BEGIN  
      self.charBuffer = ''  
   END  
ENDCASE  
  
END  

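Here, we check the name of the element we have encountered and use a CASE statement to branch based on that name. If the element is <array>, we clear any existing data from the array pointed at by pArray. If the element is <number>, we reset the charBuffer field to an empty string, ready to collect the element's character data.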

See IDLffXMLSAX::StartElement for details on the method we are overriding.

EndElement Method

The EndElement method is called when the xml_to_array parser encounters the end of an XML element. The following routine is the definition of the EndElement method:

PRO xml_to_array::EndElement, URI, Local, strName  
  
CASE strName OF  
   "array":  
   "number": BEGIN  
      iData = FIX(self.charBuffer)  
      IF (N_ELEMENTS(*self.pArray) EQ 0) THEN $  
         *self.pArray = iData $  
      ELSE $  
         *self.pArray = [*self.pArray,iData]  
   END  
ENDCASE   
  
END  

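As with the StartElement method, we check the name of the element we have encountered and use a CASE statement to branch based on that name. If the element is <array>, we do nothing. If the element is <number>, we convert the string in charBuffer to an integer with the FIX function and store it: as the first value if the array pointed at by pArray is empty, or appended to the end of the array otherwise.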

See IDLffXMLSAX::EndElement for details on the method we are overriding.
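The EndElement method uses FIX, which by default produces a 16-bit integer; this suits the small values in this example. For data that does not fit that type, another conversion function could be substituted in the same place. The sketch below only demonstrates the result types and is not part of the example file; the procedure name and the sample strings are hypothetical:

```idl
; Hypothetical illustration of string-to-number conversions that could
; stand in for FIX in EndElement, depending on the data in the file.
PRO conversion_sketch

   HELP, FIX('42')          ; INT (16-bit), as used in the example
   HELP, LONG('100000')     ; LONG (32-bit), for larger whole numbers
   HELP, DOUBLE('3.5')      ; DOUBLE, for non-integer values

END
```

Whichever function is chosen, the array pointed at by pArray takes its type from the first value stored in it.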


Note
In both the StartElement and EndElement methods, we rely on the validity of the XML data file. Our CASE statements only need to handle the element types described in the XML file's DTD or schema (in this case, the only elements are <array> and <number>). We do not need an ELSE clause in the CASE statement. If an unknown element is found in the XML file, the parser will report a validation error.
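In IDL, a CASE statement that finds no matching clause and has no ELSE clause stops with an error. If you cannot rely on the input having been validated, a more defensive version of the branching logic would include an ELSE clause. The following stand-alone sketch shows the shape of such a statement; the function name and the returned strings are hypothetical:

```idl
; Hypothetical illustration of a CASE statement that tolerates
; element names it does not know about.
FUNCTION classify_element_sketch, strName

   CASE strName OF
      'array':  result = 'container'
      'number': result = 'value'
      ELSE:     result = 'ignored'    ; any other element name
   ENDCASE

   RETURN, result

END
```

In the parser methods themselves, the ELSE clause could simply be left empty so that unexpected elements are skipped.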

GetArray Method

The GetArray method allows us to retrieve the array data stored in the pArray pointer variable. The following routine is the definition of the GetArray method:

FUNCTION xml_to_array::GetArray  
  
IF (N_ELEMENTS(*self.pArray) GT 0) THEN $  
   RETURN, *self.pArray $  
ELSE RETURN, -1  
  
END  

Here, we check to see whether the array pointed at by pArray contains any data. If it does contain data, we return the array. If the array contains no data, we return the value -1.

Using the xml_to_array Parser

To see the xml_to_array parser in action, you can parse the file num_array.xml, found in the examples/data subdirectory of the IDL distribution. The num_array.xml file contains an XML fragment like the one shown at the beginning of this section, with 20 <number> elements. The file also includes a DTD describing its structure.

Enter the following statements at the IDL command line:

xmlObj = OBJ_NEW('xml_to_array')  
xmlFile = FILEPATH('num_array.xml', $  
   SUBDIRECTORY = ['examples', 'data'])  
xmlObj->ParseFile, xmlFile  
myArray = xmlObj->GetArray()  
OBJ_DESTROY, xmlObj  
HELP, myArray  
PRINT, myArray  

IDL prints:

MYARRAY         INT       = Array[20]  
 0   1   2   3   4   5   6   7   8   9   10   11  
12  13  14  15  16  17  18  19  
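If you read files this way more than once, the command-line statements above can be gathered into a single routine. The following is a sketch of one way to do so; the function name read_num_array is hypothetical and is not part of the IDL distribution:

```idl
; Hypothetical convenience wrapper around the statements shown above.
; Assumes the xml_to_array class is available on IDL's path.
FUNCTION read_num_array, xmlFile

   xmlObj = OBJ_NEW('xml_to_array')
   xmlObj->ParseFile, xmlFile

   ; Copy the data out before destroying the object, because Cleanup
   ; frees the heap variable that holds the array.
   result = xmlObj->GetArray()
   OBJ_DESTROY, xmlObj

   RETURN, result      ; the array, or -1 if no values were read

END
```

For example:

```idl
myArray = read_num_array(FILEPATH('num_array.xml', $
   SUBDIRECTORY = ['examples', 'data']))
HELP, myArray
```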

  IDL Online Help (March 06, 2007)