XML parsing using DOM, SAX and StAX parsers in Java


In this article, I explain what an XML parser is and which types are available in Java. I cover JAXP, the SAX parser and its version history, the DOM parser and its version history, the StAX parser and how it differs from the SAX parser, and finally the differences between the SAX and DOM parsers.

What is XML parser?

An XML parser is a software library (or package) that reads and parses XML. It reads an XML document and gives programs a way to access its content. XML parsers are available for many languages, such as C++, Java, C#, Perl, Python, PHP and Ruby.

Types of XML parsers in Java

  1. SAX (Simple API for XML): It is an event-driven API. It sends events to the application as the document is parsed
  2. DOM (Document Object Model): It reads the entire document into memory as a tree structure

JAXP

JAXP stands for Java API for XML Processing. It is available as a separate library and has been part of the Sun JDK since version 1.4 (javax.xml.parsers package). It provides a vendor-independent way of writing Java code for XML. It supports the SAX and DOM parser APIs and the XSLT standard. It also lets you plug in a different implementation of the parser or of the XSLT processor. In JDK 1.4 the default reference implementations were Crimson (parser) and Xalan (XSLT processor).
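As a minimal sketch of the factory mechanism described above, the following obtains a DOM parser and a SAX parser through the JAXP factories without naming any concrete implementation. The class name JaxpFactoryDemo and the sample XML are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; only the javax.xml.parsers calls are standard JAXP.
class JaxpFactoryDemo {

    // Parses the XML text with whichever DOM implementation JAXP resolves
    // and returns the name of the root element.
    static String rootElementName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getNodeName();
    }

    // Returns the class name of the SAX parser implementation that was plugged in.
    static String saxImplementationName() throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        return parser.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootElementName("<catalog><book/></catalog>"));
        System.out.println(saxImplementationName());
    }
}
```

The application code only refers to the factory classes. Which implementation is used can be changed from outside, for example through the javax.xml.parsers.DocumentBuilderFactory system property.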

SAX: Simple API for XML

SAX is an event-driven way of processing XML documents, and a SAX parser implements the SAX API. As it reads each unit of XML, it creates an event that the calling program can handle. It is used for high-performance applications or in areas where the XML size might exceed the available memory. SAX reads an XML document as a stream of XML tokens (start elements, end elements, text sections etc.). The programmer decides what to do with each event. A SAX parser does not build any in-memory representation of the document; it simply delivers events. It provides handler interfaces that the application implements. The design of the specification was coordinated by David Megginson. The current SAX standard is version 2.0.
There are four handler interfaces: ContentHandler, DTDHandler, EntityResolver and ErrorHandler.
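The event flow can be sketched with a small handler. This example extends DefaultHandler, the standard helper class that implements the four handler interfaces with empty methods, and overrides three ContentHandler callbacks. The class name SaxTitleCollector and the title element are assumptions chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that collects the text of every <title> element.
class SaxTitleCollector extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0); // start collecting text for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver the text of one element in several chunks.
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString().trim());
            inTitle = false;
        }
    }

    // Runs a SAX parse over the XML text and returns the collected titles.
    static List<String> collectTitles(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxTitleCollector handler = new SaxTitleCollector();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>A</title></book>"
                + "<book><title>B</title></book></books>";
        System.out.println(collectTitles(xml));
    }
}
```

The parser itself keeps nothing. Everything the program wants to retain (here the list of titles) has to be stored by the handler.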

Version history:
  • SAX 1 - introduced in May 1998
  • SAX 2.0 - introduced in May 2000; adds support for namespaces, filter chains, and querying/setting properties in the parser

Advantages:
  • It is simple and efficient
  • It fits naturally with stream-based applications

Disadvantages:
  • The data arrives in pieces, so the client never has the whole document available at once

DOM: Document Object Model

A DOM parser is a tree-based API, and it implements the DOM API. It differs from a SAX parser because it builds the entire XML representation in memory, where it is then used as a whole. It can be very memory intensive. The DOM version supported by the standard Java API is Level 3. It provides interfaces for the components of the tree, such as Document, Node, NodeList, Element and Attr. With a DOM parser, the client application has to make explicit method calls, which often take the form of chained method calls.
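A minimal sketch of tree access with the interfaces named above. The class name DomTitleReader, the element names and the id attribute are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class that reads book ids and titles from a DOM tree.
class DomTitleReader {

    // Loads the whole document into memory, then walks the tree.
    static List<String> readTitles(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> result = new ArrayList<>();
        NodeList titles = doc.getElementsByTagName("title");
        for (int i = 0; i < titles.getLength(); i++) {
            Element title = (Element) titles.item(i);
            // Random access: step from the title back up to its parent element.
            Element book = (Element) title.getParentNode();
            result.add(book.getAttribute("id") + ":" + title.getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book id=\"1\"><title>A</title></book>"
                + "<book id=\"2\"><title>B</title></book></books>";
        System.out.println(readTitles(xml));
    }
}
```

The same tree can also be reached with one chained call, for example doc.getDocumentElement().getElementsByTagName("title").item(0).getTextContent().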

Version history:
  • DOM 1 - introduced in late 1998
  • DOM 2 - published in late 2000
  • DOM 3 - released in April 2004
  • DOM 4 - released in February 2014

Advantages:
  • It is useful when random access to separate parts of a document is required
  • It supports both read and write operations
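The write side can be sketched as follows: the tree is changed in memory and then written back out. DOM itself only defines the tree, so this sketch uses the JAXP Transformer API for serialization. The class name DomEditDemo and the element names are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class that appends a <book> element and serializes the result.
class DomEditDemo {

    static String addBook(String xml, String titleText) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Build <book><title>...</title></book> and attach it to the root element.
        Element book = doc.createElement("book");
        Element title = doc.createElement("title");
        title.setTextContent(titleText);
        book.appendChild(title);
        doc.getDocumentElement().appendChild(book);

        // An identity transform copies the DOM tree to a character stream.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addBook("<books/>", "C"));
    }
}
```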

Disadvantages:
  • It is memory inefficient, because the whole document must fit in memory
  • It is somewhat complicated to use

StAX: Streaming API for XML

StAX is a pull parser API for Java, specified by JSR 173. With this parser the application pulls the data it needs from the XML. This can give a significant performance improvement. The parser maintains a cursor at the current position in the document.
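A minimal sketch of the cursor style, using the standard XMLStreamReader. The class name StaxTitleReader and the title element are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class that pulls the text of every <title> element.
class StaxTitleReader {

    static List<String> readTitles(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            // The application drives the loop and asks for the next event itself.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the text content and moves the cursor to the end tag.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book><title>A</title></book>"
                + "<book><title>B</title></book></books>";
        System.out.println(readTitles(xml));
    }
}
```

Because the loop belongs to the application, it can simply stop calling next() once it has what it needs.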

Difference between SAX parser and StAX parser:
  1. The SAX parser pushes the data to the application, whereas the StAX parser lets the application pull the data from the XML
  2. The StAX parser maintains a cursor at the current position so that the application can extract the content available at the cursor, whereas the SAX parser issues an event whenever certain data is encountered

Can DOM and SAX parsers be used at the same time?

Yes, we can, because a DOM parser and a SAX parser work independently of each other.

Difference between DOM and SAX parser

  1. A DOM parser parses the entire document, whereas a SAX parser parses only until you tell it to stop
  2. DOM is good for reading data and configuration files, whereas SAX is good for very large documents
  3. DOM is more useful when you need to modify the document; SAX does not support modification
  4. SAX does not remember previous events unless you write explicit code to do so
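Points 1 and 4 above can be sketched together. The handler below remembers the open elements on a stack (the explicit code that SAX itself does not provide) and aborts the parse at the first match by throwing an exception, which is a common way to tell a SAX parser to stop. The names SaxFirstMatch and StopParsing are hypothetical and not part of the SAX API.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: finds the parent of the first element named `target`.
class SaxFirstMatch extends DefaultHandler {

    // Hypothetical marker exception, used only to end the parse early.
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stop requested by handler");
        }
    }

    private final Deque<String> openElements = new ArrayDeque<>();
    private final String target;
    private String parentOfMatch;

    SaxFirstMatch(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (target.equals(qName)) {
            parentOfMatch = openElements.peek(); // null if the match is the root
            throw new StopParsing();             // nothing after this point is parsed
        }
        openElements.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
    }

    // Returns the name of the parent of the first matching element, or null.
    static String parentOfFirst(String xml, String target) throws Exception {
        SaxFirstMatch handler = new SaxFirstMatch(target);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing stop) {
            // Expected: the handler cut the parse short on purpose.
        }
        return handler.parentOfMatch;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parentOfFirst("<a><b><c/></b></a>", "c"));
    }
}
```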

Conclusion

The choice among SAX, DOM and StAX for any given application is a matter of testing. Whichever one you choose, the parser checks whether the document is well-formed and, if validation is enabled, whether it is valid. It also passes the parsed data to the invoking application.

