Parsing the iTunes Library using a SAX parser (in Java)

The Question

A reader nicknamed "Aviator" asked for help parsing the iTunes Library XML file using a SAX parser. He did most of the work implementing the parser, so I'll simply build on his sample code, which you can see in the comments here.

The Answer

SAX Parsers

First, let's step back and describe a SAX parser. SAX is an event-driven approach to parsing an XML document: the parser calls "handlers", which are simple callback methods, whenever it encounters a parsing event such as the start or end of an XML tag. SAX parsers are considered lightweight because they only report parsing events as they stream through the document, and typically hold in memory only the portion of the document (the document fragment) that corresponds to the tag being parsed. Contrast this with a DOM parser, which reads the whole document into memory before you can work with it.
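To make the callback idea concrete, here is a minimal sketch of a handler that simply records each event as the parser reports it. The class and method names (SaxEventsSketch, EventRecorder, recordEvents) are made up for illustration, and the XML is parsed from an in-memory string rather than a file.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SaxEventsSketch {

    /** Records each parsing event as a short string, in the order the parser reports it. */
    static class EventRecorder extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Skip whitespace-only chunks so the event list stays readable.
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    /** Parses the given XML string and returns the recorded events. */
    static List<String> recordEvents(String xml) throws Exception {
        EventRecorder recorder = new EventRecorder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), recorder);
        return recorder.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : recordEvents("<name>Fred</name>")) {
            System.out.println(event);
        }
    }
}
```

Notice that the handler never sees the document as a whole. It only gets told "a tag started", "here is some text", "a tag ended", and anything it wants to remember it has to store itself.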

Apple's XML Plist format

It's also helpful to take a look at Apple's XML plist format, the file format of the iTunes XML file. The plist format goes way back to the NeXT computer platform. It is basically a key-value representation of data, in which each key corresponds to a value in the document.

For example, consider this very simple (old style) plist:

{
    name = "Fred";
}

If you squint a little, it looks a lot like another common format, JSON.

The XML plist format came along later, but it retains the same basic format of a key-value representation, with a bit of extra sugar sprinkled on top. This makes it a little weird to work with.

In traditional XML representations, you might expect the following:

<person name="Fred" />

or

<name>Fred</name>

However, in Apple's XML plist format, you would see:

<dict>
    <key>name</key><string>Fred</string>
</dict>

This throws a curve ball at a SAX parser, because a handler traditionally only knows about the "current" tag, and here the meaning of a value tag such as <string> depends on the <key> that came just before it. Handling this is where Aviator's code fell short. Fortunately, it's easy enough to teach the parser a few tricks by keeping some local state. We just need to keep a reference to the previous tag and its value in order to provide the context needed to understand the current tag's meaning. A more complete parser would probably maintain a stack of the previous tags and values to fully represent the parser's current context, but a simple reference to the previous tag and value is enough for our needs.
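For the curious, here is roughly what the stack-based variant could look like. This is only a sketch under my own assumptions, not the approach used in the code for this post: the names (PlistContextSketch, PathHandler, parse) are hypothetical, it only understands <dict>, <key>, <string> and <integer>, and it flattens each value into a slash-separated path built from the keys of the enclosing dicts.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

public class PlistContextSketch {

    /** Maps "/outerKey/innerKey/..." paths to the string and integer values found there. */
    static class PathHandler extends DefaultHandler {
        final Map<String, String> values = new LinkedHashMap<>();
        // Keys of the dicts we are currently inside, outermost first.
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();
        // The most recent <key> that has not yet been paired with a value.
        private String pendingKey;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0);
            if ("dict".equals(qName)) {
                // A dict is the value of the pending key; the root dict has no key.
                path.addLast(pendingKey == null ? "" : pendingKey);
                pendingKey = null;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("key".equals(qName)) {
                pendingKey = text.toString();
            } else if ("dict".equals(qName)) {
                path.removeLast();
            } else if (pendingKey != null && ("string".equals(qName) || "integer".equals(qName))) {
                values.put(String.join("/", path) + "/" + pendingKey, text.toString());
                pendingKey = null;
            }
        }
    }

    /** Parses a plist XML string and returns the flattened path-to-value map. */
    static Map<String, String> parse(String xml) throws Exception {
        PathHandler handler = new PathHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<plist version=\"1.0\"><dict><key>Tracks</key><dict><key>1</key><dict>"
                + "<key>Name</key><string>Fred</string>"
                + "<key>Play Count</key><integer>3</integer>"
                + "</dict></dict></dict></plist>";
        parse(xml).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With the stack in place, the handler knows not just the previous key but the whole chain of keys above it, at the cost of a bit more bookkeeping. Arrays, dates, booleans and the like would need extra handling that this sketch leaves out.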

The code

Below is my (slight) reworking of Aviator's code. It will successfully parse the songs out of the iTunes XML Library file.

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * A sample iTunes Library file SAX parser.
 */
public class SAXParserExample extends DefaultHandler {

    private static final String LIBRARY_FILE_PATH = "/tmp/iTunes Music Library.xml"; //"C:\\iTunes Music Library.xml";

    List<Song> myTracks;

    private String tempVal;

    //to maintain context
    private Song tempTrack;

    boolean foundTracks = false;

    private String previousTag;
    private String previousTagVal;

    public SAXParserExample() {
        myTracks = new ArrayList<Song>();
    }

    public void runExample() {
        parseDocument();
        printData();
    }

    private void parseDocument() {
        //get a factory
        SAXParserFactory spf = SAXParserFactory.newInstance();
        try {
            //get a new instance of parser
            SAXParser sp = spf.newSAXParser();

            //parse the file and also register this class for call backs
            sp.parse(LIBRARY_FILE_PATH, this);

        }catch(SAXException se) {
            se.printStackTrace();
        }catch(ParserConfigurationException pce) {
            pce.printStackTrace();
        }catch (IOException ie) {
            ie.printStackTrace();
        }
    }

    /**
     * Iterate through the list and print
     * the contents
     */
    private void printData(){

        System.out.println("No of Tracks '" + myTracks.size() + "'.");

        Iterator<Song> it = myTracks.iterator();

        while(it.hasNext()) {
            Song song = it.next();
            System.out.println(song.getAlbum() + " - " + song.getName());
        }
    }

    //Event Handlers
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        //reset
        tempVal = "";

        if (foundTracks) {
            if ("key".equals(previousTag) && "dict".equalsIgnoreCase(qName)) {
                //create a new Song for this track's dict
                tempTrack = new Song();
                myTracks.add(tempTrack);
            }
        } else {
            if ("key".equals(previousTag) && "Tracks".equalsIgnoreCase(previousTagVal) && "dict".equalsIgnoreCase(qName)) {
                foundTracks = true; // We are now inside the Tracks dict.
            }
        }
    }

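    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // The parser may deliver one element's text in several chunks (for example
        // around an entity such as &amp;), so append instead of overwriting.
        // tempVal is reset at the start of every element in startElement().
        tempVal += new String(ch, start, length);
    }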

    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (foundTracks) {
            if (previousTagVal.equalsIgnoreCase("Name") && qName.equals("string"))
            {
                    tempTrack.setName(tempVal);
            }
            else if (previousTagVal.equalsIgnoreCase("Artist") && qName.equals("string"))
            {
                    tempTrack.setArtist(tempVal);
            }
            else if (previousTagVal.equalsIgnoreCase("Album") && qName.equals("string"))
            {
                    tempTrack.setAlbum(tempVal);
            }
            else if (previousTagVal.equalsIgnoreCase("Play Count") && qName.equals("integer"))
            {
                    tempTrack.setPlayCount(Integer.parseInt(tempVal));
            }

            // Mark when we come to the end of the "Tracks" dict.
            if ("key".equals(qName) && "Playlists".equalsIgnoreCase(tempVal)) {
                foundTracks = false;
            }
        }

        // Keep track of the previous tag so we can track the context when we're at the second tag in a key, value pair.
        previousTagVal = tempVal;
        previousTag = qName;
    }

    /**
     * A simple representation of a song in the iTunes library.
     */
    public class Song {

        private String name;
        private String artist;
        private String album;

        private int playCount;

        public String getName() {
            return name;
        }

        public void setName(String name) {
            this.name = name;
        }

        public String getArtist() {
            return artist;
        }

        public void setArtist(String artistName) {
            this.artist = artistName;
        }

        public String getAlbum() {
            return album;
        }

        public void setAlbum(String albumName) {
            this.album = albumName;
        }

        public int getPlayCount() {
            return playCount;
        }

        public void setPlayCount(int playCount) {
            this.playCount = playCount;
        }
    }

    public static void main(String[] args) {
        SAXParserExample spe = new SAXParserExample();
        spe.runExample();
    }

}
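
One detail of SAX handlers is worth showing in isolation: a parser is free to deliver a single element's text through several characters() calls, and many parsers split the text around entities such as &amp;. Here is a minimal sketch, with made-up names (TextAccumulationSketch, StringValueHandler, readString), of a handler that collects every chunk in a StringBuilder so that the full value is available when the tag ends.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;

public class TextAccumulationSketch {

    /** Remembers the complete text of the most recently closed <string> element. */
    static class StringValueHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        String lastString;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // Start each element with an empty buffer.
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Append every chunk; one element's text may arrive in pieces.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("string".equals(qName)) {
                lastString = text.toString();
            }
        }
    }

    /** Parses the XML string and returns the text of its last <string> element. */
    static String readString(String xml) throws Exception {
        StringValueHandler handler = new StringValueHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lastString;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readString("<string>Rock &amp; Roll</string>"));
    }
}
```

Whether a given parser actually splits the text is up to the implementation, but a handler that appends gets the whole value either way.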

Comments

  1. At March 31, 2010 @ 11:32 p.m. Aviator(Rishupreet Oberoi) said:

    Nicely put Travis!
    Fortunately I managed to do almost the same as you mentioned there, and parse by storing the previous value of the tag encountered.
    Here is my code:-
    public class ItunesParser extends DefaultHandler {
        Map<String,MUDTrackBean> tracksMap = null;
        private String current_tag;
        private String prev_val;
        private StringBuilder tempVal = new StringBuilder();
        private MUDTrackBean tempTrack;
        private String trackid;
        private boolean isParentArray = false;
        private String fileLoc;

        public Map<String,MUDTrackBean> itunesParser(String fileLoc) throws SAXException {
            tracksMap = new HashMap<String,MUDTrackBean>();
            this.fileLoc = fileLoc;
            parseDocument();

            //printData();
            return tracksMap;
        }

        private void parseDocument() throws SAXException {
            try {
                SAXParserFactory spf = SAXParserFactory.newInstance();
                spf.setValidating(true);
                spf.setFeature("http://apache.org/xml/features/validation/schema", true);

                SAXParser sp = spf.newSAXParser();
                sp.parse(fileLoc, this);
            } catch (ParserConfigurationException pce) {
                pce.printStackTrace();
            } catch (IOException ie) {
                ie.printStackTrace();
            }
        }

        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            current_tag = qName.trim();
            prev_val = tempVal.toString();
            tempVal = new StringBuilder();

            if (qName.equalsIgnoreCase("array")) {
                isParentArray = true;
            }
        }

        public void characters(char[] ch, int start, int length) throws SAXException {
            if (current_tag.equalsIgnoreCase("string")) {
                tempVal.append(new String(ch, start, length));
            } else {
                tempVal = new StringBuilder(new String(ch, start, length));
            }
        }

        public void endElement(String uri, String localName, String qName) throws SAXException {
            String tempValStr = tempVal.toString();
            if (qName.equalsIgnoreCase("dict") && !isParentArray) {
                if (tempTrack != null) {
                    if (tempTrack.getTrackid() != null) {
                        tracksMap.put(trackid, tempTrack);
                    }
                }
            } else if (qName.equalsIgnoreCase("key") && !isParentArray) {
                if (tempValStr.equalsIgnoreCase("Track ID"))
                    tempTrack = new MUDTrackBean();
            } else if (qName.equals("integer") && prev_val.equalsIgnoreCase("Track ID") && !isParentArray) {
                trackid = tempValStr;
                tempTrack.setTrackid(trackid);
            } else if (qName.equals("string") && prev_val.equalsIgnoreCase("Name") && !isParentArray) {
                tempTrack.setTrackTitle(tempValStr);
            } else if (qName.equalsIgnoreCase("string") && prev_val.equalsIgnoreCase("Artist") && !isParentArray) {
                tempTrack.setArtist(tempValStr);
            } else if (qName.equalsIgnoreCase("string") && prev_val.equalsIgnoreCase("Album") && !isParentArray) {
                tempTrack.setAlbum(tempValStr);
            } else if (qName.equalsIgnoreCase("integer") && prev_val.equalsIgnoreCase("Play Count") && !isParentArray) {
                tempTrack.setPlayCount(Integer.parseInt(tempValStr));
            }
        }

    }

    Regards
    Rishupreet Oberoi (Aviator)

  2. At April 27, 2010 @ 1:37 p.m. Lucas said:

    I found it fascinating! But I'm having some issues... like ampersands in song titles... the parser can't get it?

  3. At June 10, 2010 @ 11:55 p.m. Zach Caraher said:

    yo can you put up a download link for biddang by ratatat? i pre-ordered the album but its not downloading for some reason. can u email me at bigzremixes@gmail.com

    thanks a lot,
    zach

  4. At June 21, 2010 @ 12:51 p.m. Travis Cripps said:

    Zach,

    I suggest you get in touch with the iTunes support people about your missing preorder track. Sorry.

  5. At Feb. 1, 2011 @ 5:08 a.m. Moti said:

    Hi, and thanks for this post; it helped me a lot.
    Actually, I wrote my application in C++ (using Qt, a cross-platform framework).
    My question is: how do you handle localized strings in the Local field? For example, if I have an mp3 file located in a folder whose name contains French/Hebrew/Arabic... characters, that folder appears in the iTunes Music Library.xml as unreadable characters like '/myMusic/%D7%A9%D7%9C%D7%95%D7%9D%20%D7%97%D7%A0%D7%95%D7%9A%20-%20%D7%94%D7%9E%D7%99%D7%98%D7%91/%D7%A9%D7%9C%D7%95%D7%9D%20%D7%97%D7%A0%D7%95%D7%9A%20-%20%20%D7%9B%D7%9B%D7%94%20%D7%95%D7%9B%D7%9B%D7%94.mp3'

  6. At Feb. 10, 2011 @ 5:49 a.m. Mits said:

    Very useful code indeed. Lucas, the problem you have can be solved with the string method String.replace("&","\&"); when you read something you want to use. The backslash will add this like just a string. Thanks again for the really useful code!!

  7. At June 13, 2011 @ 4:28 p.m. Gil said:

    I tried this myself. The SAX parser threw an error.

    It turns out that the iTunes XML is not well-formed. It contains two root elements.

    The first root element is this: <plist version="1.0">
    It is never closed. The second element is the "proper" <dict> root element.

    How did you get this to run without encountering an error?
