Javascript getElementById lookups - hash map or recursive tree traversal?


Does the DOM have a hash-table of elements whose keys are the elements' ids?
I want to know if document.getElementById looks up a hash table or traverses the entire tree.
Is this behavior different across browsers?

Implementations are free to do whatever they like, but since id is defined as a unique value, it would seem sensible to use a hash map or similiar lookup mechanism rather than traversal. What seems sensible from the outside, though, may not be when you get into the plumbing of building a complex web browser with many (sometimes conflicting) imperatives.

I did an easy but very simplistic test (see page at end of answer). It's very simplistic not least because we don't know that browsers don't cache previous results.

Chrome 4.1.249.1059 reports:

ID: 0.0082ms per lookup
Tag: 0.0249ms per lookup

So, dramatically faster by ID than tag name.

IE7 reports:

ID: 2.4438ms per lookup
Tag: 0.0437ms per lookup

So dramatically faster by tag name than ID (but remember IE7 has a broken concept of getElementById; this is fixed in IE8).

IE8 (on a different machine, don't compare absolutes, we're looking at diffs within the browser tested) reports:

ID: 1.1335ms per lookup
Tag: 0.0287ms per lookup

So about the same as IE7.

Firefox 3.6.3 reports:

ID: 0.0042ms per lookup
Tag: 0.006ms per lookup

So it doesn't care that much (on repeated requests; again, it may be caching).

Opera 10.5.1 reports:

ID: 0.006ms per lookup
Tag: 1.467ms per lookup

Dramatically faster by ID than tag name.

Make of those results what you will. A more complex test case would be needed to really infer the mechanisms. Of course, in at least two of those cases (Firefox and Chrome), we can go look at the source. CMS kindly points us to the WebKit and Firefox implementations (and looking at it, my suspicion about caching may have been on the money).

Test page:

<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>Test Page</title>
<style type='text/css'>
body {
    font-family: sans-serif;
}
#log p {
    margin:     0;
    padding:    0;
}
</style>
<script type='text/javascript'>
window.onload = pageInit;
function pageInit() {
    document.getElementById('btnGo').onclick = btnGoClick;
}
function btnGoClick() {

    log("Testing...");
    setTimeout(run, 0);
}

function run() {
    var count, time;

    try {
        // Warm up
        testid(10);
        testtag(10);

        // Test
        count = 10000
        time = testid(count);
        log("ID: " + (time / count) + "ms per lookup");
        time = testtag(count);
        log("Tag: " + (time / count) + "ms per lookup");
    }
    catch (e) {
        log("Error: " + (e.message ? e.message : String(e)));
    }
}

function testid(count) {
    var start;

    start = new Date().getTime();
    while (count-- > 0) {
        if (!document.getElementById('fred')) {
            throw "ID 'fred' not found";
        }
    }
    return new Date().getTime() - start;
}

function testtag(count) {
    var start;

    start = new Date().getTime();

    while (count-- > 0) {
        if (document.getElementsByTagName('em').length == 0) {
            throw "Tag 'em' not found";
        }
    }
    return new Date().getTime() - start;
}

function log(msg) {
    var log = document.getElementById('log');
    log.innerHTML += "<p>" + msg + "<\/p>";
}
</script>
</head>
<body><div>
<input type='button' id='btnGo' value='Go'>
<div id='log'></div>
<hr>
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<!-- repeat the above a couple of thousand times; I had about 2,200 -->
<div>test test<span>test<span>test<span>test<span>test</span></span></span></span></div>
<div>test test<span>test<span>test<span>test<em id='fred'>test</em></span></span></span></div>
</div></body>
</html>

I know about the Firefox and WebKit DOM implementations, both of them use a hashtable to lookup the elements, digging into the source of them you can give a look to the internal implementations:

WebKit implementation, Document.cpp, uses the hashtable if the id is unique, otherwise it traverses the document to get the first match:

Element* Document::getElementById(const AtomicString& elementId) const
{
    if (elementId.isEmpty())
        return 0;

    m_elementsById.checkConsistency();

    Element* element = m_elementsById.get(elementId.impl());//<-- hastable lookup
    if (element)
        return element;

    if (m_duplicateIds.contains(elementId.impl())) {
        // We know there's at least one node with this id,
        // but we don't know what the first one is.
        for (Node *n = traverseNextNode(); n != 0; n = n->traverseNextNode()) {
            if (n->isElementNode()) {
                element = static_cast<Element*>(n);
                if (element->hasID() &&
                element->getAttribute(element->idAttributeName()) == elementId) {
                    m_duplicateIds.remove(elementId.impl());
                    m_elementsById.set(elementId.impl(), element);
                    return element;
                }
            }
        }
        ASSERT_NOT_REACHED();
    }
    return 0;
}

Firefox implementation, nsDocument.cpp


The specific implementation isn't defined in the HTML spec so it could easily vary browser to browser. For example IE documentation states

Returns a reference to the first object with the specified value of the ID or NAME attribute.

so I'd be tempted to say it does a search (or it just throws out elements in cases of hash collisions).

EDIT Also keep in mind there are other data structures (like trees) that allow for access time somewhere between constant and linear.


It shouldn't be hard to test.

If it's tree-based then making a very deep tree (via Javascript) should be a good test case.