Node.js sync vs. async


I'm currently learning Node.js and I've come across two versions of the same program, one synchronous and one asynchronous.

I do understand the concept of a callback, but I'm trying to understand the benefit of the second (async) example, since the two seem to do exactly the same thing despite this difference.

Can you please explain why the second example would be better? I'd be happy to get an even wider explanation that would help me understand the concept.

Thank you!!

1st example:

var fs = require('fs');

function calculateByteSize() {
    var totalBytes = 0,
        i,
        filenames,
        stats;
    // Blocks until the whole directory listing has been read.
    filenames = fs.readdirSync(".");
    for (i = 0; i < filenames.length; i++) {
        // Blocks again for every file, one after another.
        stats = fs.statSync("./" + filenames[i]);
        totalBytes += stats.size;
    }
    console.log(totalBytes);
}

calculateByteSize();

2nd example:

var fs = require('fs');

var count = 0,
    totalBytes = 0;

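function calculateByteSize() {
    // Non-blocking: returns immediately; the callback runs once the listing is ready.
    fs.readdir(".", function (err, filenames) {
        var i;
        if (err) { throw err; }
        count = filenames.length;
        if (count === 0) {
            // Empty directory: no stat callback will ever fire, so report here.
            console.log(totalBytes);
            return;
        }

        for (i = 0; i < filenames.length; i++) {
            // Every stat request is started without waiting for the earlier ones.
            fs.stat("./" + filenames[i], function (err, stats) {
                if (err) { throw err; }
                totalBytes += stats.size;
                count--;
                // Whichever callback finishes last prints the total.
                if (count === 0) {
                    console.log(totalBytes);
                }
            });
        }
    });
}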

calculateByteSize();

Your first example is all blocking I/O. In other words, you have to wait until the readdir operation is complete before looping through the files. Then you block (wait) on each individual sync stat operation before moving on to the next file. No code after the calculateByteSize() call can run until all of these operations have completed.

The async (second) example, on the other hand, is all non-blocking and uses the callback pattern. Here, execution returns to just after the calculateByteSize() call as soon as fs.readdir is called (but before the callback is run). Once the readdir task is complete, it calls back into your anonymous function. There it loops through each of the files and again makes non-blocking calls to fs.stat.
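
To make that ordering visible, here is a minimal sketch. The `order` array and the `listDirectory` helper are made up for illustration; only `fs.readdir` comes from the examples in the question.

```javascript
const fs = require('fs');

// Records the order in which things happen.
const order = [];

// Hypothetical helper: wraps fs.readdir and notes when each step occurs.
function listDirectory(done) {
    order.push('readdir requested');
    fs.readdir('.', function (err, filenames) {
        // Runs later, from the event loop, once the listing is ready.
        order.push('callback ran');
        done(err, filenames);
    });
}

listDirectory(function (err, filenames) {
    if (err) throw err;
    // prints "readdir requested -> code after the call -> callback ran"
    console.log(order.join(' -> '));
});

// This line executes before the readdir callback does.
order.push('code after the call');
```

The callback is always deferred, so the code after the call runs first even if the directory listing is available almost instantly.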

The second is more advantageous. Suppose, for the sake of argument, that each call to readdir or stat took anywhere from 250 ms to 750 ms to complete (this is probably not the case). With the sync operations the calls run serially, so the waits add up: while looping over the files, each stat must finish before the next one can start. With the async operations you do not wait between calls: the loop issues every fs.stat right away, and the waits overlap.
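
A rough way to see the overlap is to replace the real I/O with timers. In this sketch `fakeStat` is a hypothetical stand-in for `fs.stat` that answers after a fixed delay, and the "size" it reports is just the length of the file name; none of this is real filesystem behaviour.

```javascript
// Hypothetical stand-in for fs.stat: reports a fake size after `delayMs`.
function fakeStat(name, delayMs, callback) {
    setTimeout(function () {
        callback(null, { size: name.length });
    }, delayMs);
}

// Starts every fakeStat at once and reports the total plus the elapsed time.
function totalSizeConcurrently(names, delayMs, done) {
    const start = Date.now();
    let pending = names.length;
    let total = 0;
    if (pending === 0) return done(null, 0, 0);
    names.forEach(function (name) {
        fakeStat(name, delayMs, function (err, stats) {
            total += stats.size;
            pending -= 1;
            if (pending === 0) done(null, total, Date.now() - start);
        });
    });
}

const names = ['a.txt', 'b.txt', 'c.txt', 'd.txt'];
totalSizeConcurrently(names, 100, function (err, total, elapsedMs) {
    // Elapsed time is roughly one delay (about 100 ms), not 4 * 100 ms,
    // because all four waits run at the same time.
    console.log('total:', total, 'elapsed ms:', elapsedMs);
});
```

A serial version would need about `names.length * delayMs` in total, which is the situation the sync example is in.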


In your first example, the node.js process, which is single-threaded, is blocked for the entire duration of your readdirSync and can't do anything else except wait for the result to be returned. In the second example, the process can handle other tasks, and the event loop runs your callback when the result is available. So you can handle a much higher total throughput by using asynchronous code. The time spent waiting for the readdir in the first example is probably thousands of times as long as the time actually spent executing your code, so you're wasting 99.9% or more of your CPU time.
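
A small sketch of that blocking, where a zero-delay timer stands in for "other work" the process could be doing (the `events` array is only there to record the order):

```javascript
const fs = require('fs');

// Records the order in which things happen.
const events = [];

// A timer stands in for other work, e.g. another client's request.
setTimeout(function () {
    events.push('timer fired');
    // prints "sync I/O finished -> timer fired"
    console.log(events.join(' -> '));
}, 0);

// Synchronous I/O: the single JavaScript thread is stuck here until it returns.
fs.readdirSync('.');
events.push('sync I/O finished');
```

Even with a delay of 0, the timer callback cannot run until the synchronous call has returned and the current code has finished.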


In your example the benefit of async programming is indeed not very visible. But suppose that your program needs to do other things as well. Remember that your JavaScript code runs in a single thread, so when you choose the synchronous implementation, the program can't do anything else but wait for the I/O operation to finish. When you use async programming, your program can do other important tasks while the I/O operation runs in the background (outside the JavaScript thread).


Can you please explain why the second example would be better? I'd be happy to get an even wider explanation that would help me understand the concept.

It's all about concurrency for network servers (thus the name "node"). If this were in a build script, the first, synchronous example would be "better" in that it is more straightforward. And given a single disk, there might not be much actual benefit to making it asynchronous.

However, in a network service, the first, synchronous version would block the entire process and defeat node's main design principle. Performance would degrade as the number of concurrent clients increased. The second, asynchronous example would perform relatively well: while waiting for the relatively slow filesystem to come back with results, it could handle all the relatively fast CPU operations concurrently. The async version should basically be able to saturate your filesystem, and however much your filesystem can deliver, node will be able to get it out to clients at that rate.
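
As a sketch of what that could look like in a server, here is the async example wrapped in an HTTP handler. The names `totalBytesIn` and `createSizeServer`, the plain-text response, and the error handling are assumptions made for illustration, not something from the question.

```javascript
const http = require('http');
const fs = require('fs');
const path = require('path');

// Sums the file sizes in `dir` without blocking; calls done(err, totalBytes).
function totalBytesIn(dir, done) {
    fs.readdir(dir, function (err, filenames) {
        if (err) return done(err);
        let pending = filenames.length;
        let total = 0;
        let failed = false;
        if (pending === 0) return done(null, 0);
        filenames.forEach(function (name) {
            fs.stat(path.join(dir, name), function (err, stats) {
                if (failed) return; // an earlier stat already reported an error
                if (err) { failed = true; return done(err); }
                total += stats.size;
                pending -= 1;
                if (pending === 0) done(null, total);
            });
        });
    });
}

// Hypothetical server: each request triggers its own non-blocking size calculation.
function createSizeServer(dir) {
    return http.createServer(function (req, res) {
        totalBytesIn(dir, function (err, total) {
            if (err) {
                res.writeHead(500, { 'Content-Type': 'text/plain' });
                return res.end('error\n');
            }
            res.writeHead(200, { 'Content-Type': 'text/plain' });
            res.end(String(total) + '\n');
        });
    });
}
```

Calling `createSizeServer('.').listen(3000)` (the port is arbitrary) would start it. While one request is waiting on the filesystem, the process stays free to accept and answer other requests, which is not the case if the handler uses readdirSync and statSync.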


Lots of good answers here, but be sure to also read the docs:

The synchronous versions will block the entire process until they complete--halting all connections.

There is a good overview of sync vs async in the documentation: http://nodejs.org/api/fs.html#fs_file_system