Synchronous File IO in Node.js

Posted by Dave Eddy on 26 Mar 2013

Does calling fs.writeFileSync trigger a synchronous write to the file system?

If you are familiar with Node.js, or have at least heard of it, you've most likely heard that it uses non-blocking IO, and lets you do work asynchronously. One of the most basic APIs that Node provides is for the file system; With this API, you can read, write, remove, etc. files and do other file system related tasks and modifications.

This API follows a standard pattern of exposing 2 functions for each operation: one for asynchronous work, and the other for synchronous work. For example, if you want to read a file in Node you can do so asynchronously:

var fs = require('fs');
fs.readFile('/etc/passwd', function(err, buf) {
  console.log(buf.toString());
});

Node will continue executing any javascript code it encounters while reading the file. Once all javascript is done being executed and the file is ready, it will run the anonymous function and print the file contents.

You can do the same task synchronously:

var fs = require('fs');
var contents = fs.readFileSync('/etc/passwd').toString();
console.log(contents);

In this example, contents will be set to the contents of the file, and no javascript code will be executed while the file is being read.

The first approach is done asynchronously, and will return immediately to not block your code from running. The second is done synchronously, and will halt execution until the task has completed. The same 2 types of functions exist for writing, renaming, deleting, etc. files.

Synchronous Writes

So the question is, does calling fs.writeFileSync actually trigger a synchronous write to the file system? In the userland Node process, it's synchronous in the sense that execution of any javascript is halted, but what about in the Kernel? An asynchronous write is a very different thing from a synchronous write to a file system.

For the rest of this blog post I'll be speaking within the context of the Illumos Kernel, and the ZFS File System.

There are a couple ways to answer this question. The most obvious way is to pull the Node.js source code, find the functions that talk to the file system that fs.js uses, and see how they are called. I haven't done much work on the Node core, and know it could (and most likely would) take a long time to find the code I was looking for. Instead, I'll just use DTrace to answer the question, and see exactly what Node is doing.

DTrace to the Rescue

I wrote a couple test programs that exercise these file system functions. Using DTrace, we'll be able to see what flags a file is opened with, which will show whether the operations are synchronous or not.

fs.writeFile()

// writefile.js
var fs = require('fs');
fs.writeFile('/tmp/fs.tmp', '', function(err) {
  if (err) throw err;
});

This script exercises Node's asynchronous file writing mechanism. Using DTrace, we can print the flags that were passed to open(2) for that specific file. Then, using fileflags, we can turn that decimal into the symbolic names that make up the decimal (see open(2) for more information).

$ sudo dtrace -qn 'syscall::open*:entry /pid == $target && copyinstr(arg0) == "/tmp/fs.tmp"/ { printf("%s: %d", probefunc, arg1); }' -c 'node writefile.js'
open64: 769
$ fileflags 769
  769: O_WRONLY|O_CREAT|O_TRUNC

The first command tells DTrace to run node writefile.js, and look for any of the open family of syscalls. If the first argument to open (the pathname) matches the file we are writing to, print out the exact syscall fired, and the flags decimal.

It turns out that open64(2) was called for our file, given the following options.

  • O_WRONLY: open write-only
  • O_CREAT: create the file if it doesn't exist
  • O_TRUNC: truncate the file

Fairly standard options to open a file. Since none of the options are for synchronous IO (O_SYNC, O_DSYNC, etc.) this file write is asynchronous to ZFS, and the call to write(2) returns before the data is guaranteed to be sitting on stable storage.

Node's asynchronous fs.writeFile does indeed do an asynchronous write to the file system.

fs.writeFileSync()

So what about Node's synchronous file writing mechanism, is it an actual synchronous write to the file system?

// writefilesync.js
var fs = require('fs');
fs.writeFileSync('/tmp/fs.tmp', '');

This script will block the event loop while the data is written to the file (or so we think), as it uses Node's synchronous file writing mechanism.

$ sudo dtrace -qn 'syscall::open*:entry /pid == $target && copyinstr(arg0) == "/tmp/fs.tmp"/ { printf("%s: %d", probefunc, arg1); }' -c 'node writefilesync.js'
open64: 769
$ fileflags 769
  769: O_WRONLY|O_CREAT|O_TRUNC

Same commands as above, and the same output.

Node's fs.writeFileSync does NOT initiate a synchronous write to the file system.

From the perspective of a Node program, we know the same thing when a call to fs.writeFileSync returns, as we know when the callback to fs.writeFile is fired. We know the underlying call, write(2) has returned; We do NOT know that the data has made it to stable storage. The only difference then, is that one function blocks Node's event loop, while the other allows it to continue processing events.

fs.createWriteStream()

Another mechanism that allows file IO is to create, and write to, a Node WritableStream.

// writestream.js
var fs = require('fs');
fs.createWriteStream('/tmp/fs.tmp');
$ sudo dtrace -qn 'syscall::open*:entry /pid == $target && copyinstr(arg0) == "/tmp/fs.tmp"/ { printf("%s: %d", probefunc, arg1); }' -c 'node writestream.js'
open64: 769
$ fileflags 769
  769: O_WRONLY|O_CREAT|O_TRUNC

Same output as above, again. This mechanism opens the file with the same flags as both fs.writeFile and fs.writeFileSync.

fs.appendFile()

So writing to a file uses the same flags for opening the file, what about appending? Same drill as above

// apendfile.js
var fs = require('fs');
fs.appendFile('/tmp/fs.tmp', '', function(err) {
  if (err) throw err;
});
$ sudo dtrace -qn 'syscall::open*:entry /pid == $target && copyinstr(arg0) == "/tmp/fs.tmp"/ { printf("%s: %d", probefunc, arg1); }' -c 'node appendfile.js'
open64: 265
$ fileflags 265
  265: O_WRONLY|O_APPEND|O_CREAT

So the flags are different, that's a good sign. O_TRUNC has been swapped out for O_APPEND, since we are no longer truncating the file to 0 bytes and instead are appending to it.

Again, like all the commands above, fs.appendFile opens the file for asynchronous IO.

fs.appendFileSync()

Last but not least let's test out the synchronous version of appendFile.

// appendfilesync.js
var fs = require('fs');
fs.appendFileSync('/tmp/fs.tmp', '');
$ sudo dtrace -qn 'syscall::open*:entry /pid == $target && copyinstr(arg0) == "/tmp/fs.tmp"/ { printf("%s: %d", probefunc, arg1); }' -c 'node appendfilesync.js'
open64: 265
$ fileflags 265
  265: O_WRONLY|O_APPEND|O_CREAT

Same as fs.appendFile; the file is NOT opened for synchronous writes.

Common Flags

Let's use a simple C program to open a file using fopen(3C) to see what flags it uses.

    /* fs.c */
    #include <stdio.h>
    int main(int argc, char **argv) {
            FILE *file = fopen("/tmp/fs-c.tmp", "w");
    }

Then run it with the same command as above to see what flags the file was opened with.

$ gcc fs.c -o fs
$ sudo dtrace -qn 'syscall::open*:entry /pid == $target && copyinstr(arg0) == "/tmp/fs-c.tmp"/ { printf("%s: %d", probefunc, arg1); }' -c './fs'
open: 769
$ fileflags 769
  769: O_WRONLY|O_CREAT|O_TRUNC

Sure enough, the same flags as opening a file for writing in Node land.

Results

fs.writeFileSync is synchronous in the sense that it blocks the event loop while it executes. It does NOT ask the Kernel to do a synchronous write to the underlying file system.

This script will block the event loop while the data is written to the file (or so we think)...

None of the functions above open files for synchronous IO. Because of this, all we know is that the call to write(2) returns, not that the data has been written to the file system and flushed to stable storage. Don't get tripped up on the names, fs.writeSync doesn't synchronously write to the file system.

If you want to open a file for synchronous IO, you'll have to use the lower level fs functions that Node offers such as fs.open() and fs.fsync().



Command line Hue lights

Posted by Dave Eddy on 15 Mar 2013

A command line interface to phillips hue

I wrote a command line tool to interface with hue lights. Check out the project page for more information

https://github.com/bahamas10/hue-cli

Installation

First, install Node.js, then:

npm install -g hue-cli

...and the executable will be installed globally as hue

Usage

Usage: hue [-H host] [--json] [command]

control phillips hue over the command line

examples
  hue config          # view the hue config
  hue lights          # get a list of lights
  hue lights 5        # get information about light 5
  hue lights 5,6,7 on # turn lights 5 6 and 7 on
  hue lights on       # turn all lights on
  hue lights 1 ff0000 # turn light 1 red
  hue lights 1 red    # same as above
  hue help            # this message
  hue register        # register this app to hue, done automatically
  hue search          # search for hue base stations

commands
  config, lights, help, register, search

options
  -h, --help     print this message and exit
  -H, --host     the hostname or ip of the bastion to control
  -i, --init     initialize the config file at /Users/dave/.hue.json
  -j, --json     force output to be in json
  -u, --updates  check for available updates
  -v, --version  print the version number and exit

Unfreeze Photoshop

Posted by Dave Eddy on 27 Feb 2013

Unfreeze a frozen Photoshop on Mac

I wrote an app to unfreeze a stuck or frozen photoshop. If you're running photoshop, and it starts beachballing, or not responding on you, try running this. This application will help if suddenly you have a frozen photoshop, and you want to fix it.

Download the app

screenshot

Read more about it here or check out this blog post


High Performance Node.js Logging with console.log buffering

Posted by Dave Eddy on 06 Dec 2012

Buffer calls to console.log, console.warn, etc. for high performance logging

Description

Calls to console.log, console.error, etc. are synchronous, and as such, will block the event loop while the data is being written to a file, terminal, socket, pipe, etc.

This module provides a seamless, drop-in buffer for all calls to these functions, and flushes them when the buffers exceed a certain size (8k by default).

See Known Issues for timing concerns with this module.

View the project on GitHub

Example

require('log-buffer');
console.log('Hello'); // buffered
console.log('world'); // buffered
// flushed at exit or 8k of data

Even though there are 2 calls to console.log, this example only writes to a file descriptor once.

Customization

You can specify an alternative buffer size to use for automatic flushing like this:

require('log-buffer')(4096); // buffer will flush at 4k

This module also exposes the flush function used to flush all buffers, so if you would like you can manually invoke a flush.

var logbuffer = require('log-buffer');
console.log('hello'); // buffered
console.log('world'); // buffered
logbuffer.flush(); // flushed

Also, you can specify an interval to automatically flush all buffers so logs don't get held in memory indefinitely.

var logbuffer = require('log-buffer');
setInterval(function() {
  logbuffer.flush();
}, 5000); // flush every 5 seconds

This will flush automatically at 8k of data as well as every 5 seconds.

Benchmark

Speed

Tested on a Joyent smartmachine in the Joyent Public Cloud (joyent_20120912T055050Z)

Counting to a million, logging each iteration, piping to dd, without buffering

$ time node benchmark/count.js | dd > /dev/null
0+982421 records in
13454+1 records out
6888890 bytes (6.9 MB) copied, 19.0066 s, 362 kB/s

real    0m19.111s
user    0m16.409s
sys     0m6.546s

Counting to a million, logging each iteration, piping to dd, with buffering (8k)

$ time node benchmark/bcount.js | dd > /dev/null
13446+841 records in
13454+1 records out
6888890 bytes (6.9 MB) copied, 3.46552 s, 2.0 MB/s

real    0m3.495s
user    0m3.390s
sys     0m0.136s

A 5.5x increase in speed with log buffering

syscalls

Using DTrace(1M) we can see how many times the system was asked to write

In the examples below, the output is redirected to /dev/null so we don't get a line printed for each iteration of the loop. DTrace is then told to output to stderr so its data doesn't get sent to /dev/null as well.

Counting to a million, logging each iteration to /dev/null, without buffering

$ dtrace -n 'syscall::write*:entry /pid == $target/ { @ = count(); }' -c 'node count.js' -o /dev/stderr > /dev/null
dtrace: description 'syscall::write*:entry ' matched 2 probes
dtrace: pid 33117 has exited

          1000000

Counting to a million, logging each iteration to /dev/null, with buffering (8k)

$ dtrace -n 'syscall::write*:entry /pid == $target/ { @ = count(); }' -c 'node bcount.js' -o /dev/stderr > /dev/null
dtrace: description 'syscall::write*:entry ' matched 2 probes
dtrace: pid 31513 has exited

              841

1,000,000 write(2) syscalls are fired without buffering, whereas only 841 are fired when the output is buffered.

A 1,189x decrease in the number of syscalls; 1 buffered syscall for every 1,189 unbuffered syscalls.

Install

npm install log-buffer

Tests

npm test

Known Issues

  • All buffers are flushed when flush is called (whether automatically or manually). Because of this, calls to different console family functions may return out of order.

Example:

require('log-buffer');
console.log(1);
console.error(2);
console.log(3);

yields

1
3
2

1 and 3 are both written to stdout and 2 is written stderr. The priority order in flushing is ['warn', 'log', 'error', 'info']


pcurl - Node Module

Posted by Dave Eddy on 26 Oct 2012

Concurrently curl a list of hosts and print the results

Pcurl

View the project page on GitHub

Usage

Used as a command line tool

pcurl url1 url2 url3 ...

or pipe through stdin

pcurl < newline_sep_list_of_urls.txt

Examples

Use bash expansion to hit multiple URLs at the same time, and report the results

$ pcurl http://ifconfig.me/{ip,host,ua,port,lang,connection}
URL                            CODE  BODY
http://ifconfig.me/connection  200   keep-alive
http://ifconfig.me/host        200   host.isp.net
http://ifconfig.me/ip          200   2.6.9.8
http://ifconfig.me/lang        404
http://ifconfig.me/port        200   41699
http://ifconfig.me/ua          200

Or send the same data through stdin

$ printf "%s\n" http://ifconfig.me/{ip,host,ua,port,lang,connection}
http://ifconfig.me/ip
http://ifconfig.me/host
http://ifconfig.me/ua
http://ifconfig.me/port
http://ifconfig.me/lang
http://ifconfig.me/connection
$ printf "%s\n" http://ifconfig.me/{ip,host,ua,port,lang,connection} | pcurl
URL                            CODE  BODY
http://ifconfig.me/connection  200   keep-alive
http://ifconfig.me/host        200   host.isp.net
http://ifconfig.me/ip          200   2.6.9.8
http://ifconfig.me/lang        404
http://ifconfig.me/port        200   41699
http://ifconfig.me/ua          200

Limitations

  • Only supports GET requests
  • Meant to receive small body data
  • Hardcoded 10 second timeout

Install

npm install -g pcurl

License

MIT


perms - Node Module

Posted by Dave Eddy on 25 Sep 2012

Convert Unix style permissions to strings like ls (0755 => 'rwxr-xr-x')

I wrote this module to make it easy to print out nice looking permissions in my scripts before calling fs.chmod for the user.

Examples

Given

var perms = require('perms');

Convert a mode to a human-readable string like ls(1) generates

var p = perms.toString('0755');
console.log(p);

yields

rwxr-xr-x

Also handles special permissions

var p = perms.toString('6660');
console.log(p);

yields

rwSrwS---

Go backwards as well!

var mode = perms.toMode('rwxr-xr-t');
console.log(mode);

yields

1755

NOTE: This module makes it super tempting to parse ls(1)... don't do that!

http://mywiki.wooledge.org/ParsingLs

Usage

var perms = require('perms');

Functions

toMode(s)

Given a string, return the mode suitable for passing to fs.chmod

toString(s)

Given a mode, return a string suitable for printing to a user

Installation

npm install perms

Tests

npm test

Compiling watch(1) on SmartOS

Posted by Dave Eddy on 18 Sep 2012

watch - execute a program periodically, showing output fullscreen

Watch

Coming from Linux, I have become used to the watch tool. The program is very simple, it lets you run a command at a given interval, and update the terminal with the output. It's like a simple while loop that runs a command, but it uses ncurses to update the view, which makes it easy to spot differences.

Compiling it on SmartOS was straight forward, the hardest part was finding the source of watch(1). It turns out it is bundled with a handful of tools for dealing with the Linux proc filesystem. Because of this, most of the tools wont compile (or be useful) on an Illumos based operating system, so the Makefile proved to not be helpful.

To install watch(1) on SmartOS, you first have to grab the source and extract it

wget http://procps.sourceforge.net/procps-3.2.8.tar.gz
tar xf procps-3.2.8.tar.gz
cd procps-3.2.8

Then, you need make sure you have the proper dependencies

pkgin in ncurses

Finally, compile it and install it manually

gcc -I/opt/local/include/ncurses -L/opt/local/lib -lncurses -m64 watch.c -o watch
cp -f ./watch /opt/local/bin
cp -f ./watch.1 /opt/local/share/man/man1/

That will ensure the program is installed along with its manpage

dave @ [ dave-01 :: (SunOS) ] ~ $ watch --version
0.2.0

Trace streams in Node.js

Posted by Dave Eddy on 29 Aug 2012

I was working on my latest module for node, latest, when I encountered a problem. While testing, I noticed output being written to the terminal that I wasn't generating. This means that some module had the audacity to write directly to the terminal... bad.

My project was only including the npm module, however the npm module included its own handful of other modules, and they included their own modules, and so-on. If I were to look through the source myself to try and find what was outputting to the terminal it would have taken forever, trying to trace what function was calling what. Instead, I decided to override the stdout streams write function, to stack trace when invoked. This means that any function that writes to stdout will cause a stack trace to be printed to the screen.

Example

Imagine a script that looks like this

require('./1');
require('./2');
require('./3');
console.log('done');

very simple. It includes 3 modules, and then prints "done". Let's look at the output it generates.

undefined
done

What? why does it print "undefined" before it prints "done"? One of the included modules is taking it upon itself to write directly to the terminal. Without knowing where to look I might make the mistake of trying to hunt down the function myself, tracing code paths in my head, trying to make sense of someone else's code. Instead, let's cut it off at the source, and force every write to dump a stack trace.

All I have to do is add a bit of code to the top of the example file

var a = process.stdout.write;
process.stdout.write = function() {
  console.trace();
  a.apply(this, arguments);
};

require('./1');
require('./2');
require('./3');
console.log('done');

It might look confusing what is happening here, but it's actually very simple. I first declare a variable a, and set it to what process.stdout.write is set to (a function). Then, I redefine process.stdout.write to be a function that first dumps a trace, and then calls the real/original write function. It might look recursive, but this is possible in javascript thanks to closures.

Now, when I run this I see

Trace
    at WriteStream.process.stdout.write (/Users/dave/temp/streamsnoop/main.js:3:13)
    at Object.exports.log (console.js:25:18)
    at Object.<anonymous> (/Users/dave/temp/streamsnoop/2.js:1:71)
    at Module._compile (module.js:449:26)
    at Object.Module._extensions..js (module.js:467:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:362:17)
    at require (module.js:378:17)
    at Object.<anonymous> (/Users/dave/temp/streamsnoop/main.js:8:1)
undefined
Trace
    at WriteStream.process.stdout.write (/Users/dave/temp/streamsnoop/main.js:3:13)
    at Object.exports.log (console.js:25:18)
    at Object.<anonymous> (/Users/dave/temp/streamsnoop/main.js:11:9)
    at Module._compile (module.js:449:26)
    at Object.Module._extensions..js (module.js:467:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.runMain (module.js:492:10)
    at process.startup.processNextTick.process._tickCallback (node.js:244:9)
done

There are 2 traces, 1 from "undefined" being printed to the screen, and the other where I print "done". The first trace is the most telling, if we follow it backwards we will find the culprit. The first line of the trace says main.js:3:13, which is telling us the line of code where the trace was triggered. The next line shows console.js, because we called trace manually. Moving onto the third line we see 2.js:1:71, which tells us exactly where the problem is.

The module 2.js is writing to stdout on line 1. If we look into this file we see

console.log(undefined);

and sure enough, that's the problem. Overriding the process.stdout.write function has made it possible to find this issue without having to dig through (potentially) a lot of files manually.

You can read the abridged version in this github gist


exec - Node Module

Posted by Dave Eddy on 26 Aug 2012

Call a child process with the ease of exec and safety of spawn

View the Project page on GitHub

Why?

This module provides the best of both worlds of spawn and exec

It will simply return 2 strings containing stdout and stderr (like child_process.exec), but will take an array of process arguments (like child_process.spawn) to avoid any nasty shell expansion.

Usage

var exec = require('exec');

Example

var exec = require('exec');

exec(['ls', '-lha'], function(err, out, code) {
  if (err) throw err;
  process.stdout.write(out);
  process.exit(code);
});

The example above will call ls -lha safely, by passing the arguments directly to exec(2) without using an shell expansion/word splitting.

It returns a child_process.spawn object, and callbacks with any stdout, stderr, and the exit status of the command. The above example will throw an error if any stderr was produced, otherwise it will print the stdout and exit with the exit code of ls.

Functions

exec(['args'], [opts], callback)

opts is additional options to pass to child_process.spawn

Installation

npm install exec

License

MIT


Newer Posts 1 of 11 Older Posts »