Node streams vs imperative processing

In notebook:
Work Notes
Created at:
2019-05-04
Updated:
2019-05-04
Tags:

Serving a ZIP archive via streams versus from file system (imperative style)

Recently, I had to create a service where the user could download an HTML file and its related images (stored on another server) as a ZIP archive from the server.

The non-ideal, naïve approach

I would first download all elements to the file system, then create the ZIP archive, and finally serve it from the file system. This would consist of implementing these steps:

  1. get a path where to save your archive (dev vs prod environment)
  2. ensure the path exists
  3. process and save the HTML file
  4. start downloading the related images
  5. when all this is finished, create the zip
  6. serve the zip file
  7. clean up (in fact clean up should run at several points)

Nothing complicated, but with Node.js and streams this can be greatly simplified and optimised.

Streaming the ZIP archive

The archive creation library I was planning to use (archiverjs) is intended to be used with streams (as all good Node libraries should be!), which made me rethink my strategy:

  1. send back the relevant HTTP headers to the browser, telling it I'm sending a ZIP archive
  2. start piping the HTML file into the response
  3. start downloading the images and also pipe them directly into the response
  4. signal when everything is done

This is way more efficient: the server only has to deal with individual chunks, a slow client connection will not increase memory usage, and there is no need to store the files locally. The service runs on a serverless platform (Zeit Now), which does have a writeable /tmp directory, but it's cleaner if I don't have to use it.

The implementation

// leaving out some task management parts...
const { task, of } = require('folktale/concurrency/task')
const cheerio = require('cheerio')
const request = require('request')
const archiver = require('archiver')


// inside the request handler: tell the browser a ZIP attachment is coming
res.setHeader('cache-control', 'max-age=0')
res.setHeader('content-type', 'application/zip')
res.setHeader('content-disposition', 'attachment; filename="archive.zip"')

function prepareArchiveStream(data) {
  return task(function _prepareArchive(resolver) {
    const archive = archiver('zip', {
      zlib: { level: 9 } // maximum compression
    })
    archive.on('error', resolver.reject) // propagate stream errors to the task
    resolver.resolve(archive) // resolve the archive early, so piping can start
    archive.append(localImages(data.html), { name: 'index.html' })
    addImages(archive, data.html)
    archive.finalize() // signal that no more entries will be appended
  })
}
}

function addImages(archive, html) {
  const $ = cheerio.load(html)
  $('img').each(function _processImage(i, img) {
    // request() returns a readable stream, which archiver consumes lazily
    archive.append(request(img.attribs.src), {
      name: `images/${getFileName(img.attribs.src)}`
    })
  })
}

// then pipe the result
workflow(req)
.run()
.future()
.map(r => {
  r.pipe(res)
})