learnyounode Lesson 9 – Juggling Async

Up until now we have managed to skate by one of the more challenging aspects of node: sorting through the aftermath of an async operation. In synchronous code, data comes back in the order in which it was requested. In async node code, data comes back in the order in which the retrieval operations complete. Say you are fetching the data from five files and adding that data to an array as it arrives. The first file is 5 MB, the last is 20 KB, and the rest are somewhere in between. Since smaller files tend to finish first, the first value in the array will likely end up being the data from the smallest file rather than the first file, unless you take steps to ensure that the data is stored in the correct order.
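To see the ordering problem without touching the file system, here is a minimal sketch. The fakeRead function, the file names, and the delays are all made up for illustration; a timer stands in for the time a real read would take.

```javascript
// Hypothetical stand-in for an async file read: a bigger "file" gets a longer delay.
function fakeRead (name, delayMs, callback) {
  setTimeout(function () {
    callback(null, 'data from ' + name)
  }, delayMs)
}

// Reads are started in this order, but they finish in order of delay.
var files = [
  { name: 'big.txt', delayMs: 90 },
  { name: 'medium.txt', delayMs: 50 },
  { name: 'small.txt', delayMs: 10 }
]

function readAllByPush (done) {
  var arrived = []
  files.forEach(function (file) {
    fakeRead(file.name, file.delayMs, function (err, data) {
      if (err) return done(err)
      arrived.push(data) // stored in completion order, not start order
      if (arrived.length === files.length) done(null, arrived)
    })
  })
}

readAllByPush(function (err, arrived) {
  if (err) return console.error(err)
  // The smallest "file" comes first even though it was requested last.
  console.log(arrived)
})
```

With these delays, the array ends up holding the small file's data first and the big file's data last, which is the reverse of the order in which the reads were started.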

This is actually the problem that drove me to the learnyounode tutorials. I was working on a PDF-to-text application. My program converts each page of a PDF into a PNG image, then uses optical character recognition to convert the image to text. After that, the text is appended to a text file. Unfortunately, pages were appended to the file out of order because some images were larger than others and required more time to process. As of writing this post, I have not bothered to go back and correct my program. This is the first project I tackled with node, so it is all sorts of wrong. I don’t recommend using it as an example unless you need an example of what not to do.

The solution to handling this mess is to count your callbacks. Callbacks are how your code lets the next block of code know that it is done with its job. Imagine you are a general contractor coordinating the completion of several homes. Every time the painter gets done, he calls you back to let you know. You keep track of this so you can let the trim and finishing guys know that it is their turn to do their job, in the right order. Imagine how disappointed you would be if the painter painted over the really nice hardwood trim you had installed. Probably about as disappointed as I was when I discovered that the pages were out of order on my PDFs that had been converted to text.

Official Solution

var http = require('http')
var bl = require('bl')
var results = []
var count = 0

function printResults () {
  for (var i = 0; i < 3; i++)
    console.log(results[i])
}

function httpGet (index) {
  http.get(process.argv[2 + index], function (response) {
    response.pipe(bl(function (err, data) {
      if (err)
        return console.error(err)

      results[index] = data.toString()
      count++

      if (count == 3)
        printResults()
    }))
  })
}

for (var i = 0; i < 3; i++)
  httpGet(i)

The official solution splits this task into two functions: one that assigns values to the array ‘results’ and one that iterates through them. The function printResults iterates through the values in ‘results’ and logs them to the console. If you didn’t notice, the for loop is sequential. The iterator ‘i’ starts at zero, so the first value logged to the console is results[0]. The iterator increases by one on each pass, so results[1] is logged next, followed by results[2]. This is important because now we know that each piece of data needs to be stored at the correct index in ‘results’. It is also worth mentioning that a ‘for in’ statement should be avoided in this scenario, because a ‘for in’ loop iterates over property names rather than array indexes, and historically the language did not guarantee the order in which it visits them.
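A small sketch of the difference, using made-up values in place of real responses. The slots are filled out of order on purpose, and the note property is added only to show what a ‘for in’ loop picks up.

```javascript
var results = []

// Simulate responses arriving out of order: the last slot is filled first.
results[2] = 'third'
results[0] = 'first'
results[1] = 'second'

// A plain for loop visits indexes 0, 1, 2 regardless of assignment order.
var printed = []
for (var i = 0; i < results.length; i++) {
  printed.push(results[i])
}
console.log(printed) // [ 'first', 'second', 'third' ]

// A for...in loop walks property names (as strings), including any
// non-index properties that happen to be on the array.
results.note = 'not a result'
var keys = []
for (var key in results) {
  keys.push(key)
}
console.log(keys) // [ '0', '1', '2', 'note' ]
```

The plain for loop only ever sees the numeric slots, which is what printResults relies on.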

The second function used in the official solution is httpGet. This function accepts an index as an argument, which is supplied when the function is called at the end of the program. Inside httpGet, the http.get() method from the core ‘http’ module is used. The URL to request is passed as the first argument by reading it from the process.argv array. Note that 2 is added to the index because the first two entries of process.argv are the path to node and the path to the script, so the URLs entered on the command line start at process.argv[2].
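The offset is easier to see with a concrete array. In this sketch, urlFor is a hypothetical helper and exampleArgv is an invented array shaped like what node provides; the paths and URLs are placeholders.

```javascript
// Returns the URL for a zero-based index, skipping the two leading
// entries of argv (the node executable and the script path).
function urlFor (argv, index) {
  return argv[2 + index]
}

// Hypothetical argv, shaped like what node provides for:
//   node program.js http://a.example http://b.example http://c.example
var exampleArgv = [
  '/usr/local/bin/node',
  '/home/me/program.js',
  'http://a.example',
  'http://b.example',
  'http://c.example'
]

console.log(urlFor(exampleArgv, 0)) // http://a.example
console.log(urlFor(exampleArgv, 2)) // http://c.example

// The real arguments for the current process, minus the two leading entries.
console.log(process.argv.slice(2))
```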

The second argument passed to http.get() is a callback function which accepts response as an argument. Inside the callback function, the response.pipe() method is used to pipe the body of the response to the bl() function from the ‘bl’ module introduced in the previous lesson. bl() accepts a callback function as an argument, which in turn accepts err and data as arguments. Inside the callback, if an error is encountered, it is logged to the console and the callback returns early. Otherwise, the slot in results at the index passed to httpGet (results[index]) is assigned the data from the callback, converted to a string with the .toString() method. Each time the callback passed to bl() completes successfully, the value of the count variable is increased by 1. Once count is equal to 3, meaning all three URLs passed as arguments to this program have been fully received, the printResults function is called.
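If it helps to picture what the callback passed to bl() receives, here is a rough sketch built only on node's core stream module. The collect helper is hypothetical and is not how bl is actually implemented; it just gathers chunks until the stream ends and then hands over everything at once. A PassThrough stream stands in for the http response.

```javascript
var PassThrough = require('stream').PassThrough

// Hypothetical helper: gathers every chunk from a readable stream and hands
// the combined Buffer to callback(err, data), similar in spirit to bl().
function collect (stream, callback) {
  var chunks = []
  stream.on('data', function (chunk) {
    chunks.push(chunk)
  })
  stream.on('error', function (err) {
    callback(err)
  })
  stream.on('end', function () {
    callback(null, Buffer.concat(chunks))
  })
}

// A PassThrough stream stands in for the http response here.
var fakeResponse = new PassThrough()
collect(fakeResponse, function (err, data) {
  if (err) return console.error(err)
  console.log(data.toString()) // hello world
})
fakeResponse.write('hello ')
fakeResponse.end('world')
```

The callback only fires once the whole body is in, which is why it is a safe place to store the result and bump the counter.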

Finally, at the end of the program, a for loop counts from 0 up to 2 and passes each value of i to httpGet(). The for loop here starts a request for each command line argument in order.
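One detail worth a quick experiment is why the index is handed to httpGet as an argument rather than read from i inside the callback. The sketch below uses timers instead of http requests, and the names remember, seenFromLoopVariable, and seenFromArgument are invented for the demonstration.

```javascript
var seenFromLoopVariable = []
var seenFromArgument = []

// Reading the shared loop variable inside the callback: by the time the
// callbacks run, the loop has finished and i is already 3.
for (var i = 0; i < 3; i++) {
  setTimeout(function () {
    seenFromLoopVariable.push(i)
  }, 0)
}

// Passing the value into a function gives each callback its own copy.
function remember (index) {
  setTimeout(function () {
    seenFromArgument.push(index)
  }, 0)
}
for (var j = 0; j < 3; j++) {
  remember(j)
}

setTimeout(function () {
  console.log(seenFromLoopVariable) // [ 3, 3, 3 ]
  console.log(seenFromArgument)     // [ 0, 1, 2 ]
}, 50)
```

Because a var declared in a for loop is shared by every pass, a callback that runs later sees only its final value. Passing it to a function, as the official solution does with httpGet(i), pins down the value each request needs.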

So where does the magic happen to ensure that the results are logged to the console in the correct order? It starts with the for loop at the end of the program, which iterates through the command line arguments in order and passes the iterator value to the httpGet function. Each response is then stored at its own index in the ‘results’ array, no matter which request finishes first. The count variable holds off printing until all three responses have arrived. Finally, the for loop in the printResults function logs the results to the console in index order.
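The same pattern can be sketched for any number of requests instead of a hard-coded three. This is not the official solution, just one way to generalize it. The fakeFetch function, the fetchAllInOrder name, and the example URLs and delays are assumptions made up for the sketch, with timers standing in for http.get and bl.

```javascript
// Hypothetical stand-in for http.get plus bl: calls back with a body after a delay.
function fakeFetch (url, delayMs, callback) {
  setTimeout(function () {
    callback(null, 'body of ' + url)
  }, delayMs)
}

// Starts every request at once, stores each body at its request index, and
// calls done only after the last callback has been counted.
function fetchAllInOrder (requests, done) {
  var results = []
  var count = 0
  var failed = false

  requests.forEach(function (request, index) {
    fakeFetch(request.url, request.delayMs, function (err, body) {
      if (failed) return
      if (err) {
        failed = true
        return done(err)
      }
      results[index] = body // the slot is chosen by request order
      count++
      if (count === requests.length) done(null, results)
    })
  })
}

// The first request is the slowest, yet its body still lands in slot 0.
var requests = [
  { url: 'http://a.example', delayMs: 90 },
  { url: 'http://b.example', delayMs: 10 },
  { url: 'http://c.example', delayMs: 50 }
]

fetchAllInOrder(requests, function (err, results) {
  if (err) return console.error(err)
  results.forEach(function (body) {
    console.log(body)
  })
})
```

Comparing count against requests.length rather than the literal 3 is the only real change from the official solution; the index and the counter do the same two jobs.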

That’s it!


Joe Creager
