background

Recently I’ve been thinking a lot about the tasks that new programmers are typically set when they’re starting to learn to write code for the first time.

After I thought about it for a while I realised there are actually quite a lot of projects that are set, perhaps with the intention of setting the learner on a path to do things like looking up algorithms and things like that, which are never actually completed in a satisfactory way.

The biggest stumbling blocks in these starting projects are usually those that involve working with dates and time zones, or using things like the modulo operator to decide when actions should happen in loops.

But perhaps the worst project that most learners are given is to build a calculator.

Usually the learner will set out to do something like putting two input fields on a page in which the user is expected to enter two numbers, and then a drop down is provided to choose the operation they wish to perform, usually +, -, * and /.

While this certainly helps the learner get into the basics of programming in whichever environment they are working in, it doesn’t really teach them all that much. In JavaScript the most you can learn from a basic two input field calculator is how to write a few HTML tags and how to write addEventListener.

So with this post I want to explain how to build a real, fully functional calculator in JavaScript that can handle any operation you wish to add to it.

I don’t expect a complete beginner to understand this post, but for those who have been programming for at least a few months I hope it serves as a simple introduction on how useful existing algorithms are and how to build a functioning calculator.

Just make sure to actually read and try to understand the code instead of blindly copy and pasting it!

For this project, I’ll be focusing on proper order of operation (PEMDAS) using the (, ), +, -, *, /, % and ^ operators. The ^ operator is of course not a part of JavaScript as a mathematical operator, so in our code we will actually be using Math.pow(lhs, rhs), but we will let the user type ^ in the input field.

Once the calculator has been built, it should be trivial to add more functionality, for example, you could add in things like the trigonometric cos, tan and sin functions along with their inverse ‘arc’ counterparts, atan, acos and asin.

So how do we start building a calculator?

Considering that we want to evaluate any input the user types into our program the first thing we need is an HTML file that we can use as a base for our calculator.

index.html:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <title>Calculator</title>
  </head>
  <body>
    <input type="text" id="expression">
    <button id="evaluate">Evaluate</button>
    <span id="answer"></span>

    <script src="calc.js"></script>
  </body>
</html>

You’ll notice that we have an <input> field that the user can type their expression into, a <button> that the user can click on to calculate the answer, and a <span> that we can use to display the answer after the user has clicked on the button.

In addition we have a <script> tag at the bottom of the page which is where we will write the rest of the program.

Now that all of the boilerplate is set up, we can get on with actually writing some JavaScript.

The first thing we need to be able to do is take the string that the user types into the expression input and break it down into tokens.

Because computers can’t see ahead like we can, we need to do this one character at a time and decide when we have scanned enough that we can call the input so far a token.

We could put the burden on the user to make sure they put spaces in between the tokens themselves and just split the string on the space character with JavaScript’s .split(" ") function, but that isn’t very user friendly, so we should really write a small lexer instead.

A lexer is a part of a program that performs lexical analysis; it’s also known as a scanner, or a tokenizer, because through lexical analysis it breaks down input into tokens.

The way our lexer will work is that we will advance through the input characters one at a time and when we reach a certain character we will tell it to build a token out of any characters after it that meet a certain criteria.

In calc.js type the following:

calc.js:

document.getElementById("evaluate").addEventListener("click", function(e) {
  const expr = document.getElementById("expression").value;
  const tokens = lexer.tokenize(expr);

  document.getElementById("answer").innerHTML = "<pre>"
                                              + JSON.stringify(tokens, null, 2)
                                              + "</pre>";
});

const lexer = (function() {
  function tokenize(input) {
    const chars = input.split("");
    const tokens = [];

    while (chars.length) {
      // Skip white space
      readWhile(chars, isWhitespace);

      if (!chars.length) {
        break;
      }

      const ch = chars.shift();

      if (isNum(ch)) {
        tokens.push({ type: "NUM", val: ch + readWhile(chars, isNum) });
      } else if (isOp(ch)) {
        tokens.push({ type: "OP", val: ch });
      }
    }

    return tokens;
  }

  function readWhile(chars, predicate) {
    let str = "";

    while (chars.length && predicate(chars[0])) {
      str += chars.shift();
    }

    return str;
  }

  function isWhitespace(ch) {
    return /[\n\t\s]/.test(ch);
  }

  function isNum(ch) {
    return /[0-9.]/.test(ch);
  }

  function isOp(ch) {
    return /[()\-+\/*^%]/.test(ch);
  }

  return { tokenize };
})();

In the above code we add an event listener to the button and use the lexer object that we just built to tokenize the input. tokenize is the only function available on the lexer object because it’s the only function we are returning at the end of the IIFE.

What this code does is fairly simple, but can be a bit confusing to look at to begin with. Because of that, try carefully reading through the tokenize function step by step to see what it does.

Simply, the tokenize function is just given a string of user input as the argument to the input parameter. Once it has the input it then uses the input.split("") function to split the string into an array of characters that we can work on.

We also keep a tokens variable which will store all of our token objects in order and will be returned at the end of the function.

Then comes the while loop. This loop will execute as long as there are still characters left in our chars array to process. On each iteration of the loop we use the readWhile helper function to skip over any white space. It does this by taking the chars array with a function to test against; in this case we are testing against the isWhitespace helper function.

As long as the helper function is returning true the readWhile function will build a string by shifting characters off the bottom of the array and finally returning the complete string. In this case we don’t do anything with the string because it’s just white space, so we ignore the return; the function is just being called here to skip over white space and get to the next real character we care about.

After skipping the white space we check to see if there are any characters left in the array, and if not we break out of the loop.

Next we take the next character off the bottom of the array and test it with the isNum and isOp functions. If either of them return true we build a token object and push it onto our tokens array.

If you copy all of this code into your own HTML and JS files you can try typing an expression into the input field and press the ‘Evaluate’ button.

For example, try typing 1.33+1/2 % 89.3 (3 ^ 2)

You should see a JSON string of the tokens produced by the lexer.

You will also notice that we can type an entire expression without spaces and the lexer will still correctly tokenize our input for us. Currently it will also ignore any letters of the alphabet and pretend they aren’t there.

We aren’t performing any error checking in this post, so you can create invalid number like a decimal number with two points, but you could add that in as an exercise if you want to.

Now that we have a lexer which we can use to tokenize the expression we need to talk about how to actually do the calculations.

As mentioned before computers can’t look ahead in the expression like we can, so the order we write expressions in isn’t very useful to a computer.

The way we’re used to writing expressions is called ‘Infix Notation’. This is because the operators are ‘inside’, or ‘in-between’ the operands, like so: 1 + 2 (the + is in-between the 1 and 2).

But a computer can’t easily figure out the order of operation from Infix Notation, so we need to use what is called ‘Postfix Notation’ instead; this is also known as ‘Reverse Polish Notation’ or ‘RPN’ because it was invented by a Polish logician by the name of Jan Łukasiewicz.

Reverse Polish Notation is difficult for us to figure out because we aren’t used to using it, but for a computer it’s very convenient because it automatically provides the correct order of operations, provided we do the calculation with a combination of a queue and a stack.

In case you don’t already know, a queue and a stack are what are called data structures.

A queue is a First-in First-out, or FIFO data structure because you can push stuff onto the top of it and shift stuff off the bottom; this can also be called Last-in Last-out, or LILO.

A stack is a Last-in First-out, or LIFO data structure because you can push stuff onto the top of it and then pop stuff off the top again; this can also be called First-in Last-out, or FILO.

JavaScript doesn’t have these data structures built in, so we need to use arrays instead and just be disciplined about how we use them.

For example, if we had the the following Infix Notation expression 1 + 2 * 4 - 5, we as humans can tell that the * needs to be done first, but a computer moving forward through the expression one token at a time doesn’t know this, so it would see the + first instead.

In Reverse Polish Notation the same expression looks like this: 1 2 4 * + 5 -.

The way you evaluate this is to apply the first operator in the sequence of characters to the two numbers before it, collapse the answer into the expression and carry on.

For example, 1 2 4 * + 5 - is evaluated as follows:

In 1 2 4 * + 5 -, 2 4 * is the first operation; 2 4 * equals 8 so we collapse the answer 8 back into the expression at the same location leaving us with 1 8 + 5 -

In 1 8 + 5 -, 1 8 + is the next operation; 1 8 + equals 9 so we collapse the answer 9 back into the expression at the same location leaving us with 9 5 -

Now the only operation left is 9 5 - which leaves us with the answer 4.

The way we convert from Infix to Reverse Polish can be summed up with the following algorithm using a queue and a stack:

  • Look at the next (or first) token
  • If you have run out of tokens then you should empty the stack onto the queue
  • If the stack is not empty, is the next token a )?
    • If yes empty the stack onto the queue up until you reach a ( then discard both parentheses
    • If no, is it a number?
      • If yes push it onto the queue and start again with the next token
      • If no it must be an operator, so check if we can push onto the stack with the following steps
        • Is the stack empty or is the operator a (?
        • If yes, push it on the stack and start again with the next token
        • If no, is the current operator of a higher precedence than the operator on top of the stack?
          • If yes, push it on the stack and start again with the next token
          • If no, pop the stack and push the popped operator onto the queue then check again until you can put the latest operator on the stack

This is called the ‘Shunting-Yard Algorithm’ invented by a Dutch computer scientist by the name of Edsger Dijkstra. You can read about the algorithm in more detail here: Shunting-Yard Algorithm.

We will be using the Shunting-Yard Algorithm to re-order the tokens into Reverse Polish Notation which we will then finally use to actually evaluate the expression.

Let’s add this new conversion function to our lexer under the isOp function.

function infixToReversePolish(tokens) {
  const queue = [];
  const stack = [];
  const precedence = {
    "(": 10,
    "+": 20, "-": 20,
    "/": 30, "*": 30, "%": 30,
    "^": 40,
  };

  while (tokens.length) {
    const token = tokens.shift();
    const tkPrec = precedence[token.val] || 0;
    let stPrec = stack.length ? precedence[stack[stack.length - 1].val] : 0;

    if (token.type == "OP" && token.val == ")") {
      let op = null;

      while ((op = stack.pop()).val != "(") {
        queue.push(op);
      }
    } else if (token.type == "NUM") {
      queue.push(token);
    } else if (token.type == "OP" && (!stack.length || token.val == "(" || tkPrec > stPrec)) {
      stack.push(token);
    } else {
      while (tkPrec <= stPrec) {
        queue.push(stack.pop());
        stPrec = stack.length ? precedence[stack[stack.length - 1].val] : 0;
      }

      stack.push(token);
    }
  }

  while (stack.length) {
    queue.push(stack.pop());
  }

  return queue;
}

If you read through this code carefully you will see it just follows the steps outlined in the Shunting-Yard Algorithm explained above.

Now we just need to use the conversion function after the tokenization has been completed. The easiest way to do this is to just wrap it around the return of the tokenize function like this:

return infixToReversePolish(tokens);

If you followed along then you should now see that the JSON output when you evaluate an expression is in Reverse Polish Notation order. For example, if you try typing in the expression 1 + 2 * 4 - 5 you should see that the order of the tokens is 1 2 4 * + 5 -.

This means we are finally at the part of the program where we can evaluate the full expression and output a final answer!

In order to calculate the final answer we need to treat our array of re-ordered tokens as a queue and push them onto yet another stack. We shift tokens off the bottom of the queue and push them onto the stack until we reach an operator.

At that point we don’t push the operator onto the stack; instead we pop the last two numbers off the stack and do the calculation with the operator. The first number you pop off the stack will be the right hand side (RHS) and the second number you pop will be the left hand side (LHS). You then do the calculation with the LHS and RHS in the correct order around the operator and push the result back onto the stack.

After that you carry on shifting tokens off of the queue and pushing them on the stack until you reach another operator and do the same thing again. By the end you should be left with one number token left on the stack; this token will be your final answer.

So let’s create a new evaluate function. This function doesn’t need to be inside the lexer, it can just be a stand-alone, plain old function places anywhere in the file.

Make sure to read the comments in the code.

function evaluate(tokens) {
  const stack = [];

  while (tokens.length) {
    const token = tokens.shift();

    if (token.type == "NUM") {
      // We only care about the numbers now, not the whole token
      // so we only push the val and make sure it's a number with parseFloat()
      // We need to use parseFloat otherwise we will have trouble with
      // JavaScript's type coercion when we try to do things like (lhs + rhs)
      stack.push(parseFloat(token.val));
      continue;
    }

    // If we reach here we must be working with an operator token
    const rhs = stack.pop();
    const lhs = stack.pop();

    switch (token.val) {
      case "+": stack.push(lhs + rhs); break;
      case "-": stack.push(lhs - rhs); break;
      case "*": stack.push(lhs * rhs); break;
      case "/": stack.push(lhs / rhs); break;
      case "%": stack.push(lhs % rhs); break;
      case "^": stack.push(Math.pow(lhs, rhs)); break;
    }
  }

  // Finally we pop our answer off the stack
  return stack.pop();
}

Now that we have an evaluate function let’s update the event listener to use it and report the answer to the user.

Change the event listener code to the following:

document.getElementById("evaluate").addEventListener("click", function(e) {
  const expr = document.getElementById("expression").value;
  const tokens = lexer.tokenize(expr);
  const answer = evaluate(tokens);

  document.getElementById("answer").innerHTML = answer;
});

If you did everything correctly you should now have a fully functional calculator written in JavaScript.

There is of course a lot you could add to this. Checking for errors like unmatched parentheses or invalid numbers like decimals with two decimal points would be a good start, but for the purposes of this post the project is complete.