title: PowerShell5-Tokenize_Expression-First_Failing_Test
First Failing Test
Now it’s time to create the first test. We’ll start with a failing test, and do something simple to get it to work.
- Create a new file called Tokenizer.Tests.ps1:
- Run your (now failing) tests:
This failed, and it seems it failed for a reasonable reason. It doesn’t know about the type Tokenizer. To remedy this, we’ll create a module and import it into the test.
- Create a new file called Tokenizer.psm1:
- At the top of Tokenizer.Tests.ps1, add the following line:
- Run your tests and see that the error has changed a bit:
Closer, let’s get to a passing test. Again, we’ll do just enough to get the test passing.
- Update Tokenizer.psm1:
- Run your test (expecting things to pass):
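This might be unexpected. If so, that’s good, because when unexpected things happen, we’re about to learn something.
In this case, running Pester directly, as we are, loads the Tokenizer class into the current shell, and the shell appears to keep using that first definition even after the module file changes. There are several solutions, but a trivial one is to run the tests in a fresh process with “powershell invoke-pester”: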
Summary
There are many common complaints about TDD in such a simple start:
- This doesn’t do anything
- How can such small steps accomplish anything?
- I know so much more about what I need to do, why don’t I jump ahead and save time?
- … Insert another 20 complaints here.
I’m not going to even try to convince you that this does or does not work. We’ll work through the problem taking an extremely incremental approach. We’ll build up a solid footing so we can experiment later. Even so, this trivial example has demonstrated several (possibly incorrect) decisions:
- We have a class that does this work called Tokenizer
- We have a single method that we call, called tokenize
- It takes a String and returns an array of Strings, one element for each token
Now that we have an API, we can focus on trying to grow the algorithm to make it work with at least the examples listed above.
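The three decisions above can be sketched as a minimal class. This is an illustration, not the original listing: the method name `Tokenize` follows the summary, and returning the whole expression as a single token is an assumed "just enough" body that would satisfy a first test on an input such as `42`.

```powershell
# Minimal sketch of the API described above (assumed, not the original listing).
class Tokenizer {
    # Takes a string and returns an array of strings, one element per token.
    # For now the whole expression comes back as a single token.
    [string[]] Tokenize([string]$expression) {
        return @($expression)
    }
}

$tokenizer = [Tokenizer]::new()
$tokens = $tokenizer.Tokenize('42')
$tokens.Count   # 1
$tokens[0]      # 42
```

In the tutorial itself the class lives in Tokenizer.psm1 and is imported into the test file; it is defined inline here only so the sketch runs on its own.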
Initialize And Initial Push
- Make your shunting_yard_algorithm directory a git repo
- Now add all the things:
- Verify only the things we want to add have been added:
- Make your first commit into your local git repo:
- And look at the results:
- You can verify that there are no local changes remaining:
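The steps above map onto standard git commands. A plausible sequence, assuming git is installed and the current directory is shunting_yard_algorithm (the commit message is only an example):

```powershell
# Run from inside the shunting_yard_algorithm directory.
git init                                   # make the directory a git repo
git add .                                  # add all the things
git status                                 # verify what has been staged
git commit -m "First passing Tokenizer test"   # example message, pick your own
git log --oneline                          # look at the results
git status                                 # no local changes should remain
```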
title: PowerShell5-Tokenize_Expression-Simple_Binary_Expressions
Simple Binary Expression
Now that we have a trivial first test, we’ll begin growing the implementation one test at a time. We’ll be following Uncle Bob’s Three Rules of TDD, summarized here as:
- Write no production code without a failing test
- Write just enough of a test to get the test to fail
- Write just enough production code to get the test to pass
We will additionally do the following:
- Keep existing tests passing while making the new ones pass
- Keep the code clean by refactoring it every so often
- Commit code frequently using git
In general, there are four things we might do at any point:
- Write test code
- Write production code
- Refactor test code
- Refactor production code
We’ll strive to do only one of these at a time and only switch to another one of these actions when all tests are passing.
Moving Towards Binary Expression
Rather than immediately going to a full binary expression, we’ll add a test with a number and an operator.
- Create a new test:
- Run your tests, you should see an error similar to:
- A little bit of regex magic allows us to get the tests passing:
- Run your tests, they are passing. When I wrote that code, I was tempted to add a check to verify something I suspected. Rather than do that, with all of the tests passing, I suggest going back to the first test and making a change. There’s near duplication between the two tests. Make them more similar.
- Update the first test:
- Run the tests, you’ll notice now that the first test fails:
- Suspicion confirmed, update the code:
- Run your tests, and it’s back to passing. Now we might be ready for a complete binary expression. Let’s give that a try, then we’ll do some refactoring of the tests.
- Add a new test:
- Sure enough, running the tests shows that we’re not quite done with a binary expression:
- We need to do this more than once, and check for digits and not digits. This will do it:
- Run your tests, they should pass.
- A quick check after your tests are passing will verify that the final if is not necessary:
- Run your tests, they should be passing.
- Now is a great time to commit your changes because we are going to refactor and, if things go badly, this is a good place to be able to get back to easily.
- Noticing some duplication, and after two successful tries, I ended up with the following:
I’m not a PowerShell expert and I do not know how common/popular/idiomatic the use of [ref] is, but it nicely collapses the code. I even notice something that will come up later (I see a pattern I’ve not noticed before). So I’ll choose some tests to exploit that. But before doing that, there are a few more things to do with our tests in terms of refactoring and test cases.
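For readers who have not used [ref], here is a minimal sketch of how it can collapse the repeated "match, record the token, drop it from the input" steps into one helper. The helper name `TakeMatch`, the method name `Tokenize`, and the two patterns are assumptions for illustration; the original listing may differ.

```powershell
class Tokenizer {
    [string[]] Tokenize([string]$expression) {
        $tokens = @()
        $rest = $expression
        while ($rest.Length -gt 0) {
            # Each call may shorten $rest and append to $tokens through [ref].
            $this.TakeMatch([ref]$rest, [ref]$tokens, '^\d+')    # a run of digits
            $this.TakeMatch([ref]$rest, [ref]$tokens, '^[^\d]')  # one non-digit character
        }
        return $tokens
    }

    # Hypothetical helper: if $pattern matches at the start of the remaining
    # text, move the matched text from the input to the token list.
    hidden [void] TakeMatch([ref]$rest, [ref]$tokens, [string]$pattern) {
        $m = [regex]::Match($rest.Value, $pattern)
        if ($m.Success) {
            $tokens.Value += $m.Value
            $rest.Value = $rest.Value.Substring($m.Length)
        }
    }
}

$result = [Tokenizer]::new().Tokenize('42+4')
$result -join ' '   # 42 + 4
```

Because every non-empty input starts with either a digit or a non-digit, one of the two calls always consumes something, so the loop terminates.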
The test file has a bit of duplication. It’s time to collapse that. To do so, we’ll use the -TestCases feature of Pester.
- Update your test by adding a new test, which duplicates the first test:
- Run your tests, and they all pass:
- This new test duplicates the first test, so it is safe to remove the original. While you are at it, convert the other two tests into test cases of this last one:
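As a rough sketch of what a -TestCases based test can look like: each hashtable supplies the parameters for one run of the It block. This assumes Pester 4 or later (for the `Should -Be` syntax), and the inline Tokenizer is only a stand-in so the file runs on its own; in the tutorial the class is imported from Tokenizer.psm1. Test names and cases are examples, not the original ones.

```powershell
# Stand-in for the class under test (assumption, see the note above).
class Tokenizer {
    [string[]] Tokenize([string]$expression) {
        $tokens = @()
        $rest = $expression
        while ($rest.Length -gt 0) {
            foreach ($pattern in '^\d+', '^[^\d]') {
                $m = [regex]::Match($rest, $pattern)
                if ($m.Success) {
                    $tokens += $m.Value
                    $rest = $rest.Substring($m.Length)
                }
            }
        }
        return $tokens
    }
}

Describe 'Tokenizer' {
    It 'tokenizes <expression>' -TestCases @(
        @{ expression = '42';   expected = @('42') }
        @{ expression = '42+';  expected = @('42', '+') }
        @{ expression = '42+4'; expected = @('42', '+', '4') }
    ) {
        param($expression, $expected)
        $actual = [Tokenizer]::new().Tokenize($expression)
        # Compare joined strings so the check does not depend on how a
        # particular Pester version compares arrays.
        ($actual -join ' ') | Should -Be ($expected -join ' ')
    }
}
```

Adding a new scenario then means adding one line to the table instead of copying a whole test.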
Now back to checking/extending the behavior. Let’s make sure our code handles white space, and more than one operator, multi-character operators, and even variables.
- Add a new test case:
- Run the tests, this seems to work fine. The while loop covers an expression as long as we need.
- Now let’s make our code handle variables. Add another test case:
I was initially surprised this worked, but that’s the issue with regular expressions. They are often more flexible than you at first realize. The letters are not digits, which is how we are checking for digits versus operators. Knowing this, I’m going to change that test case to a multi-letter variable because I want to start with a broken test.
- This fails as I expected:
- Here’s a quick fix to make this work:
Note that the regular expression was simply \d+ and now it is [\d\w]+. This might seem too clever, too simple or maybe it seems like I’m cheating. In fact, if that’s the case, then to “prove” I’m cheating, you want to find a test that will cause my code to break. However, I’m fine with that solution for now. If this were a real problem, I think I’d have a known list of operators and check for them explicitly. However, this is a simple example and so I’m OK with simple tests and simple solutions.
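To see the difference between the two patterns in isolation, the following sketch applies each one to the start of a few sample inputs (the inputs are made up for illustration). Note that in .NET regular expressions `\w` already includes the digits, so `[\d\w]+` behaves the same as `\w+`.

```powershell
$lines = foreach ($text in '42', 'a+b', 'foo+1') {
    $old = [regex]::Match($text, '^\d+').Value       # digits only
    $new = [regex]::Match($text, '^[\d\w]+').Value   # digits, letters, underscore
    '{0}: old="{1}" new="{2}"' -f $text, $old, $new
}
$lines
# 42: old="42" new="42"
# a+b: old="" new="a"
# foo+1: old="" new="foo"
```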
- What about a multi-character operator? Add a new test:
- Run your tests, and this fails.
- Update the regular expression to fix this:
- When I run the tests, I find my confidence was too high:
- The second regular expression was originally the opposite of the first, but we added \w to the first, so we need to do the same for the second one:
That’s a good lesson. I thought it was simple, and it was, just not in the way I thought. My tests allowed me to experiment, learn, adjust and make progress.
- Now is probably a good time to commit your changes.
Now let’s handle white space. We can match white space much like we do everything else, or we could simply remove it. Regardless, a test will keep us honest. I think this is going to be simple, so I’ll start with a “big” test:
- Once again, my confidence has bitten me. I figured I could simply replace all of the white space:
- Close, but no cigar:
- Here’s a second attempt:
- There are three changes. First, the white space regular expression is inside the while loop. Second, it only matches at the beginning of the string. Third, the bottom regular expression also excludes \s (white space characters). Those changes result in passing tests:
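The three changes can be sketched as follows. This is a reconstruction under assumptions (method name `Tokenize`, local variable names, and the exact patterns are inferred from the description), not the original listing.

```powershell
class Tokenizer {
    [string[]] Tokenize([string]$expression) {
        $tokens = @()
        $rest = $expression
        while ($rest.Length -gt 0) {
            # Changes 1 and 2: inside the loop, strip white space only at the start.
            $rest = $rest -replace '^\s+', ''

            $operand = [regex]::Match($rest, '^[\d\w]+')
            if ($operand.Success) {
                $tokens += $operand.Value
                $rest = $rest.Substring($operand.Length)
            }

            # Change 3: the operator pattern also excludes \s.
            $operator = [regex]::Match($rest, '^[^\d\w\s]+')
            if ($operator.Success) {
                $tokens += $operator.Value
                $rest = $rest.Substring($operator.Length)
            }
        }
        return $tokens
    }
}

$result = [Tokenizer]::new().Tokenize('  foo   <= 42 ')
$result -join '|'   # foo|<=|42
```

Without the third change, the operator pattern would happily swallow the spaces around an operator and return them as part of the token.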
This seems like enough progress on binary expressions. Next up, handling parentheses.
title: PowerShell5-Tokenize_Expression-First_Stab_At_Parentheses
First Stab at Parentheses
There are two ways in which our tokenizer might encounter parentheses. The first is to group a lower-precedence operator, as in:
- (3 + 4) * 6
The next is in the use of function calls:
- f(4)
For now, we want to pull out the () as tokens. We’ll allow something at a higher level to determine the context.
- Create a first test to see what happens when we use (:
- Interesting, this works. I’m skeptical that we’ve got it working as needed. Knowing a little bit about the code, I’ll add another test:
- There’s the failure I’m expecting:
- Now it’s time to update the code just a touch. Rather than changing how to handle operators, I’ll add another match:
- Run your tests, that seems to work.
- Now is a good time to commit your code. After this, it seemed like there was a pattern in the code that I could represent with an array and a loop. Here’s another version that also works. I’m not sure if I like this better or not.
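One way the array-and-loop idea might look is sketched below. The patterns are assumptions based on the description so far: a dedicated pattern matches exactly one parenthesis, and the operator pattern excludes parentheses so that adjacent ones such as `((` are not merged into a single token.

```powershell
class Tokenizer {
    # Tried in order at the start of the remaining text (hypothetical patterns).
    hidden [string[]] $patterns = @(
        '^[\d\w]+',       # numbers and variable names
        '^[()]',          # exactly one parenthesis
        '^[^\d\w\s()]+'   # operators: anything else that is not white space
    )

    [string[]] Tokenize([string]$expression) {
        $tokens = @()
        $rest = $expression
        while ($rest.Length -gt 0) {
            $rest = $rest -replace '^\s+', ''
            foreach ($pattern in $this.patterns) {
                $m = [regex]::Match($rest, $pattern)
                if ($m.Success) {
                    $tokens += $m.Value
                    $rest = $rest.Substring($m.Length)
                }
            }
        }
        return $tokens
    }
}

$result = [Tokenizer]::new().Tokenize('((3 + 4)) * 6')
$result -join ' '   # ( ( 3 + 4 ) ) * 6
```

The trade-off is the one hinted at above: the loop removes repetition, but the order of the array now carries meaning that the explicit version spelled out.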
title: PowerShell5-Tokenize_Expression-Function_Calls
Function Calls
Finally, function calls. This might already work. Let’s write an experiment and see what the results are:
This passes. The question I’m wondering is whether we want those results, or something more like this:
As it turns out, the first version passes as is. The second version requires a change to one of the regular expressions:
The change is in the middle regex, allowing for zero or 1 ( at the end of a series of digits and letters. Given the change is easy, I’ll leave this as is and consider the tokenizer done for now. Next, we’ll move on to the Shunting Yard Algorithm in PowerShell 5.
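The effect of allowing an optional `(` can be checked on its own. The two patterns below are assumed forms of the operand pattern before and after the change; the variable names are only for illustration.

```powershell
$plain    = '^[\d\w]+'      # digits and letters only
$withCall = '^[\d\w]+\(?'   # same, plus an optional trailing (

$before = [regex]::Match('f(4)', $plain).Value      # f
$after  = [regex]::Match('f(4)', $withCall).Value   # f(
$number = [regex]::Match('42+1', $withCall).Value   # 42
$before, $after, $number
```

With the second pattern, a name directly followed by `(` comes back as one token such as `f(`, which lets a later stage tell a function call from a grouping parenthesis without extra context.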
title: PowerShell5-Tokenize_Expression-Convert_Tokenizer_To_An_Enumerator
The Tokenizer converts a whole expression into an array of tokens. Now we’ll convert it to an Enumerator.
Convert Tokenizer to Enumerator
We are going to convert this in place while maintaining the tests.
Add Required Interfaces
- Add the interfaces to the class:
- Run your tests. They fail due to missing required methods.
- Add each of the following methods stubbed out to get our existing tests running again:
- Run your tests, they now should be back to passing. Next, we’ll add a new test that uses the Tokenizer as an iterator and get it passing.
- Add only the first test to keep this as simple as possible:
- Now write just enough of the interface method to get this test passing:
There are a few things to note in this first version:
- We used a constructor in the new test that takes in the expression and stores it. Adding a constructor taking a single argument will make PowerShell remove the default no-argument constructor. To keep the tests passing, we add in an empty no-argument constructor as well as a one-argument constructor. We’re migrating this code so this is an intermediate form. When we’ve finished converting this from its original form to an enumerator, it will no longer need the no-argument constructor.
- The property get_Current needs something to return. That’s what $this.currentExpression is. It’s assigned in the one-argument constructor. That’s fine for now. As we add more tests, this will change.
- Run your tests, they should pass.
- Now, we copy the second test case and work on getting it to pass as well:
- Run your tests, they fail:
- Here are a few changes to make that work. Notice that some of this code is copied from the interpret method.
- Run your tests, they should all pass.
- Add the next test:
- Run your tests, they all pass.
- Add all of the remaining tests:
- The only test failing deals with spaces in the expression:
- Add the missing line into MoveNext (right before the foreach):
- Run your tests, and all tests pass.
- Now we can remove the first test and the original methods:
- Also, remove the old code from the Tokenizer:
Notice that we have no tests for Reset? It is required to get the code to run but we don’t use it in a test. Time to add a missing test and write its implementation.
- Add one final test:
- Run the test, it fails:
- Update Tokenizer to store the original expression in the constructor and implement the reset method.
- Run your tests, they all pass.
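Putting the pieces of this section together, an enumerator-style tokenizer might end up looking roughly like the sketch below. It is a reconstruction under assumptions: it implements only System.Collections.IEnumerator, and the property names (`original`, `rest`, `currentToken`) and patterns are illustrative rather than taken from the original listing.

```powershell
class Tokenizer : System.Collections.IEnumerator {
    hidden [string] $original       # kept so Reset can start over
    hidden [string] $rest           # text not yet tokenized
    hidden [string] $currentToken
    hidden [string[]] $patterns = @('^[\d\w]+\(?', '^[()]', '^[^\d\w\s()]+')

    Tokenizer([string]$expression) {
        $this.original = $expression
        $this.Reset()
    }

    # PowerShell classes implement the interface property Current
    # through a method named get_Current.
    [object] get_Current() {
        return $this.currentToken
    }

    [bool] MoveNext() {
        $this.rest = $this.rest -replace '^\s+', ''
        foreach ($pattern in $this.patterns) {
            $m = [regex]::Match($this.rest, $pattern)
            if ($m.Success) {
                $this.currentToken = $m.Value
                $this.rest = $this.rest.Substring($m.Length)
                return $true
            }
        }
        return $false   # nothing left to tokenize
    }

    [void] Reset() {
        $this.rest = $this.original
        $this.currentToken = $null
    }
}

$tokenizer = [Tokenizer]::new('f(4) + foo * 2')
$tokens = foreach ($token in $tokenizer) { $token }
$tokens -join ' | '   # f( | 4 | ) | + | foo | * | 2

$tokenizer.Reset()
$again = foreach ($token in $tokenizer) { $token }
$again.Count          # 7
```

The second loop only produces tokens because Reset restores the stored expression, which is the behavior the final test in this section is meant to pin down.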
title: PowerShell5-Tokenize_Expression-Finalish_Version
Final-ish Version
After adding support for functions, I figured that was enough. However, once I looked at the final version, I found one thing worth cleaning up. The regular expressions are a bit confusing, but I’m OK with them. However, giving them names would clean up the code a touch for the next person (probably me) who has to support it. Here’s a quick refactoring of that:
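As an illustration of the idea (not the refactoring itself), the patterns can be given names by keeping them in an ordered table and reporting which named pattern recognizes a piece of text. The names `OperandOrCall`, `Parenthesis` and `Operator` are hypothetical, as are the sample inputs.

```powershell
# Hypothetical names for the three patterns, kept in match order.
$patterns = [ordered]@{
    OperandOrCall = '^[\d\w]+\(?'
    Parenthesis   = '^[()]'
    Operator      = '^[^\d\w\s()]+'
}

$report = foreach ($text in 'foo(', ')', '<=') {
    foreach ($name in $patterns.Keys) {
        if ([regex]::IsMatch($text, $patterns[$name])) {
            '{0} matched {1}' -f $text, $name
        }
    }
}
$report
# foo( matched OperandOrCall
# ) matched Parenthesis
# <= matched Operator
```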
After that, I decided to migrate the example to an Enumerator. This is that version.
Here are the most-recent final versions of these two files.