Port of schuchert.wikispaces.com


PowerShell5-Tokenize_Expression-As_One_Page

First Failing Test

Now it’s time to create the first test. We’ll start with a failing test, and do something simple to get it to work.

  • Create a new file called Tokenizer.Tests.ps1:
    Describe "Tokenizing an in-fix expression" {
    
      It "Should convert a single number into a single token" {
        $tokenizer = [Tokenizer]::new()
    
        $tokens = $tokenizer.interpret("42")
    
        $tokens[0] | Should be "42"
      }
    }
  • Run your (now failing) tests:
    PS C:\Users\Brett\shunting_yard_algorithm> Invoke-Pester
    Executing all tests in '.'
    
    Executing script C:\Users\Brett\shunting_yard_algorithm\Tokenizer.Tests.ps1
    
      Describing Tokenizing an in-fix expression
        [-] Should convert a single number into a single token 58ms
          RuntimeException: Unable to find type [Tokenizer].
          at <ScriptBlock>, C:\Users\Brett\shunting_yard_algorithm\Tokenizer.Tests.ps1: line 4
    Tests completed in 58ms
    Tests Passed: 0, Failed: 1, Skipped: 0, Pending: 0, Inconclusive: 0

This failed, and it seems it failed for a reasonable reason. It doesn’t know about the type Tokenizer. To remedy this, we’ll create a module and import it into the test.

  • Create a new file called Tokenizer.psm1:
    class Tokenizer {
    }
  • At the top of Tokenizer.Tests.ps1, add the following line:
    using module '.\Tokenizer.psm1'
    
    Describe "Tokenizing an in-fix expression" {
    # ...
  • Run your tests and see that the error has changed a bit:
    PS C:\Users\Brett\shunting_yard_algorithm> Invoke-Pester
    Executing all tests in '.'
    
    Executing script C:\Users\Brett\shunting_yard_algorithm\Tokenizer.Tests.ps1
    
      Describing Tokenizing an in-fix expression
        [-] Should convert a single number into a single token 69ms
          RuntimeException: Method invocation failed because [Tokenizer] does not contain a method named 'interpret'.
          at <ScriptBlock>, C:\Users\Brett\shunting_yard_algorithm\Tokenizer.Tests.ps1: line 8
    Tests completed in 69ms
    Tests Passed: 0, Failed: 1, Skipped: 0, Pending: 0, Inconclusive: 0

Closer, let’s get to a passing test. Again, we’ll do just enough to get the test passing.

  • Update Tokenizer.psm1:
    using namespace System.Collections
    
    class Tokenizer {
      [ArrayList]interpret([String]$expression) {
        $result = [ArrayList]::new()
        $result.Add($expression)
        return $result
      }
    }
  • Run your test (expecting things to pass):
    PS C:\Users\Brett\shunting_yard_algorithm> Invoke-Pester
    Executing all tests in '.'
    
    Executing script C:\Users\Brett\shunting_yard_algorithm\Tokenizer.Tests.ps1
    
      Describing Tokenizing an in-fix expression
        [-] Should convert a single number into a single token 47ms
          RuntimeException: Method invocation failed because [Tokenizer] does not contain a method named 'interpret'.
          at <ScriptBlock>, C:\Users\Brett\shunting_yard_algorithm\Tokenizer.Tests.ps1: line 8
    Tests completed in 47ms
    Tests Passed: 0, Failed: 1, Skipped: 0, Pending: 0, Inconclusive: 0

This might be unexpected. If so, that’s good, because when unexpected things happen, we’re about to learn something.

In this case, classes loaded with using module are cached for the life of the current PowerShell session, so running Invoke-Pester again in the same shell does not pick up our edits to Tokenizer.psm1. There are several solutions, but a trivial one is to run “powershell invoke-pester”, which starts a fresh PowerShell process for each run:

    PS C:\Users\Brett\shunting_yard_algorithm> powershell invoke-pester
    Executing all tests in '.'
    
    Executing script C:\Users\Brett\shunting_yard_algorithm\Tokenizer.Tests.ps1
    
      Describing Tokenizing an in-fix expression
        [+] Should convert a single number into a single token 683ms
    Tests completed in 683ms
    Tests Passed: 1, Failed: 0, Skipped: 0, Pending: 0, Inconclusive: 0
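
If typing that each time gets tedious, one option (just a sketch; the function name is made up) is a small wrapper in your profile that always spawns a fresh process:

    # Hypothetical convenience wrapper; 'Invoke-FreshPester' is an invented name.
    # A new powershell.exe process picks up class edits, because classes loaded
    # via 'using module' are cached for the life of a session.
    function Invoke-FreshPester {
        powershell -NoProfile -Command Invoke-Pester
    }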

Summary

There are many common complaints about TDD in such a simple start:

  • This doesn’t do anything
  • How can such small steps accomplish anything?
  • I know so much more about what I need to do, why don’t I jump ahead and save time?
  • … Insert another 20 complaints here.

I’m not even going to try to convince you that this does or does not work. We’ll work through the problem taking an extremely incremental approach. We’ll build up a solid footing so we can experiment later. Even so, this trivial example has already captured several (possibly incorrect) decisions:

  • We have a class that does this work called Tokenizer
  • We have a single method that we call, named interpret
  • It takes a String and returns an array of Strings, one element for each token

Now that we have an API, we can focus on trying to grow the algorithm to make it work with at least the examples listed above.
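
For reference, here is the shape of that API as a tiny script (a usage sketch of what we already have, nothing new; the file name is hypothetical):

    # demo.ps1 - exercising the API decided on above
    using module '.\Tokenizer.psm1'

    $tokenizer = [Tokenizer]::new()
    $tokens = $tokenizer.interpret("42")   # an ArrayList with one element per token
    $tokens[0]                             # 42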

Initialize Git And Initial Commit

  • Make your shunting_yard_algorithm directory a git repo
    PS C:\Users\Brett\shunting_yard_algorithm> git init
    Initialized empty Git repository in C:/Users/Brett/shunting_yard_algorithm/.git/
  • Now add all the things:
    PS C:\Users\Brett\shunting_yard_algorithm> git add .\Tokenizer.*
  • Verify only the things we want to add have been added:
    PS C:\Users\Brett\shunting_yard_algorithm> git status
    On branch master
    
    No commits yet
    
    Changes to be committed:
      (use "git rm --cached <file>..." to unstage)
    
            new file:   Tokenizer.Tests.ps1
            new file:   Tokenizer.psm1
  • Make your first commit into your local git repo:
    PS C:\Users\Brett\shunting_yard_algorithm> git commit -m "Initial commit"
  • And look at the results:
    [master (root-commit) 94fee3e] Initial commit
     2 files changed, 17 insertions(+)
     create mode 100644 Tokenizer.Tests.ps1
     create mode 100644 Tokenizer.psm1
  • You can verify that there are no local changes remaining:
    PS C:\Users\Brett\shunting_yard_algorithm> git status
    On branch master
    nothing to commit, working tree clean


Simple Binary Expression

Now that we have a trivial first test, we’ll begin growing the implementation one test at a time. We’ll be following Uncle Bob’s Three Rules of TDD, summarized here as:

  • Write no production code without a failing test
  • Write just enough of a test to get the test to fail
  • Write just enough production code to get the test to pass

We will additionally do the following:

  • Keep existing tests passing while making the new ones pass
  • Keep the code clean by refactoring it every so often
  • Commit code frequently using git

In general, there are four things we might do at any point:

  • Write test code
  • Write production code
  • Refactor test code
  • Refactor production code

We’ll strive to do only one of these at a time and only switch to another one of these actions when all tests are passing.

Moving Towards Binary Expression

Rather than immediately going to a full binary expression, we’ll add a test with a number and an operator.

  • Create a new test:
     It "Should convert a number and a single operator to two tokens" {
       $tokenizer = [Tokenizer]::new()
    
       $tokens = $tokenizer.interpret("123+")
    
       $tokens[0] | Should be '123'
       $tokens[1] | Should be '+'
       $tokens.Count | Should be 2
     }
  • Run your tests, you should see an error similar to:
      Describing Tokenizing an in-fix expression
        [+] Should convert a single number into a single token 530ms
        [-] Should convert a number and a single operator to two tokens 150ms
          Expected string length 3 but was 4. Strings differ at index 3.
          Expected: {123}
          But was:  {123+}
          --------------^
          18:        $tokens[0] | Should be '123'
          at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
          at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 18
    Tests completed in 681ms
    Tests Passed: 1, Failed: 1, Skipped: 0, Pending: 0, Inconclusive: 0
  • A little bit of regex magic allows us to get the tests passing:
    [ArrayList]interpret([String]$expression) {
        $result = [ArrayList]::new()

        $expression -match('^(\d+)')
        $result.Add($Matches[1])
        $expression = $expression.Substring($Matches[1].Length)
        $result.Add($expression)

        return $result
    }
  • Run your tests, they are passing. When I wrote that code, I was tempted to add a check to verify something I suspected. Rather than do that, with all of the tests passing, I suggest going back to the first test and making a change. There’s near duplication between the two tests. Make them more similar.

  • Update the first test:

       $tokens[0] | Should be '42'
       $tokens.Count | Should be 1
  • Run the tests, you’ll notice now that the first test fails:
      Describing Tokenizing an in-fix expression
        [-] Should convert a single number into a single token 597ms
          Expected: {1}
          But was:  {2}
          11:        $tokens.Count | Should be 1
          at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
          at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 11
        [+] Should convert a number and a single operator to two tokens 204ms
  • Suspicion confirmed, update the code:
        if ($expression.Length -ne 0) {
            $result.Add($expression)
        }

        return $result
  • Run your tests, and it’s back to passing. Now we might be ready for a complete binary expression. Let’s give that a try, then we’ll do some refactoring of the tests.

  • Add a new test:

     It "Should convert a binary expression into three tokens" {
       $tokenizer = [Tokenizer]::new()
    
       $tokens = $tokenizer.interpret("99*34")
    
       $tokens[0] | Should be '99'
       $tokens[1] | Should be '*'
       $tokens[2] | Should be '34'
       $tokens.Count | Should be 3
     }
  • Sure enough, running the tests shows that we’re not quite done with a binary expression:
    [-] Should convert a binary expression into three tokens 155ms
      Expected string length 1 but was 3. Strings differ at index 1.
      Expected: {*}
      But was:  {*34}
      ------------^
      30:        $tokens[1] | Should be '*'
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 30
  • We need to do this more than once, and we need to check for both digits and non-digits. This will do it:
    [ArrayList]interpret([String]$expression) {
        $result = [ArrayList]::new()

        while ($expression.Length -ne 0) {
            if($expression -match ('^(\d+)')) {
                $result.Add($Matches[1])
                $expression = $expression.Substring($Matches[1].Length)
            }  else {
                $expression -match('^([^\d])')
                $result.Add($Matches[1])
                $expression = $expression.Substring($Matches[1].Length)
            }
        }
        if ($expression.Length -ne 0) {
            $result.Add($expression)
        }

        return $result
    }
  • Run your tests, they should pass.

  • A quick check after your tests are passing will verify that the final if is not necessary:

    [ArrayList]interpret([String]$expression) {
        $result = [ArrayList]::new()

        while ($expression.Length -ne 0) {
            if($expression -match ('^(\d+)')) {
                $result.Add($Matches[1])
                $expression = $expression.Substring($Matches[1].Length)
            }  else {
                $expression -match('^([^\d])')
                $result.Add($Matches[1])
                $expression = $expression.Substring($Matches[1].Length)
            }
        }

        return $result
    }
  • Run your tests, they should be passing.

  • Now is a great time to commit your changes: we are about to refactor, and if things go badly, this gives us a known-good place to get back to.

  • Noticing some duplication, and after a couple of successful tries, I ended up with the following:

    using namespace System.Collections
    
    class Tokenizer {
        [boolean]recordIfMatches([ref]$expression, $regex, $result) {
            if ($expression.Value -match ($regex)) {
                $result.Add($Matches[1])
                $expression.Value = $expression.Value.Substring($Matches[1].Length)
                return $true
            }
            return $false
        }
    
        [ArrayList]interpret([String]$expression) {
            $result = [ArrayList]::new()
    
            while ($expression.Length -ne 0) {
                if (-not $this.recordIfMatches([ref]$expression, '^(\d+)', $result)) {
                    $this.recordIfMatches([ref]$expression, '^([^\d])', $result)
                }
            }
    
            return $result
        }
    }

I’m not a PowerShell expert and I do not know how common, popular, or idiomatic the use of [ref] is, but it nicely collapses the code (if [ref] is new to you, see the short sketch just after this paragraph). I also noticed something that will come up later (a pattern I had not seen before), so I’ll choose some tests to exploit that. But before doing that, there are a few more things to do with our tests in terms of refactoring and test cases.
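
About that [ref] trick: here is a minimal, stand-alone sketch (the class and method names are invented for illustration) of how it lets a method rewrite the caller’s variable through its .Value property:

    class RefDemo {
        [void]dropFirstCharacter([ref]$text) {
            # $text.Value reads and writes the variable the caller wrapped in [ref]
            $text.Value = $text.Value.Substring(1)
        }
    }

    $word = "tokens"
    [RefDemo]::new().dropFirstCharacter([ref]$word)
    $word   # okens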

The test file has a bit of duplication. It’s time to collapse that. To do so, we’ll use the -TestCases feature of Pester.

  • Update your test by adding a new test, which duplicates the first test:
    It "Should convert <expression> to <expected>" -TestCases @(
        @{expression = '42'; expected = @('42')}
    ) {
        param($expression, $expected)
        $tokenizer = [Tokenizer]::new()
    
        $result = $tokenizer.interpret($expression)

        for($i = 0; $i -lt $result.Count; ++$i) {
            $result[$i] | Should be $expected[$i]
        }
        $result.Count | Should be $expected.Count
    }
  • Run your tests, and they all pass:
      Describing Tokenizing an in-fix expression
        [+] Should convert a single number into a single token 535ms
        [+] Should convert a number and a single operator to two tokens 78ms
        [+] Should convert a binary expression into three tokens 23ms
        [+] Should convert 42 to 42 54ms
    Tests completed in 691ms
    Tests Passed: 4, Failed: 0, Skipped: 0, Pending: 0, Inconclusive: 0
  • This new test duplicates the first test, so it is safe to remove the original. While you are at it, convert the other two tests into this last one:
    using module '.\Tokenizer.psm1'
    
    Describe "Tokenizing an in-fix expression" {
        It "Should convert <expression> to <expected>" -TestCases @(
            @{expression = '42'; expected = @('42')}
            @{expression = '123+'; expected = @('123', '+')}
            @{expression = '99*34'; expected = @('99', '*', '34')}
        ) {
            param($expression, $expected)
            $tokenizer = [Tokenizer]::new()
        
            $result = $tokenizer.interpret($expression)
    
            for ($i = 0; $i -lt $result.Count; ++$i) {
                $result[$i] | Should be $expected[$i]
            }
            $result.Count | Should be $expected.Count
        }
    }

Now back to checking and extending the behavior. Let’s make sure our code handles white space, more than one operator, multi-character operators, and even variables.

  • Add a new test case:
        @{expression = '1+2+3+4'; expected = @('1', '+', '2', '+', '3', '+', '4')}
  • Run the tests, this seems to work fine. The while loop already handles an expression of any length.

  • Now let’s make our code handle variables. Add another test case:

        @{expression = 'a'; expected = @('a')}

I was initially surprised this worked, but that’s the thing about regular expressions: they are often more flexible than you first realize. The letters are not digits, which is how we are distinguishing digits from operators. Knowing this, I’m going to change that test case to a multi-letter variable because I want to start with a broken test.

        @{expression = 'foo+bar'; expected = @('foo', '+', 'bar')}
  • This fails as I expected:
    [-] Should convert foo+bar to foo = bar 88ms
      Expected string length 3 but was 1. Strings differ at index 1.
      Expected: {foo}
      But was:  {f}
      ------------^
      18:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 18
  • Here’s a quick fix to make this work:
            if (-not $this.recordIfMatches([ref]$expression, '^([\d\w]+)', $result)) {

Note that the regular expression was simply \d+ and now it is [\d\w]+. This might seem too clever, too simple, or maybe it seems like I’m cheating. If that’s the case, then to “prove” I’m cheating, find a test that will cause my code to break. However, I’m fine with that solution for now. If this were a real problem, I think I’d have a known list of operators and check for them explicitly. However, this is a simple example, so I’m OK with simple tests and simple solutions.
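
For the record, that explicit-operator approach might look something like this sketch (the operator list here is illustrative, not something we adopt):

    # Sketch only: recognize operators from a known list (longest first) instead
    # of the catch-all "anything that is not a digit" pattern.
    $operators = @('++', '--', '==', '+', '-', '*', '/', '=')
    $expression = '*34'
    $token = $null
    foreach ($op in $operators) {
        if ($expression.StartsWith($op)) {
            $token = $op
            $expression = $expression.Substring($op.Length)
            break
        }
    }
    $token        # *
    $expression   # 34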

  • What about a multi-character operator? Add a new test:
        @{expression = '++foo'; expected = @('++', 'foo')}
  • Run your tests, and this fails.
    [-] Should convert ++foo to ++ foo 80ms
      Expected string length 2 but was 1. Strings differ at index 1.
      Expected: {++}
      But was:  {+}
      ------------^
      19:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 19
  • Update the regular expression to fix this:
                $this.recordIfMatches([ref]$expression, '^([^\d]+)', $result)
  • When I run the tests, I find my confidence was too high:
    [-] Should convert foo+bar to foo + bar 78ms
      Expected string length 1 but was 4. Strings differ at index 1.
      Expected: {+}
      But was:  {+bar}
      ------------^
      19:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 19
    [-] Should convert ++foo to ++ foo 76ms
      Expected string length 2 but was 5. Strings differ at index 2.
      Expected: {++}
      But was:  {++foo}
      -------------^
      19:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 19
  • The second regular expression was originally the complement of the first, but we added \w to the first, so we need to exclude \w from the second one as well:
                $this.recordIfMatches([ref]$expression, '^([^\d\w]+)', $result)

That’s a good lesson. I thought it was simple, and it was, just not in the way I thought. My tests allowed me to experiment, learn, adjust, and make progress.

  • Now is probably a good time to commit your changes.

Now let’s handle white space. We can match white space much like we do everything else, or we could simply remove it. Regardless, a test will keep us honest. I think this is going to be simple, so I’ll start with a “big” test:

        @{expression = '   foo  + -bar  = baz   '; expected = @('foo', '+', '-', 'bar', '=', 'baz')}
  • Once again, my confidence has bitten me. I figured I could simply replace all of the white space:
        $expression = $expression -replace('\s+','')
  • Close, but no cigar:
    [-] Should convert    foo  + -bar  = baz    to foo + - bar = baz 83ms
      Expected string length 1 but was 2. Strings differ at index 1.
      Expected: {+}
      But was:  {+-}
      ------------^
      20:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 20
  • Here’s a second attempt:
        while ($expression.Length -ne 0) {
            $expression = $expression -replace ('^\s+', '')
            if (-not $this.recordIfMatches([ref]$expression, '^([\d\w]+)', $result)) {
                $this.recordIfMatches([ref]$expression, '^([^\d\w\s]+)', $result)
            }
        }
  • There are three changes. First, the white-space replacement is now inside the while loop. Second, it only matches at the beginning of the string. Third, the bottom regular expression now also excludes \s (white space characters). Those changes result in passing tests:
      Describing Tokenizing an in-fix expression
        [+] Should convert 42 to 42 577ms
        [+] Should convert 123+ to 123 + 84ms
        [+] Should convert 99*34 to 99 * 34 75ms
        [+] Should convert 1+2+3+4 to 1 + 2 + 3 + 4 22ms
        [+] Should convert a to a 13ms
        [+] Should convert foo+bar to foo + bar 14ms
        [+] Should convert ++foo to ++ foo 16ms
        [+] Should convert    foo  + -bar  = baz    to foo + - bar = baz 25ms
    Tests completed in 829ms
    Tests Passed: 8, Failed: 0, Skipped: 0, Pending: 0, Inconclusive: 0

This seems like enough progress on binary expressions. Next up, handling parentheses.



First Stab at Parentheses

There are two ways in which our tokenizer might encounter parentheses. The first is to group a lower-precedence operator, as in:

  • (3 + 4) * 6

The second is in function calls:

  • f(4)

For now, we want to pull out the () as tokens. We’ll allow something at a higher level to determine the context.
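
In other words, the eventual goal looks something like this test case (a sketch of where we are heading; we will start smaller):

        @{expression = '(3+4)*6'; expected = @('(', '3', '+', '4', ')', '*', '6')}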

  • Create a first test to see what happens when we use (:
        @{expression = '(a)'; expected = @('(', 'a', ')')}
  • Interesting, this works. I’m skeptical that we’ve got it working as needed. Knowing a little bit about the code, I’ll add another test:
        @{expression = '(())'; expected = @('(', '(',')', ')')}
  • There’s the failure I’m expecting:
    [-] Should convert (()) to ( ( ) ) 102ms
      Expected string length 1 but was 4. Strings differ at index 1.
      Expected: {(}
      But was:  {(())}
      ------------^
      22:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 22
  • Now it’s time to update the code just a touch. Rather than changing how to handle operators, I’ll add another match:
        while ($expression.Length -ne 0) {
            $expression = $expression -replace ('^\s+', '')
            if (-not $this.recordIfMatches([ref]$expression, '^([()])', $result)) {
                if (-not $this.recordIfMatches([ref]$expression, '^([\d\w]+)', $result)) {
                    $this.recordIfMatches([ref]$expression, '^([^\d\w\s]+)', $result)
                }
            }
        }
  • Run your tests, that seems to work.
  • Now is a good time to commit your code. After this, it seemed like there was a pattern in the code that I could represent with an array and a loop. Here’s another version that also works. I’m not sure if I like this better or not.
    static [Array]$regex = @( '^([()])', '^([\d\w]+)', '^([^\d\w\s]+)' )
    [ArrayList]interpret([String]$expression) {
        $result = [ArrayList]::new()

        while ($expression.Length -ne 0) {
            $expression = $expression -replace ('^\s+', '')
            foreach ($r in [Tokenizer]::regex) {
                if ($this.recordIfMatches([ref]$expression, $r, $result)) {
                    break
                }
            }
        }

        return $result
    }


Function Calls

Finally, function calls. This might already work. Let’s write an experiment and see what the results are:

        @{expression = 'f(g(3))'; expected = @('f', '(', 'g', '(', '3', ')', ')')}

This passes. The question is whether we want those results, or something more like this:

        @{expression = 'f(g(3))'; expected = @('f(', 'g(', '3', ')', ')')}

As it turns out, the first version passes as is. The second version requires a change to one of the regular expressions:

    static [Array]$regex = @( '^([()])', '^([\d\w]+\({0,1})', '^([^\d\w\s]+)' )

The change is in the middle regex, allowing zero or one ( at the end of a run of digits and letters. Since switching between the two behaviors is easy, I’ll leave it this way and consider the tokenizer done for now. Next, we’ll move on to the Shunting Yard Algorithm in PowerShell 5.
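
If you want to see what that middle pattern captures on its own, a quick console check looks something like this:

    'f(g(3))' -match '^([\d\w]+\({0,1})'   # True
    $Matches[1]                             # f(
    '34)' -match '^([\d\w]+\({0,1})'        # True
    $Matches[1]                             # 34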

Here’s my final-ish version (see the Final-ish Version section below).



The Tokenizer converts a whole expression into an array of tokens. Now we’ll convert it to an Enumerator.

Convert Tokenizer to Enumerator

We are going to convert this in place while maintaining the tests.
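
To keep the destination in mind, here is roughly how the finished enumerator will be used (a sketch based on the tests we are about to write):

    $tokenizer = [Tokenizer]::new('99*34')   # the expression now goes to the constructor
    $tokenizer.MoveNext()                    # advance to the first token
    $tokenizer.Current                       # 99
    $tokenizer.MoveNext()
    $tokenizer.Current                       # *
    $tokenizer.MoveNext()
    $tokenizer.Current                       # 34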

Add Required Interfaces

  • Add the interfaces to the class:
    class Tokenizer : IEnumerable, IEnumerator
  • Run your tests. They fail due to missing required methods.
  • Add each of the following methods stubbed out to get our existing tests running again:
    [IEnumerator]GetEnumerator() {
        return $this
    }
    
    [bool]MoveNext() {
        return $false
    }
    
    [Object]get_Current() {
        return $null
    }
    
    [void]Reset() {
    }
  • Run your tests, they should now be back to passing. Next, we’ll add a new test that uses the Tokenizer as an enumerator and get it passing.
  • Add a new test with only the first test case to keep this as simple as possible:
    It "Should enummerate <expression> into <expected>" -TestCase @(
        @{expression = '42'; expected = @('42')}
    ) {
        param($expression, $expected)
        $tokenizer = [Tokenizer]::new($expression)

        for($i = 0; $i -lt $expected.Count; ++$i) {
            $tokenizer.MoveNext()
            $tokenizer.Current | Should be $expected[$i]
        }
        $tokenizer.MoveNext() | Should be $false
    } 
  • Now write just enough of the interface methods to get this test passing:
    [String]$currentExpression

    # Empty no-argument constructor so the original tests' [Tokenizer]::new() still works
    Tokenizer() { }

    Tokenizer($expression) {
        $this.currentExpression = $expression
    }

    [IEnumerator]GetEnumerator() {
        return $this
    }

    [bool]MoveNext() {
        return $false
    }

    [Object]get_Current() {
        return $this.currentExpression
    }

    [void]Reset() {
    }

There are a few things to note in this first version:

  • We used a constructor in the new test that takes in the expression and stores it. Adding a constructor that takes a single argument makes PowerShell remove the default no-argument constructor (see the short illustration after these notes). To keep the existing tests passing, we add an empty no-argument constructor as well as the one-argument constructor. We’re migrating this code, so this is an intermediate form. When we’ve finished converting it to an enumerator, it will no longer need the no-argument constructor.
  • The property get_Current needs something to return. That’s what $this.currentExpression is. It’s assigned in the one-argument constructor. That’s fine for now. As we add more tests, this will change.
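
Here is a quick illustration of that constructor behavior (the class name is invented):

    class CtorDemo {
        CtorDemo([String]$value) { }
    }

    [CtorDemo]::new('x')   # works
    [CtorDemo]::new()      # error: defining our own constructor removed the implicit no-argument one
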
  • Run your tests, they should pass.
  • Now, we copy the second test case and work on getting it to pass as well:
    It "Should enummerate <expression> into <expected>" -TestCase @(
        @{expression = '42'; expected = @('42')}
        @{expression = '123+'; expected = @('123', '+')}
    ) {
  • Run your tests, they fail:
    [-] Should enummerate 123+ into 123 + 92ms
      Expected string length 3 but was 4. Strings differ at index 3.
      Expected: {123}
      But was:  {123+}
      --------------^
      37:             $tokenizer.Current | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 37
  • Here are a few changes to make that work. Notice that some of this code is copied from the interpret method.
    [String]$currentExpression
    [String]$currentToken

    [bool]MoveNext() {
        $this.currentToken = $null

        foreach ($r in [Tokenizer]::REGEX) {
            if($this.currentExpression -match $r) {
                $this.currentToken = $Matches[1]
                $this.currentExpression = $this.currentExpression.Substring($this.currentToken.Length)
                break
            }
        }
        return $this.currentExpression.Length -gt 0
    }

    [Object]get_Current() {
        return $this.currentToken
    }
  • Run your tests, they should all pass.
  • Add the next test:
        @{expression = '99*34'; expected = @('99', '*', '34')}
  • Run your tests, they all pass.
  • Add all of the remaining test cases:
    It "Should enummerate <expression> into <expected>" -TestCase @(
        @{expression = '42'; expected = @('42')}
        @{expression = '123+'; expected = @('123', '+')}
        @{expression = '99*34'; expected = @('99', '*', '34')}
        @{expression = '1+2+3+4'; expected = @('1', '+', '2', '+', '3', '+', '4')}
        @{expression = 'a'; expected = @('a')}
        @{expression = 'foo+bar'; expected = @('foo', '+', 'bar')}
        @{expression = '++foo'; expected = @('++', 'foo')}
        @{expression = '   foo  + -bar  = baz   '; expected = @('foo', '+', '-', 'bar', '=', 'baz')}
        @{expression = '(a)'; expected = @('(', 'a', ')')}
        @{expression = '(())'; expected = @('(', '(', ')', ')')}
        @{expression = 'f(g(3))'; expected = @('f(', 'g(', '3', ')', ')')}
    ) {
        param($expression, $expected)
        $tokenizer = [Tokenizer]::new($expression)

        for($i = 0; $i -lt $expected.Count; ++$i) {
            $tokenizer.MoveNext()
            $tokenizer.Current | Should be $expected[$i]
        }
        $tokenizer.MoveNext() | Should be $false
    } 
  • The only test failing deals with spaces in the expression:
    [+] Should enummerate ++foo into ++ foo 15ms
    [-] Should enummerate    foo  + -bar  = baz    into foo + - bar = baz 84ms
      Expected string length 3 but was 0. Strings differ at index 0.
      Expected: {foo}
      But was:  {}
      -----------^
      46:             $tokenizer.Current | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 46
    [+] Should enummerate (a) into ( a ) 69ms
  • Add the missing line into MoveNext (right before the foreach):
        $this.currentExpression = $this.currentExpression -replace ('^\s+', '')
  • Run your tests, and all tests pass.
  • Now we can remove the original interpret-based test and the original methods. First, the test file:
    using module '.\Tokenizer.psm1'
    
    Describe "Tokenizing an in-fix expression" {
        It "Should enummerate <expression> into <expected>" -TestCase @(
            @{expression = '42'; expected = @('42')}
            @{expression = '123+'; expected = @('123', '+')}
            @{expression = '99*34'; expected = @('99', '*', '34')}
            @{expression = '1+2+3+4'; expected = @('1', '+', '2', '+', '3', '+', '4')}
            @{expression = 'a'; expected = @('a')}
            @{expression = 'foo+bar'; expected = @('foo', '+', 'bar')}
            @{expression = '++foo'; expected = @('++', 'foo')}
            @{expression = '   foo  + -bar  = baz   '; expected = @('foo', '+', '-', 'bar', '=', 'baz')}
            @{expression = '(a)'; expected = @('(', 'a', ')')}
            @{expression = '(())'; expected = @('(', '(', ')', ')')}
            @{expression = 'f(g(3))'; expected = @('f(', 'g(', '3', ')', ')')}
        ) {
            param($expression, $expected)
            $tokenizer = [Tokenizer]::new($expression)
    
            for($i = 0; $i -lt $expected.Count; ++$i) {
                $tokenizer.MoveNext()
                $tokenizer.Current | Should be $expected[$i]
            }
            $tokenizer.MoveNext() | Should be $false
        } 
    }
  • Also, remove the old code from the Tokenizer:
    using namespace System.Collections
    
    class Tokenizer : IEnumerable, IEnumerator {
        static $PARENTHESIS = '^([()])' 
        static $NUMBERS_WORDS_FUNCTIONS = '^([\d\w]+\({0,1})'
        static $OPERATORS = '^([^\d\w\s]+)'
        static [Array]$REGEX = @( [Tokenizer]::PARENTHESIS, [Tokenizer]::NUMBERS_WORDS_FUNCTIONS, [Tokenizer]::OPERATORS )
    
        [String]$currentExpression
        [String]$currentToken
    
        Tokenizer($expression) {
            $this.currentExpression = $expression
        }
    
        [IEnumerator]GetEnumerator() {
            return $this
        }
    
        [bool]MoveNext() {
            $this.currentToken = $null
    
            $this.currentExpression = $this.currentExpression -replace ('^\s+', '')
            foreach ($r in [Tokenizer]::REGEX) {
                if ($this.currentExpression -match $r) {
                    $this.currentToken = $Matches[1]
                    $this.currentExpression = $this.currentExpression.Substring($this.currentToken.Length)
                    break
                }
            }
            return $this.currentExpression.Length -gt 0
        }
    
        [Object]get_Current() {
            return $this.currentToken
        }
    
        [void]Reset() {
        }
    }

Notice that we have no test for Reset. It is required by the IEnumerator interface, but we don’t exercise it anywhere. Time to add the missing test and write its implementation.

  • Add one final test:
    It "Should be possible to go through the results after a reset" {
        $tokenizer = [Tokenizer]::new("42")
        $tokenizer.MoveNext()
        $tokenizer.Current | Should be "42"
        $tokenizer.Reset()
        $tokenizer.MoveNext()
        $tokenizer.Current | Should be "42"
    }
  • Run the test, it fails:
    [-] Should be possible to go through the results after a reset 81ms
      Expected string length 2 but was 0. Strings differ at index 0.
      Expected: {42}
      But was:  {}
      -----------^
      33:         $tokenizer.Current | Should be "42"
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 33
  • Update Tokenizer to store the original expression in the constructor and implement the reset method.
    [String]$currentExpression
    [String]$currentToken
    [String]$originalExpression

    Tokenizer($expression) {
        $this.originalExpression = $expression
        $this.Reset()
    }
# ...
    [void]Reset() {
        $this.currentExpression = $this.originalExpression
    }
  • Run your tests, they all pass.


Final-ish Version

After adding support for functions, I figured that was enough. However, once I looked at the final version, I found one thing worth cleaning up. The regular expressions are a bit cryptic, but I’m OK with them; giving them names cleans up the code a touch for the next person (probably me) who has to support it. Here’s a quick refactoring of that:

    static $PARENTHESIS ='^([()])' 
    static $NUMBERS_WORDS_FUNCTIONS = '^([\d\w]+\({0,1})'
    static $OPERATORS = '^([^\d\w\s]+)'
    static [Array]$REGEX = @( [Tokenizer]::PARENTHESIS, [Tokenizer]::NUMBERS_WORDS_FUNCTIONS, [Tokenizer]::OPERATORS )

After that, I decided to migrate the example to an Enumerator. This is that version.

Here are the most-recent final versions of these two files.

Tokenizer.Tests.ps1

using module '.\Tokenizer.psm1'

Describe "Tokenizing an in-fix expression" {
    It "Should enummerate <expression> into <expected>" -TestCase @(
        @{expression = '42'; expected = @('42')}
        @{expression = '123+'; expected = @('123', '+')}
        @{expression = '99*34'; expected = @('99', '*', '34')}
        @{expression = '1+2+3+4'; expected = @('1', '+', '2', '+', '3', '+', '4')}
        @{expression = 'a'; expected = @('a')}
        @{expression = 'foo+bar'; expected = @('foo', '+', 'bar')}
        @{expression = '++foo'; expected = @('++', 'foo')}
        @{expression = '   foo  + -bar  = baz   '; expected = @('foo', '+', '-', 'bar', '=', 'baz')}
        @{expression = '(a)'; expected = @('(', 'a', ')')}
        @{expression = '(())'; expected = @('(', '(', ')', ')')}
        @{expression = 'f(g(3))'; expected = @('f(', 'g(', '3', ')', ')')}
    ) {
        param($expression, $expected)
        $tokenizer = [Tokenizer]::new($expression)

        for($i = 0; $i -lt $expected.Count; ++$i) {
            $tokenizer.MoveNext()
            $tokenizer.Current | Should be $expected[$i]
        }
        $tokenizer.MoveNext() | Should be $false
    } 

    It "Should be possible to go through the results after a reset" {
        $tokenizer = [Tokenizer]::new("42")
        $tokenizer.MoveNext()
        $tokenizer.Current | Should be "42"
        $tokenizer.Reset()
        $tokenizer.MoveNext()
        $tokenizer.Current | Should be "42"
    }
}

Tokenizer.psm1

using namespace System.Collections

class Tokenizer : IEnumerable, IEnumerator {
    static $PARENTHESIS = '^([()])' 
    static $NUMBERS_WORDS_FUNCTIONS = '^([\d\w]+\({0,1})'
    static $OPERATORS = '^([^\d\w\s]+)'
    static [Array]$REGEX = @( [Tokenizer]::PARENTHESIS, [Tokenizer]::NUMBERS_WORDS_FUNCTIONS, [Tokenizer]::OPERATORS )

    [String]$currentExpression
    [String]$currentToken
    [String]$originalExpression

    Tokenizer($expression) {
        $this.originalExpression = $expression
        $this.Reset()
    }

    [IEnumerator]GetEnumerator() {
        return $this
    }

    [bool]MoveNext() {
        $this.currentToken = $null

        $this.currentExpression = $this.currentExpression -replace ('^\s+', '')
        foreach ($r in [Tokenizer]::REGEX) {
            if ($this.currentExpression -match $r) {
                $this.currentToken = $Matches[1]
                $this.currentExpression = $this.currentExpression.Substring($this.currentToken.Length)
                break
            }
        }
        return $this.currentExpression.Length -gt 0
    }

    [Object]get_Current() {
        return $this.currentToken
    }

    [void]Reset() {
        $this.currentExpression = $this.originalExpression
    }
}




" Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.