Port of schuchert.wikispaces.com


PowerShell5-Tokenize_Expression-Simple_Binary_Expressions


Simple Binary Expression

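Now that we have a trivial first test, we’ll begin growing the implementation one test at a time. We’ll be following Uncle Bob’s Three Rules of TDD, summarized here as: write no production code except to make a failing test pass; write only enough of a test to demonstrate a failure; write only enough production code to make the failing test pass.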

We will additionally do the following:

In general, there are four things we might do at any point:

We’ll strive to do only one of these at a time and only switch to another one of these actions when all tests are passing.

Moving Towards Binary Expression

Rather than immediately going to a full binary expression, we’ll add a test with a number and an operator.

     It "Should convert a number and a single operator to two tokens" {
       $tokenizer = [Tokenizer]::new()
    
       $tokens = $tokenizer.interpret("123+")
    
       $tokens[0] | Should be '123'
       $tokens[1] | Should be '+'
       $tokens.Count | Should be 2
     }

Running the tests shows the new test failing, because the whole string comes back as a single token:

    Describing Tokenizing an in-fix expression
      [+] Should convert a single number into a single token 530ms
      [-] Should convert a number and a single operator to two tokens 150ms
        Expected string length 3 but was 4. Strings differ at index 3.
        Expected: {123}
        But was:  {123+}
        --------------^
        18:        $tokens[0] | Should be '123'
        at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
        at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 18
    Tests completed in 681ms
    Tests Passed: 1, Failed: 1, Skipped: 0, Pending: 0, Inconclusive: 0
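
A quick way to get that passing is to record the leading number and then add whatever remains as a second token:

    [ArrayList]interpret([String]$expression) {
        $result = [ArrayList]::new()

        $expression -match('^(\d+)')
        $result.Add($Matches[1])
        $expression = $expression.Substring($Matches[1].Length)
        $result.Add($expression)

        return $result
    }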
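
The mechanics this method relies on can be tried in isolation. The following is a small sketch, not part of the tokenizer; the variable names $text, $number and $rest are made up for illustration. The -match operator returns a Boolean and, on success, fills the automatic $Matches variable, where index 1 holds the first capture group.

```powershell
$text = '123+'

# -match returns $true or $false; on success it populates the automatic $Matches variable.
if ($text -match '^(\d+)') {
    $number = $Matches[1]                      # first capture group: the leading digits
    $rest   = $text.Substring($number.Length)  # everything after the number
}

"number: $number"   # number: 123
"rest:   $rest"     # rest:   +
```

With an input such as '42' the same steps leave $rest as an empty string.
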

The new test passes, but the first test also checks the number of tokens:

       $tokens[0] | Should be '42'
       $tokens.Count | Should be 1

For a lone number, the leftover empty string is now added as a second token, so that test fails:

    Describing Tokenizing an in-fix expression
      [-] Should convert a single number into a single token 597ms
        Expected: {1}
        But was:  {2}
        11:        $tokens.Count | Should be 1
        at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
        at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 11
      [+] Should convert a number and a single operator to two tokens 204ms

Only adding the remainder when something is left fixes that:

        if ($expression.Length -ne 0) {
            $result.Add($expression)
        }

        return $result

Now for a complete binary expression:

     It "Should convert a binary expression into three tokens" {
       $tokenizer = [Tokenizer]::new()
    
       $tokens = $tokenizer.interpret("99*34")
    
       $tokens[0] | Should be '99'
       $tokens[1] | Should be '*'
       $tokens[2] | Should be '34'
       $tokens.Count | Should be 3
     }

The operator and the second operand come back as one token:

    [-] Should convert a binary expression into three tokens 155ms
      Expected string length 1 but was 3. Strings differ at index 1.
      Expected: {*}
      But was:  {*34}
      ------------^
      30:        $tokens[1] | Should be '*'
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 30
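
To handle this, loop until the expression is consumed, taking either a run of digits or a single non-digit character each time around:

    [ArrayList]interpret([String]$expression) {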
        $result = [ArrayList]::new()

        while ($expression.Length -ne 0) {
            if($expression -match ('^(\d+)')) {
                $result.Add($Matches[1])
                $expression = $expression.Substring($Matches[1].Length)
            }  else {
                $expression -match('^([^\d])')
                $result.Add($Matches[1])
                $expression = $expression.Substring($Matches[1].Length)
            }
        }
        if ($expression.Length -ne 0) {
            $result.Add($expression)
        }

        return $result
    }
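
The if block after the loop can no longer be true, because the loop only exits once the expression is empty, so it can go:

    [ArrayList]interpret([String]$expression) {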
        $result = [ArrayList]::new()

        while ($expression.Length -ne 0) {
            if($expression -match ('^(\d+)')) {
                $result.Add($Matches[1])
                $expression = $expression.Substring($Matches[1].Length)
            }  else {
                $expression -match('^([^\d])')
                $result.Add($Matches[1])
                $expression = $expression.Substring($Matches[1].Length)
            }
        }

        return $result
    }
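
The two branches of the loop are nearly identical. Extracting that duplication into a method that takes the expression by reference gives:

    using namespace System.Collections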
    
    class Tokenizer {
        [boolean]recordIfMatches([ref]$expression, $regex, $result) {
            if ($expression.Value -match ($regex)) {
                $result.Add($Matches[1])
                $expression.Value = $expression.Value.Substring($Matches[1].Length)
                return $true
            }
            return $false
        }
    
        [ArrayList]interpret([String]$expression) {
            $result = [ArrayList]::new()
    
            while ($expression.Length -ne 0) {
                if (-not $this.recordIfMatches([ref]$expression, '^(\d+)', $result)) {
                    $this.recordIfMatches([ref]$expression, '^([^\d])', $result)
                }
            }
    
            return $result
        }
    }

I’m not a PowerShell expert and I do not know how common or idiomatic the use of [ref] is, but it nicely collapses the code. I even notice something that will come up later (a pattern I had not noticed before), so I’ll choose some tests to exploit that. But before doing that, there are a few more things to do with our tests in terms of refactoring and test cases.
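
For readers who have not met [ref] before, here is a minimal sketch of what it does, separate from the tokenizer. The function Remove-FirstCharacter is a hypothetical helper invented for this illustration. A [ref] parameter wraps the caller's variable, and reading or assigning its Value property reads or changes that variable.

```powershell
# Hypothetical helper, for illustration only: drops the first character of the caller's string.
function Remove-FirstCharacter([ref]$text) {
    # .Value reads and writes the variable that the caller passed in
    $text.Value = $text.Value.Substring(1)
}

$expression = '+34'
Remove-FirstCharacter ([ref]$expression)
$expression   # 34
```

This is the same mechanism that lets recordIfMatches shorten the expression held by interpret.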

The test file has a bit of duplication. It’s time to collapse that. To do so, we’ll use the -TestCases feature of Pester.

    It "Should convert <expression> to <expected>" -TestCases @(
        @{expression = '42'; expected = @('42')}
    ) {
        param($expression, $expected)
        $tokenizer = [Tokenizer]::new()
    
        $result = $tokenizer.interpret($expression)

        for($i = 0; $i -lt $result.Count; ++$i) {
            $result[$i] | Should be $expected[$i]
        }
        $result.Count | Should be $expected.Count
    }

With this first parameterized case sitting alongside the existing tests, everything still passes:

      Describing Tokenizing an in-fix expression
        [+] Should convert a single number into a single token 535ms
        [+] Should convert a number and a single operator to two tokens 78ms
        [+] Should convert a binary expression into three tokens 23ms
        [+] Should convert 42 to 42 54ms
    Tests completed in 691ms
    Tests Passed: 4, Failed: 0, Skipped: 0, Pending: 0, Inconclusive: 0

Converting the remaining tests into test cases and removing the originals leaves the test file as:

    using module '.\Tokenizer.psm1'
    
    Describe "Tokenizing an in-fix expression" {
        It "Should convert <expression> to <expected>" -TestCases @(
            @{expression = '42'; expected = @('42')}
            @{expression = '123+'; expected = @('123', '+')}
            @{expression = '99*34'; expected = @('99', '*', '34')}
        ) {
            param($expression, $expected)
            $tokenizer = [Tokenizer]::new()
        
            $result = $tokenizer.interpret($expression)
    
            for ($i = 0; $i -lt $result.Count; ++$i) {
                $result[$i] | Should be $expected[$i]
            }
            $result.Count | Should be $expected.Count
        }
    }
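
One way to picture what -TestCases does with each hashtable is splatting: the keys are bound to the parameters of the same name in the script block. The sketch below is an illustration of that idea only, not Pester's implementation; the variables $cases, $body and $lines are made up for the example.

```powershell
# Illustration only: this is not how Pester is written, just the idea behind -TestCases.
$cases = @(
    @{expression = '42';   expected = @('42')}
    @{expression = '123+'; expected = @('123', '+')}
)

$body = {
    param($expression, $expected)
    "$expression -> $($expected -join ' ')"
}

# Splatting (@case) binds each hashtable key to the parameter with the same name.
$lines = foreach ($case in $cases) { & $body @case }
$lines
```

This also explains the param($expression, $expected) line in the test: the names must match the keys used in each test case.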

Now back to checking/extending the behavior. Let’s make sure our code handles white space, and more than one operator, multi-character operators, and even variables.

        @{expression = '1+2+3+4'; expected = @('1', '+', '2', '+', '3', '+', '4')}
        @{expression = 'a'; expected = @('a')}

I was initially surprised this worked, but that’s the thing with regular expressions: they are often more flexible than you at first realize. A letter is not a digit, so it fails the digit pattern and is picked up by the non-digit pattern we use for operators. Knowing this, I’m going to change that test case to a multi-letter variable because I want to start with a broken test.
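
The behavior is easy to confirm outside the class. This is a small sketch using the two patterns from interpret; the variable names are made up for illustration.

```powershell
# 'a' is not a digit, so the digit pattern fails and the non-digit pattern picks it up.
$digitMatch    = 'a' -match '^(\d+)'     # $false, $Matches is left untouched
$nonDigitMatch = 'a' -match '^([^\d])'   # $true
$token         = $Matches[1]             # a

"$digitMatch / $nonDigitMatch / $token"
```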

        @{expression = 'foo+bar'; expected = @('foo', '+', 'bar')}

That fails, since only the first letter is taken as a token:

    [-] Should convert foo+bar to foo = bar 88ms
      Expected string length 3 but was 1. Strings differ at index 1.
      Expected: {foo}
      But was:  {f}
      ------------^
      18:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 18

Widening the first pattern to accept word characters fixes it:

            if (-not $this.recordIfMatches([ref]$expression, '^([\d\w]+)', $result)) {

Note that the regular expression was simply \d+ and now it is [\d\w]+. This might seem too clever, too simple or maybe it seems like I’m cheating. In fact, if that’s the case, then to “prove” I’m cheating, you want to find a test that will cause my code to break. However, I’m fine with that solution for now. If this were a real problem, I think I’d have a known list of operators and check for them explicitly. However, this is a simple example and so I’m OK with simple tests and simple solutions.
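
A side observation on that pattern: in .NET regular expressions \w already matches digits (as well as letters and underscore), so [\d\w]+ and \w+ accept the same text. The sketch below compares the two on a few sample strings of my own choosing; it is only a check, not a suggested change to the code.

```powershell
# \w matches letters, digits and underscore, so \d inside the class adds nothing.
$samples = 'foo', '123', 'x1_y+2'
$results = foreach ($text in $samples) {
    $null = $text -match '^([\d\w]+)'; $withDigit = $Matches[1]
    $null = $text -match '^(\w+)';     $wordOnly  = $Matches[1]
    "$text -> $withDigit / $wordOnly"
}
$results
```

Both patterns return the same leading token for every sample, including the underscore in the last one.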

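Next, a multi-character operator:

        @{expression = '++foo'; expected = @('++', 'foo')}

The operator pattern only takes one character at a time, so this fails:

    [-] Should convert ++foo to ++ foo 80ms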
      Expected string length 2 but was 1. Strings differ at index 1.
      Expected: {++}
      But was:  {+}
      ------------^
      19:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 19
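
The obvious change is to let the operator pattern match more than one character:

                $this.recordIfMatches([ref]$expression, '^([^\d]+)', $result)

Now two tests fail, because a run of non-digits also swallows the letters that follow the operator:

    [-] Should convert foo+bar to foo + bar 78ms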
      Expected string length 1 but was 4. Strings differ at index 1.
      Expected: {+}
      But was:  {+bar}
      ------------^
      19:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 19
    [-] Should convert ++foo to ++ foo 76ms
      Expected string length 2 but was 5. Strings differ at index 2.
      Expected: {++}
      But was:  {++foo}
      -------------^
      19:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 19

Excluding word characters from the operator pattern as well handles both cases:

                $this.recordIfMatches([ref]$expression, '^([^\d\w]+)', $result)

That’s a good lesson. I thought it was simple, and it was, just not in the way I thought. My tests allowed me to experiment, learn, adjust and make progress.
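
To see the three operator patterns side by side, here is a small sketch that applies each one to the start of '+bar' and '++foo'. The loop and variable names are made up for illustration.

```powershell
$patterns = '^([^\d])', '^([^\d]+)', '^([^\d\w]+)'

$results = foreach ($pattern in $patterns) {
    foreach ($text in '+bar', '++foo') {
        if ($text -match $pattern) { "$pattern on $text gives $($Matches[1])" }
    }
}
$results
# ^([^\d]) on +bar gives +
# ^([^\d]) on ++foo gives +
# ^([^\d]+) on +bar gives +bar
# ^([^\d]+) on ++foo gives ++foo
# ^([^\d\w]+) on +bar gives +
# ^([^\d\w]+) on ++foo gives ++
```

Only the last pattern stops at the first letter while still taking the whole run of operator characters.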

Now let’s handle white space. We can match white space much like we do everything else, or we could simply remove it. Regardless, a test will keep us honest. I think this is going to be simple, so I’ll start with a “big” test:

        @{expression = '   foo  + -bar  = baz   '; expected = @('foo', '+', '-', 'bar', '=', 'baz')}
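
A first attempt is to remove all white space up front:

        $expression = $expression -replace('\s+','')

That joins operators that were only separated by white space:

    [-] Should convert    foo  + -bar  = baz    to foo + - bar = baz 83ms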
      Expected string length 1 but was 2. Strings differ at index 1.
      Expected: {+}
      But was:  {+-}
      ------------^
      20:             $result[$i] | Should be $expected[$i]
      at Invoke-LegacyAssertion, C:\Program Files\WindowsPowerShell\Modules\Pester\4.0.8\Functions\Assertions\Should.ps1: line 190
      at <ScriptBlock>, C:\Users\Brett\src\shunting_yard_powershell_3\Tokenizer.Tests.ps1: line 20
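
The cause is visible from the -replace results alone. A small sketch, with variable names made up for illustration:

```powershell
$expression = '   foo  + -bar  = baz   '

# Removing every run of white space glues '+' and '-' together.
$allRemoved = $expression -replace '\s+', ''
$allRemoved       # foo+-bar=baz

# Removing only leading white space keeps the gap between '+' and '-'.
$leadingRemoved = $expression -replace '^\s+', ''
$leadingRemoved   # starts with 'foo  + -bar'
```

Once the gap is preserved, the operator pattern also has to stop at white space, otherwise it would take '+ -' as one token.
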
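
Instead, strip only the leading white space on each pass through the loop, and exclude white space from the operator pattern:

        while ($expression.Length -ne 0) {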
            $expression = $expression -replace ('^\s+', '')
            if (-not $this.recordIfMatches([ref]$expression, '^([\d\w]+)', $result)) {
                $this.recordIfMatches([ref]$expression, '^([^\d\w\s]+)', $result)
            }
        }

All of the tests now pass:

    Describing Tokenizing an in-fix expression
      [+] Should convert 42 to 42 577ms
      [+] Should convert 123+ to 123 + 84ms
      [+] Should convert 99*34 to 99 * 34 75ms
      [+] Should convert 1+2+3+4 to 1 + 2 + 3 + 4 22ms
      [+] Should convert a to a 13ms
      [+] Should convert foo+bar to foo + bar 14ms
      [+] Should convert ++foo to ++ foo 16ms
      [+] Should convert    foo  + -bar  = baz    to foo + - bar = baz 25ms
    Tests completed in 829ms
    Tests Passed: 8, Failed: 0, Skipped: 0, Pending: 0, Inconclusive: 0
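
For reference, here is the tokenizer assembled from the fragments in this section into one script that runs without the module file or Pester. How the pieces fit together is my reading of the steps above, and two details are my own choices: the fully qualified ArrayList type name (so the script needs no using statement) and the parameter types on recordIfMatches.

```powershell
class Tokenizer {
    # Records the first capture group of $regex as a token and
    # removes it from the front of the referenced expression.
    [bool] recordIfMatches([ref]$expression, [string]$regex, [System.Collections.ArrayList]$result) {
        if ($expression.Value -match $regex) {
            $result.Add($Matches[1])
            $expression.Value = $expression.Value.Substring($Matches[1].Length)
            return $true
        }
        return $false
    }

    [System.Collections.ArrayList] interpret([string]$expression) {
        $result = [System.Collections.ArrayList]::new()

        while ($expression.Length -ne 0) {
            # Skip white space in front of the next token.
            $expression = $expression -replace '^\s+', ''
            # Try a number or name first, then a run of operator characters.
            if (-not $this.recordIfMatches([ref]$expression, '^([\d\w]+)', $result)) {
                $this.recordIfMatches([ref]$expression, '^([^\d\w\s]+)', $result)
            }
        }

        return $result
    }
}

$tokens = [Tokenizer]::new().interpret('   foo  + -bar  = baz   ')
$tokens -join ' | '   # foo | + | - | bar | = | baz
```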

This seems like enough progress on binary expressions. Next up, handling parentheses.


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.