Parallel PowerShell: Part II

I posted on parallelization in PowerShell a short while back.  Check out that post for a number of references that I won’t be including here.

I ran into a few situations where some threads would freeze, which kept the tasks after the parallel code from ever running.  There is likely a more official way to do this, but here is how I implemented a timeout for each thread.

Using Boe Prox’s code, I hacked together Run-Parallel.  It does the following:

  1. Define Get-RunspaceData, a function that loops through the runspaces and cleans up any that are done or over their max runtime
  2. Take in the code we will run against various computers
  3. Create a runspace pool
  4. For each computer to run against
    • Create a PowerShell instance with the scriptblock, add the computer name as an argument
    • Add details (Computer, PowerShell instance, start time, etc.) to an array
    • Run Get-RunspaceData
  5. After all computers are queued up, run Get-RunspaceData until everything has completed

Let’s step through the code for each of these starting at (2):

Take in the code we will run against various computers

        #If scriptblock is not specified, convert script file to script block.
        if(! $scriptblock){
            [scriptblock]$scriptblock = [scriptblock]::Create($((get-content $scriptfile) | out-string))
        }
        #if scriptblock is specified, add parameter definition to first line
        else{
            $ScriptBlock = $ExecutionContext.InvokeCommand.NewScriptBlock("param(`$_)`r`n" + $Scriptblock.ToString())
        }

In this block, if no $scriptblock was specified, we convert $scriptfile into a scriptblock.  Otherwise, we take $scriptblock and add param($_) as its first line.  This lets you use the Run-Parallel function like a foreach(){}: you just reference the computer as $_.
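As a rough illustration of the param($_) trick, the sketch below wraps a throwaway scriptblock and calls it directly. The names $userBlock and $wrapped and the value 'server01' are made up for this example, and it uses [scriptblock]::Create rather than NewScriptBlock, on the assumption that the two behave the same here.

```powershell
# Hypothetical scriptblock a caller might pass in; it refers to the computer as $_
$userBlock = { "Querying $_" }

# Prepend a param($_) line so the first argument gets bound to $_
$wrapped = [scriptblock]::Create("param(`$_)`r`n" + $userBlock.ToString())

# Invoke with one positional argument, much as AddArgument() supplies it later on
$output = & $wrapped 'server01'
$output
```

Without the added param line, $_ would be empty inside the block, because nothing pipes the computer name in.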

Create a runspace pool

        $sessionstate = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
        $runspacepool = [runspacefactory]::CreateRunspacePool(1, $Throttle, $sessionstate, $Host)
        $runspacepool.Open()

In this block, we create a default session state, and create a runspacepool with this state and the throttle parameter (how many to run at once).
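To see what the throttle does to the pool, here is a small standalone sketch. The $Throttle value of 4 and the $maxRunspaces and $poolState variables are assumptions for illustration only; they are not part of Run-Parallel.

```powershell
$Throttle = 4   # assumed value for this example

# Same calls as above: default session state, pool of 1 to $Throttle runspaces
$sessionstate = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
$runspacepool = [runspacefactory]::CreateRunspacePool(1, $Throttle, $sessionstate, $Host)
$runspacepool.Open()

# Ask the pool what it will allow and what state it is in
$maxRunspaces = $runspacepool.GetMaxRunspaces()
$poolState = $runspacepool.RunspacePoolStateInfo.State
"Pool is $poolState and allows up to $maxRunspaces runspaces at once"

# Clean up when finished
$runspacepool.Close()
$runspacepool.Dispose()
```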

On a side note, if you want certain variables or modules to be available to all sessions, this is where you do it. More details here.  For example (this isn’t included in my function):

        #EXAMPLE
        $sessionstate = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
        $sessionstate.ImportPSModule("ActiveDirectory")
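Variables work along similar lines. The sketch below is one way to do it, assuming a made-up variable named SharedDomain with a placeholder value; it adds the variable to the session state and reads it back from inside a runspace in the pool.

```powershell
$sessionstate = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()

# Hypothetical shared variable: name, value, description
$entry = New-Object System.Management.Automation.Runspaces.SessionStateVariableEntry -ArgumentList 'SharedDomain', 'contoso.local', 'Example shared value'
$sessionstate.Variables.Add($entry)

$runspacepool = [runspacefactory]::CreateRunspacePool(1, 2, $sessionstate, $Host)
$runspacepool.Open()

# The script never defines $SharedDomain; it comes from the session state
$powershell = [powershell]::Create().AddScript('"Domain is $SharedDomain"')
$powershell.RunspacePool = $runspacepool
$result = $powershell.Invoke()
$result

$powershell.Dispose()
$runspacepool.Close()
$runspacepool.Dispose()
```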

For each computer

        ForEach ($Computer in $Computers) {
           #Create the powershell instance and supply the scriptblock with the other parameters
           $powershell = [powershell]::Create().AddScript($ScriptBlock).AddArgument($computer)

           #Add the runspace into the powershell instance
           $powershell.RunspacePool = $runspacepool

           #Create a temporary collection for each runspace
           $temp = "" | Select-Object PowerShell,Runspace,Computer,StartTime
           $temp.Computer = $Computer
           $temp.PowerShell = $powershell
           $temp.StartTime = get-date

           #Save the handle output when calling BeginInvoke() that will be used later to end the runspace
           $temp.Runspace = $powershell.BeginInvoke()
           Write-Verbose ("Adding {0} collection" -f $temp.Computer)
           $runspaces.Add($temp) | Out-Null

           Write-Verbose ("Checking status of runspace jobs")
           Get-RunspaceData @runspacehash
        }

In this block, we loop through each computer.

We create the powershell instance (you can pass more arguments by tacking on another .AddArgument() ), and add it to the pool.

We create $temp, which contains the computer, powershell instance, start time, and handle output from beginInvoke.  We add this object to the $runspaces array that will be used for tracking each runspace.

Finally, we run Get-RunspaceData.  We do this for each computer because we may start getting results before we can build up all the runspaces.
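Here is a cut-down, standalone version of that loop with a second argument tacked on. The $Greeting parameter, the computer names, and the simple EndInvoke loop at the end are assumptions for the example; Run-Parallel collects results through Get-RunspaceData instead.

```powershell
$runspacepool = [runspacefactory]::CreateRunspacePool(1, 2)
$runspacepool.Open()
$runspaces = New-Object System.Collections.ArrayList

# Script text with two parameters; $Greeting is hypothetical
$script = 'param($_, $Greeting) "$Greeting from $_"'

foreach ($Computer in 'alpha', 'beta', 'gamma') {
    # Arguments bind positionally, in the order they are added
    $powershell = [powershell]::Create().AddScript($script).AddArgument($Computer).AddArgument('Hello')
    $powershell.RunspacePool = $runspacepool

    # Same tracking object as above
    $temp = "" | Select-Object PowerShell,Runspace,Computer,StartTime
    $temp.Computer = $Computer
    $temp.PowerShell = $powershell
    $temp.StartTime = Get-Date
    $temp.Runspace = $powershell.BeginInvoke()
    [void]$runspaces.Add($temp)
}

# EndInvoke blocks until that instance has finished, then returns its output
$results = foreach ($item in $runspaces) {
    $item.PowerShell.EndInvoke($item.Runspace)
    $item.PowerShell.Dispose()
}
$runspacepool.Close()
$runspacepool.Dispose()
$results
```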

Get-RunspaceData

        Function Get-RunspaceData {
            [cmdletbinding()]
            param(
                [switch]$Wait
            )

            Do {
                #set more to false
                $more = $false

                Write-Progress  -Activity "Running Query"`
                    -Status "Starting threads"`
                    -CurrentOperation "$count threads created - $($runspaces.count) threads open"`
                    -PercentComplete (($totalcount - $runspaces.count) / $totalcount * 100)

                #run through each runspace.
                Foreach($runspace in $runspaces) {

                    $runtime = (get-date) - $runspace.startTime
                    #If runspace completed, end invoke, dispose, recycle, counter++
                    If ($runspace.Runspace.isCompleted) {
                        $runspace.powershell.EndInvoke($runspace.Runspace)
                        $runspace.powershell.dispose()
                        $runspace.Runspace = $null
                        $runspace.powershell = $null
                    }

                    #If runtime exceeds max, dispose the runspace
                    ElseIf ($runtime.TotalMinutes -gt $maxRunTime) {
                        $runspace.powershell.dispose()
                        $runspace.Runspace = $null
                        $runspace.powershell = $null
                    }

                    #If runspace isn't null set more to true
                    ElseIf ($runspace.Runspace -ne $null) {
                        $more = $true
                    }
                }

                #After looping through runspaces, if more and wait, sleep
                If ($more -AND $PSBoundParameters['Wait']) {
                    Start-Sleep -Milliseconds $SleepTimer
                }

                #Clean out unused runspace jobs
                $temphash = $runspaces.clone()
                $temphash | Where {
                    $_.runspace -eq $Null
                } | ForEach {
                    Write-Verbose ("Removing {0}" -f $_.computer)
                    $Runspaces.remove($_)
                }

            #Keep looping only while $more is true and -Wait was specified
            } while ($more -AND $PSBoundParameters['Wait'])

        #End of runspace function
        }

So, now we have $runspaces, an array holding information on all the parallel threads we are running.  When we run Get-RunspaceData, the function loops through $runspaces to check whether each runspace has completed or has overrun the maxRunTime we specified, cleaning up as necessary.
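To watch the timeout branch in action, here is a stripped-down sketch of the same idea. It is not the function above: it measures the limit in seconds rather than minutes so it finishes quickly, and $results, $timedOut, and the sleep durations are made up for the example.

```powershell
$maxRunTime = 5   # seconds in this sketch (the function above uses minutes)
$runspacepool = [runspacefactory]::CreateRunspacePool(1, 2)
$runspacepool.Open()
$runspaces = New-Object System.Collections.ArrayList

# One quick job and one that would run far past the limit
foreach ($seconds in 0, 60) {
    $powershell = [powershell]::Create().AddScript('param($_) Start-Sleep -Seconds $_; "slept $_ seconds"').AddArgument($seconds)
    $powershell.RunspacePool = $runspacepool
    $temp = "" | Select-Object PowerShell,Runspace,Computer,StartTime
    $temp.Computer = $seconds
    $temp.PowerShell = $powershell
    $temp.StartTime = Get-Date
    $temp.Runspace = $powershell.BeginInvoke()
    [void]$runspaces.Add($temp)
}

$results = @()
$timedOut = @()
while ($runspaces.Count -gt 0) {
    # Loop over a copy so entries can be removed from $runspaces as we go
    foreach ($runspace in @($runspaces)) {
        $runtime = (Get-Date) - $runspace.StartTime
        if ($runspace.Runspace.IsCompleted) {
            $results += @($runspace.PowerShell.EndInvoke($runspace.Runspace))
            $runspace.PowerShell.Dispose()
            $runspaces.Remove($runspace)
        }
        elseif ($runtime.TotalSeconds -gt $maxRunTime) {
            # Over the limit: give up on it and dispose
            $timedOut += $runspace.Computer
            $runspace.PowerShell.Dispose()
            $runspaces.Remove($runspace)
        }
    }
    Start-Sleep -Milliseconds 200
}
$runspacepool.Close()
$runspacepool.Dispose()
"Completed: $results"
"Timed out: $timedOut"
```

Start-Sleep stops cleanly when the instance is disposed. A command that is truly hung may not respond so politely, so treat this as a sketch of the bookkeeping rather than a guarantee.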

Run Get-RunspaceData until everything has completed

        $runspacehash.Wait = $true
        Get-RunspaceData @runspacehash

We’re at the end! $runspaces contains all (remaining) threads that will need to run. Here, we simply add the -wait parameter for Get-RunspaceData. When this is specified, the function keeps looping until $runspaces is empty.
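If splatting a switch is unfamiliar, this tiny sketch shows the mechanics. Test-Wait is a made-up stand-in for Get-RunspaceData, and $hash plays the role of $runspacehash.

```powershell
# Hypothetical stand-in that only reports whether -Wait was passed
function Test-Wait {
    param([switch]$Wait)
    if ($PSBoundParameters['Wait']) { 'looping until done' } else { 'single pass' }
}

$hash = @{}

# Splatting an empty hashtable passes no parameters at all
$first = Test-Wait @hash

# Adding a Wait key makes the same call behave as if -Wait were typed
$hash.Wait = $true
$second = Test-Wait @hash

$first
$second
```

This is why the same Get-RunspaceData @runspacehash line can be reused both inside the loop and at the end: only the contents of the hashtable change.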

Closing

That’s about it!  The full script is available here in the Script Repository.  I also added ForEach-Parallel – this adds the same runtime tracking to Tome Tanasovski’s version of Foreach-Parallel.

If anyone has any suggestions on revising this, please let me know!  I assume there is some computational overhead, as get-date runs for every single runspace on each pass through $runspaces.  When you’re querying thousands of computers, this might not be the most efficient way to implement timeouts on a runspace.

Update

The method I use to track start time is not accurate.  Runspaces are all defined initially, and queued up in the runspacepool.  This means we could record a start time when a runspace is queued, the runspace could then wait a long time if the queue is full, and the timeout could expire it almost as soon as it starts running, because the clock started when it was queued rather than when it began executing.

I changed the default Timeout to 0.  If you do want to use this parameter, set it for roughly the duration you expect all threads will take to complete, and it will provide a simple fail-safe against hung threads.

Boe Prox suggested using nested runspaces – create an intermediary runspace that tracks the runtime of the real runspace.  It sets the timeout accurately, but falls short in performance.  Given that the motivation behind these commands is performance, I won’t be using this method.

I added a ‘maxQueue’ control that will ensure the runspacepool is not filled up immediately.  It is defined as ($throttle * 2) + 1, to help ensure the runspacepool always has at least ($throttle + 1) runspaces queued up and ready to run.  This can be changed depending on whether you prefer an accurate timeout or performance.
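For anyone curious what a limit like that could look like, here is a minimal sketch. It is not the code from the full script: the waiting loop, the $peak counter, and the short sleeps are assumptions made for the example.

```powershell
$Throttle = 2
$maxQueue = ($Throttle * 2) + 1
$runspacepool = [runspacefactory]::CreateRunspacePool(1, $Throttle)
$runspacepool.Open()
$runspaces = New-Object System.Collections.ArrayList
$peak = 0

foreach ($n in 1..12) {
    # Hold off while $maxQueue or more instances are still unfinished
    while (@($runspaces | Where-Object { -not $_.Runspace.IsCompleted }).Count -ge $maxQueue) {
        Start-Sleep -Milliseconds 50
    }

    $powershell = [powershell]::Create().AddScript('param($_) Start-Sleep -Milliseconds 200; $_').AddArgument($n)
    $powershell.RunspacePool = $runspacepool
    $temp = "" | Select-Object PowerShell,Runspace,StartTime
    $temp.PowerShell = $powershell
    # The start time is now recorded much closer to when the work really begins
    $temp.StartTime = Get-Date
    $temp.Runspace = $powershell.BeginInvoke()
    [void]$runspaces.Add($temp)

    # Track the most unfinished instances seen at any one time
    $pending = @($runspaces | Where-Object { -not $_.Runspace.IsCompleted }).Count
    if ($pending -gt $peak) { $peak = $pending }
}

# Collect everything and clean up
$results = foreach ($item in $runspaces) {
    $item.PowerShell.EndInvoke($item.Runspace)
    $item.PowerShell.Dispose()
}
$runspacepool.Close()
$runspacepool.Dispose()
"Peak pending: $peak (limit $maxQueue)"
```

A smaller limit keeps recorded start times closer to real start times, while a larger one keeps more work queued behind the pool.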

One thought on “Parallel PowerShell: Part II”

  1. Dude, i`m very impressed !
    I’d used this script today against 2400+ servers for query a registry value and it was run amazingely fast (with throttle set to 100 and without timeout…I need to check the timeout thing. thank you very much!
