*Edit: Updated code is available in this post
There are times when running parallel tasks in PowerShell would be handy. Unfortunately, while Workflows can run foreach -parallel, there’s no built-in function to handle this in PowerShell. I came across a few tasks where parallelization would save time, and figured it was time to dive into the subject. I ended up with three functions that borrow heavily from the work of Tome Tanasovski, Boe Prox, and Ryan Witschger.
Before we go into these, I would highly recommend you watch Dr. Tobias Weltner’s webcast on speeding up PowerShell. I found it quite insightful on a variety of topics – certainly well worth the quick registration process. There are other helpful resources on parallelization available as well, including these from Jon Boulineau, Will, Joe Pruitt, and Chris O’Prey.
Do keep in mind that several of the authors above warn that you should use these techniques with caution, and that I am not nearly as familiar with the language as they are : ) That being said, the modifications I made were minimal; for example, adding a non-functional progress bar, adding AD queries, and other non-runspace changes.
What can Runspaces do for me?
A while back I built a query that runs through AD and queries every single computer account, collecting a variety of information and running certain diagnostics. It is light on resources and a helpful backup to existing monitoring solutions. Unfortunately, it takes a very long time, as a good number of systems will not respond; the three seconds spent waiting on each of them add up quickly. Running things in parallel cut the entire query from a day or two to 5 hours.
Simpler tasks, like a single test-connection or a couple of WMI queries, would see an even greater benefit from parallelization. If you check out Boe’s post on this, you will see that for a simple WMI query upon successful test-connection, a 200-system query takes 22 seconds using runspace pools with a throttle of 10. The next fastest method was the standard synchronous foreach at 158 seconds. If you manage many systems, you can see that using runspace pools will be invaluable.
Here’s another real-world example. I pulled the first 100 computer accounts from AD and simply ran test-connection against them.
Many of these computers no longer exist, but running queries against computers that might not respond is a likely scenario that will delay your commands. In this case, the run took 54 seconds using a runspace pool with 10 threads, compared to 8 minutes and 42 seconds without one.
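To make the runspace pool idea a little more concrete, here is a minimal sketch of the general pattern. It is not the code from the download; Invoke-PoolSketch and its parameter names are made up for illustration, and it skips the progress reporting and error handling a real function would want.

```powershell
# Hypothetical helper (not one of the functions from this post): run a script
# block once per input item on a runspace pool with a fixed throttle.
function Invoke-PoolSketch {
    param(
        [Parameter(Mandatory = $true)] [object[]]    $InputObject,
        [Parameter(Mandatory = $true)] [scriptblock] $ScriptBlock,
        [int] $Throttle = 10
    )

    # The pool caps how many runspaces execute at the same time.
    $pool = [runspacefactory]::CreateRunspacePool(1, $Throttle)
    $pool.Open()

    try {
        # Queue one PowerShell instance per item; BeginInvoke() returns immediately.
        $jobs = foreach ($item in $InputObject) {
            $ps = [powershell]::Create()
            $ps.RunspacePool = $pool
            [void]$ps.AddScript($ScriptBlock.ToString()).AddArgument($item)
            @{ Shell = $ps; Handle = $ps.BeginInvoke() }
        }

        # EndInvoke() blocks until that item has finished, then returns its output.
        # Results come back in the order the items were submitted.
        foreach ($job in $jobs) {
            $job.Shell.EndInvoke($job.Handle)
            $job.Shell.Dispose()
        }
    }
    finally {
        $pool.Close()
        $pool.Dispose()
    }
}

# Possible usage, assuming $computers holds a list of computer names:
# Invoke-PoolSketch -InputObject $computers -Throttle 10 -ScriptBlock {
#     param($c) Test-Connection -ComputerName $c -Count 1 -Quiet
# }
```

The throttle is the knob to experiment with: too low and the unresponsive systems still hold everything up, too high and the local machine spends its time juggling runspaces.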
How do I get started?
Go back and make sure you review the posts above that go into actual technical detail. Then download the code from here. Add the functions to your profile or a custom module, paste them into your session, or dot source them. Once the functions are available, run get-help -full against them for more details! The functions are Foreach-Parallel, Run-Parallel, and Run-ParallelJobs.
The Run-Parallel and Run-ParallelJobs functions each include parameters to query AD. These query the default domain. To use the QuerySvrAD and QueryWksAD parameters to query server and workstation computer accounts, you must modify the scripts with the appropriate distinguished names of the OUs or containers where these accounts can be found. Ctrl+F and look for ExampleDC to find the lines where this can be set.
I plan to expand on these; for example, improving progress tracking, adding parameters to allow more than one argument to be passed to the parallelized script(block) by stacking AddArgument(), and adding timeout values or resource constraints on runspaces.
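If dot sourcing is new to you, the sketch below walks through it end to end. The file name and the Get-SampleGreeting function are stand-ins invented for this example; substitute the path of the script you actually downloaded and the name of one of its functions.

```powershell
# Stand-in script file; the real downloaded script would take its place.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'Sample-Functions.ps1'

@'
function Get-SampleGreeting {
    <#
    .SYNOPSIS
    Hypothetical stand-in for one of the downloaded functions.
    .PARAMETER Name
    Who to greet.
    .EXAMPLE
    Get-SampleGreeting -Name World
    #>
    param([string] $Name = 'World')
    "Hello, $Name"
}
'@ | Set-Content -Path $path

# Dot sourcing runs the file in the current scope, so its functions stay defined.
. $path

# Comment-based help is now available for the loaded function.
Get-Help Get-SampleGreeting -Full
```

Note the space between the dot and the path. Without it, the script runs in its own scope and the functions disappear as soon as it finishes.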
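As a rough idea of what stacking AddArgument() and adding a timeout could look like, here is a small sketch. It is not taken from the functions above; the script block, the argument values, and the five second limit are all assumptions made for the example.

```powershell
# Script block with two positional parameters (names and values are illustrative).
$script = {
    param($ComputerName, $Port)
    '{0}:{1}' -f $ComputerName, $Port
}

$ps = [powershell]::Create()

# Each AddArgument() call appends one positional argument, in order.
[void]$ps.AddScript($script.ToString()).AddArgument('server01').AddArgument(443)

$handle = $ps.BeginInvoke()

# Wait up to 5 seconds for completion; otherwise stop the pipeline and move on.
if ($handle.AsyncWaitHandle.WaitOne(5000)) {
    $result = $ps.EndInvoke($handle)
}
else {
    $ps.Stop()
    $result = $null
}

$ps.Dispose()
$result
```

Arguments added this way bind by position, so they must be stacked in the same order as the param() block declares them.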
If you run into any trouble or have any suggestions, please let me know!
Other stuff
Google posted a bit on their datacenters and invited someone from Wired to visit and write a story on it. Wouldn’t it be nice!