In brief: this post describes a method to use an alternative task queue system to speed up the background operations performed by the Magento extension Boost My Shop ERP. With a little effort, you should be able to apply this methodology and toolset to tough problems of your own.
We use an extension called Boost My Shop ERP to manage our LiveOutThere.com warehouse. It surprises me just how much value Olivier Zimmerman and his team have packed into a €500 extension, and it will get us through million-dollar months to come.
But, like most things in Magento land, we needed to extend a few things to make Boost My Shop’s ERP work exactly the way we wanted. Our topic today: how to refresh big catalogs using Gearman. We do this because we receive new sets of EDI “available-to-sell” documents from our suppliers every night, and we have to adjust the backorder status of thousands upon thousands of products and update all the various indexes, caches, aggregates, and statuses that are stored by Magento and made even more complicated by ERP.
Boost My Shop ERP does have a built-in cron scheduler extension that accepts events and queues them for later execution, but we found it inconsistent and difficult to troubleshoot. Our catalog size is >200,000 SKUs and the out-of-the-box cron jobs were stalling and failing. This left outdated information on the site, like products that were in stock showing up as out of stock and vice versa.
We needed a way to refresh stock status an order of magnitude faster and with more control than we had been able to do previously. Enter Gearman!
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
To use Gearman you will need to do a few things:
Install the IBuildings Gearman extension for Magento from https://github.com/frak/Magento-Gearman-Module and follow the instructions to install the required libraries and classes
On my Mac with the Homebrew package manager, it’s this simple: `brew install gearman`
Don’t forget to add the Gearman extension to your php.ini and make sure it has been loaded.
Install supervisord (try this guide from python.org)
Copy the example configuration file for supervisord to $CWD/supervisord.conf or /etc/supervisord.conf
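Before going further, it's worth confirming that PHP can actually see the Gearman extension. Here is a minimal sketch, assuming the extension was enabled with the usual `extension=gearman.so` line in php.ini. The function name `gearmanExtensionStatus` is just an illustration. Keep in mind that the command-line PHP and your web server's PHP may read different php.ini files, so run this with the same binary your workers will use.

```php
<?php
// Report whether the Gearman PHP extension is available to this PHP binary.
function gearmanExtensionStatus()
{
    return extension_loaded('gearman')
        ? 'gearman extension loaded, version ' . phpversion('gearman')
        : 'gearman extension NOT loaded; check the extension= line in php.ini';
}

echo gearmanExtensionStatus(), PHP_EOL;
```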
Now you’ll add a program directive to the bottom of your supervisord.conf. Mine looks like this:
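As an illustration of what such a directive can contain, here is a minimal sketch. The program name, the paths, and the number of processes are all placeholder assumptions, so swap in the values for your own install. Note that when `numprocs` is greater than 1, supervisord requires `process_name` to include `%(process_num)s` so that each worker gets a unique name.

```ini
; Hypothetical names and paths throughout; adjust for your own install.
[program:gearman_worker]
; The PHP worker script that supervisord keeps running.
command=/usr/bin/php /var/www/magento/shell/gearman_worker.php
directory=/var/www/magento/shell
; One uniquely named process per worker instance.
process_name=%(program_name)s_%(process_num)02d
numprocs=8
autostart=true
; Bring a worker back up if it exits or crashes.
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/supervisor/%(program_name)s_%(process_num)02d.log
```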
Then run supervisorctl from the /bin directory of wherever you installed supervisor. You will see a list of all your “workers” and their state. What are workers? They are the PHP files you specified in the command config value above. Supervisord keeps them open and “waiting for work” from a dispatcher script.
At this point gearman.php, the task dispatcher, and gearman_worker.php don’t exist yet, so create a gearman.php file in the shell directory of your Magento root:
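To give a feel for what a dispatcher does, here is a minimal sketch rather than the exact script we run. It talks to Gearman through the PECL `GearmanClient` class directly instead of going through the Magento module, and it assumes a Gearman server on 127.0.0.1:4730, a job name of `refreshStockStatus`, and batches of 100 product IDs per job. The helper functions `buildWorkloads` and `dispatchRefreshJobs` are names made up for this example.

```php
<?php
// shell/gearman.php: dispatcher sketch. Queues one background job per batch of product IDs.

// Split a flat list of product IDs into JSON workloads of at most $batchSize IDs each.
function buildWorkloads(array $productIds, $batchSize = 100)
{
    $workloads = array();
    foreach (array_chunk($productIds, $batchSize) as $batch) {
        $workloads[] = json_encode($batch);
    }
    return $workloads;
}

// Submit each workload as a fire-and-forget job and return how many were queued.
function dispatchRefreshJobs(array $workloads, $host = '127.0.0.1', $port = 4730)
{
    $client = new GearmanClient();
    $client->addServer($host, $port);
    $queued = 0;
    foreach ($workloads as $workload) {
        // 'refreshStockStatus' is an assumed job name; the worker must register the same one.
        $client->doBackground('refreshStockStatus', $workload);
        if ($client->returnCode() === GEARMAN_SUCCESS) {
            $queued++;
        }
    }
    return $queued;
}

// Only bootstrap Magento and dispatch when run from inside the Magento shell directory.
if (PHP_SAPI === 'cli' && file_exists(__DIR__ . '/../app/Mage.php')) {
    require_once __DIR__ . '/../app/Mage.php';
    Mage::app('admin');
    $ids = Mage::getModel('catalog/product')->getCollection()->getAllIds();
    echo dispatchRefreshJobs(buildWorkloads($ids)), " jobs queued\n";
}
```

Batching is a design choice: one job per product keeps each unit of work tiny, while larger batches cut down on queue chatter. Tune the batch size against your own catalog.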
Then, create a gearman_worker.php script in the /shell directory as well. You’ll notice this file was referenced in the supervisord.conf program configuration. Supervisor is going to spawn multiple instances of this worker:
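Here is a minimal sketch of what such a worker can look like, again using the PECL `GearmanWorker` class directly. It assumes the dispatcher submits jobs named `refreshStockStatus` whose workload is a JSON array of product IDs. The helper alias `example_erp/stock` and its `refreshProduct` method are hypothetical placeholders for whatever ERP helper call you need, and `parseWorkload` is a name invented for this example.

```php
<?php
// shell/gearman_worker.php: worker sketch. supervisord runs several copies of this script.

// Decode a JSON workload into a list of integer product IDs; bad input yields an empty list.
function parseWorkload($workload)
{
    $ids = json_decode($workload, true);
    return is_array($ids) ? array_map('intval', $ids) : array();
}

// Gearman callback: refresh every product in the job's batch.
function refreshStockStatus_fn(GearmanJob $job)
{
    foreach (parseWorkload($job->workload()) as $productId) {
        // Hypothetical helper alias and method; substitute your own ERP helper call.
        Mage::helper('example_erp/stock')->refreshProduct($productId);
    }
}

// Only bootstrap Magento and start the work loop when run from the Magento shell directory.
if (PHP_SAPI === 'cli' && file_exists(__DIR__ . '/../app/Mage.php')) {
    require_once __DIR__ . '/../app/Mage.php';
    Mage::app('admin');

    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730); // assumed local Gearman server
    $worker->addFunction('refreshStockStatus', 'refreshStockStatus_fn');

    // Block waiting for jobs; stop on error so supervisord can restart the process.
    while ($worker->work()) {
        if ($worker->returnCode() !== GEARMAN_SUCCESS) {
            break;
        }
    }
}
```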
You’re really close! Good slugging.
This part is really important! If you make a change to your gearman_worker.php file you will need to use supervisorctl to restart all of your daemonized workers. That means typing supervisorctl at the command line (you might need to specify the location of your config file with the -c option), and then typing `restart all`. If you want to change the number of workers you have configured, you will need to type `shutdown` in the supervisorctl console, exit, and then start a new instance of supervisord itself.
As you might have noticed in the refreshStockStatus_fn function in the worker above, there is a call to a standard Magento helper. Here’s where the real work happens. Just one more Gist:
This method recalculates the current quantity on hand based on all the stock movements in our ERP system, refreshes the availability status cache entry for a product, and resets expected delivery dates and quantities. Remember, it is being called concurrently by many workers, so we are refreshing many, many products at once rather than one product at a time!
Perhaps we need just one more gist to show the alternative:
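To make the contrast concrete, here is a minimal sketch of the one-at-a-time approach. As before, the helper alias `example_erp/stock` and its `refreshProduct` method are hypothetical placeholders for the ERP helper call, and `refreshSequentially` is a name invented for this example.

```php
<?php
// Sequential sketch: refresh every product in a single process, one after another.

// Apply $refresh to each product ID in turn and return how many were processed.
function refreshSequentially(array $productIds, $refresh)
{
    $done = 0;
    foreach ($productIds as $productId) {
        call_user_func($refresh, $productId);
        $done++;
    }
    return $done;
}

// Only bootstrap Magento when run from inside the Magento shell directory.
if (PHP_SAPI === 'cli' && file_exists(__DIR__ . '/../app/Mage.php')) {
    require_once __DIR__ . '/../app/Mage.php';
    Mage::app('admin');
    $ids = Mage::getModel('catalog/product')->getCollection()->getAllIds();
    $count = refreshSequentially($ids, function ($productId) {
        // Hypothetical helper alias and method; substitute your own ERP helper call.
        Mage::helper('example_erp/stock')->refreshProduct($productId);
    });
    echo $count, " products refreshed\n";
}
```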
Will the code above work? Of course it will! It’s certainly a lot easier to understand. But try running it through a recordset with hundreds of thousands of products in it. It’s a non-starter.
After we completed this Gearman implementation, the time to refresh stock status across the catalog dropped from 12 hours (if it ever finished at all) to one hour.
There are many other benefits, like being able to “fake it 'til you make it” when it comes to performance issues. Think of the “searching for tickets” spinners on Ticketmaster and airline websites: these guys figured it out ages ago! Just give me something to look at, for Christ's sake. Not to mention the opportunities for graceful handling of large batches of work like image processing, user uploads, saving large configuration changes, or communicating with slow external APIs.