We use OTRS as our helpdesk system so that our concierges can help our customers solve their problems. The specifics of our kind of support require process management and integrations with many other systems, so OTRS is what we needed because of its functionality.
This article is about the 4th version of OTRS, as the 5th is still in beta and using it in production is probably not the best idea. That said, all of these changes should also work fine in the 5th version.
When we started to look in this direction, the one thing we didn't like was mod_perl, which meant we would have to use Apache. Apache is a great web server, but we prefer Nginx in our environment and, frankly speaking, we don't like mod_perl :).
We started to look into people's experience with a different configuration based on FastCGI. Google didn't show many results. The main article we found was called "OTRS in Nginx using FCGI (multiple users possible)". Our first configuration was based on that article, but we ran into logging issues: we wanted to capture stderr in separate log files rather than in the Nginx error log, which wasn't easy. Also, it gives no idea how to run the API (we use the SOAP version) under Nginx: OTRS ships an NPH script that works fine in CGI mode, and maybe in FastCGI mode under Apache, but not under Nginx. However, let's forget about the API for now. We used that article, got the system running, tested it, and released it to production.
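For reference, a minimal Nginx location for the agent interface might look like this; the socket path matches the spawn-fcgi command shown later, while the paths and URL layout are assumptions about a default OTRS install:

    # static files served directly by Nginx
    location /otrs-web/ {
        alias /opt/otrs/var/httpd/htdocs/;
    }

    # agent interface handed off to the FastCGI worker pool
    location /otrs/index.pl {
        include       fastcgi_params;
        fastcgi_pass  unix:/tmp/otrs-fastcgi.sock;
    }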
For failover, we use two web servers and PostgreSQL as the database. To share data between the machines, we use GlusterFS for the few directories that OTRS requires to be shared.
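As an illustration, mounting those directories could look like this; the volume names are placeholders, and the exact list of directories (article storage, shared config) is an assumption that depends on your setup:

    mount -t glusterfs gluster1:/otrs-article /opt/otrs/var/article
    mount -t glusterfs gluster1:/otrs-config  /opt/otrs/Kernel/Config/Files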
Once we gave it to our concierges, we realized that if you go to "My Queues" as an agent and the total amount of open tickets is more than 20, it is not fast. Checking the DB and other potential sources of slowdowns showed that the problem was hidden somewhere else. We ran strace on an OTRS process, and it revealed that the time was being spent accessing the temporary directory where OTRS stores its cache via the FileStorable module. The cache files are read one by one, and on GlusterFS every one of those reads pays an extra network round trip, so we get additional delays.
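If you want to reproduce this kind of check, tracing the file access of one worker is enough (the PID is, of course, yours):

    strace -f -e trace=file -p 12345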
Using Memcached or Redis is probably a better solution here. The OTRS documentation says that a Memcached module is available in the OTRS Business Solution. Googling only turned up a solution that Paweł Bogusławski made for the 3rd version. We looked into the Cache module of the 4th version and adapted his code to how it should be in the version we use (https://github.com/dnikolayev/otrs/blob/memcache/Kernel/System/Cache/Memcached.pm).
Unfortunately, we had no quick success enabling it through Framework.xml (params: the list of Memcached servers and their configuration) the way Paweł did, but Kernel::Config solved all our problems. So if you run our version of the Memcached module for OTRS, please add this to your Config.pm:
    $Self->{Memcached} = {
        'Servers'    => ['127.0.0.1:11211'],
        'Parameters' => {
            'compress_threshold' => 10_000,
            'utf8'               => 1,
        },
    };
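Depending on your setup, you may also need to point the cache factory at the new backend; in stock OTRS 4 the backend is selected by the Cache::Module setting, so (as far as we can tell) one more line in Config.pm does it:

    $Self->{'Cache::Module'} = 'Kernel::System::Cache::Memcached';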
We use twemproxy locally on each web server, configured with our Memcached cluster, so each OTRS instance connects to Memcached locally. Many thanks to Twitter for such a great utility! :)
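For completeness, a minimal twemproxy (nutcracker) pool for such a setup could look like the following; the backend addresses are placeholders, not our real cluster:

    otrs_cache:
      listen: 127.0.0.1:11211
      hash: fnv1a_64
      distribution: ketama
      auto_eject_hosts: true
      server_retry_timeout: 30000
      server_failure_limit: 2
      servers:
        - 10.0.0.1:11211:1
        - 10.0.0.2:11211:1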
So, once you have done all of this, it works fast, but then you start to see that process sizes grow under load. OTRS probably has some memory leaks; even our Zabbix health checker makes the processes much bigger within a few hours :). If you have worked with FastCGI before, you know how to fix this without digging deep: limit the number of requests your FastCGI application processes, then shut it down gracefully, and a new, clean process will be started by multiwatch in our configuration, or by whatever manager your configuration uses. Please check how we changed this in our version of the OTRS scripts in the Git repository.
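The core of the idea fits into a few lines; here is a minimal sketch with CGI::Fast, where the request limit and the signal are assumptions rather than the exact values from our scripts:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI::Fast;

    my $max_requests = 1000;   # recycle the worker after this many requests
    my $handled      = 0;
    my $shutdown     = 0;

    # finish the request in progress, then exit; the manager respawns us
    $SIG{HUP} = sub { $shutdown = 1 };

    while ( my $cgi = CGI::Fast->new ) {
        # ... dispatch the OTRS request here ...
        last if $shutdown || ++$handled >= $max_requests;
    }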
The settings configuration is another thing you need to know about. Most settings are saved into .pm files (Perl modules). This means that when an administrator goes to the SysConfig section of OTRS and saves some changes, OTRS rewrites modules that it uses in its work. According to the OTRS docs, the one to watch lives in the Kernel/Config/Files directory (SysConfig writes its changes to ZZZAuto.pm there).
Don't forget that if you use a multi-web-server configuration, this directory must be on shared storage as well.
With mod_perl, OTRS reloads its modules on save. But if you went our way with FastCGI, the old modules are still in memory 🙂 So you need to restart all processes gracefully (waiting for current users to finish their calls). In the proposed code, you can see the signal it accepts to do a graceful shutdown; multiwatch will then start new processes that use the updated settings modules.
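Our watcher daemon boils down to polling that directory and signalling the workers; a rough sketch, where the poll interval and the pkill pattern are assumptions:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $dir = '/opt/otrs/Kernel/Config/Files';
    my %mtime;

    while (1) {
        for my $file ( glob "$dir/*.pm" ) {
            my $m = ( stat $file )[9] // next;
            if ( exists $mtime{$file} && $mtime{$file} != $m ) {
                # ask the workers to finish their current requests and exit;
                # multiwatch respawns them with the fresh modules loaded
                system 'pkill', '-HUP', '-f', 'fcgi-bin/(index|customer)\.pl';
            }
            $mtime{$file} = $m;
        }
        sleep 5;
    }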
To get all the benefits of running it as a real FastCGI application, rather than as a fork or a system call via some wrapper, we put everything described together and ran index.pl like this:
/usr/bin/spawn-fcgi -s /tmp/otrs-fastcgi.sock -M 0660 -n -- /usr/bin/multiwatch -f 10 -r 30 -t 2000 /opt/otrs/bin/fcgi-bin/index.pl
Please check the log directory configuration in the scripts so that your logs are written to the proper place. In the NPH script, we redefined STDIN, STDOUT, and STDERR to make it work correctly, so its main loop looks somewhat different from customer.pl and index.pl.
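The stream juggling is roughly the following; this is a simplified sketch, not the exact OTRS code, and the log path is an assumption:

    use strict;
    use warnings;
    use FCGI;

    # keep the process's stderr in our own log file instead of letting
    # it flow into the Nginx error log
    open STDERR, '>>', '/opt/otrs/var/log/nph-error.log'
        or die "cannot open log: $!";

    # hand FCGI a throwaway handle for its stderr stream
    open my $fcgi_err, '>', '/dev/null' or die "cannot open /dev/null: $!";

    my $request = FCGI::Request( \*STDIN, \*STDOUT, $fcgi_err );
    while ( $request->Accept() >= 0 ) {
        # ... run the SOAP (NPH) handler for this request ...
    }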
To control all of these applications and to run our daemon that watches for changes in the OTRS configuration files, we use supervisor. Examples of its configuration are also available on GitHub; just take them and adjust them to your setup.
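As an illustration, the supervisor entry for the agent interface could look roughly like this; the user and log path are assumptions, and the command is the one shown above:

    [program:otrs-index]
    command=/usr/bin/spawn-fcgi -s /tmp/otrs-fastcgi.sock -M 0660 -n -- /usr/bin/multiwatch -f 10 -r 30 -t 2000 /opt/otrs/bin/fcgi-bin/index.pl
    user=otrs
    autostart=true
    autorestart=true
    stopsignal=TERM
    stderr_logfile=/var/log/supervisor/otrs-index.log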
So, as a result, you get a working OTRS on Nginx, with supervisor controlling spawn-fcgi and multiwatch, which in turn run the OTRS processes (customer, index, and API).
This configuration works perfectly under the high load we have and doesn't cause us much trouble 🙂
P.S. As the number of tickets grows, we are looking at OTRS::SphinxSearch to speed up ticket search for agents when it is needed.