
Sysadmin fails: When service dependencies go wrong

Having problems with services failing to start because other services are taking too long? Here's one person's solution.
"Broken Computers" by Paul Downey is licensed under CC BY 2.0

High-performance computing (HPC) clusters need a parallel filesystem mounted on all nodes, and that storage is typically mounted over the InfiniBand (IB) network. The storage vendor adds the mount command for the Lustre filesystem to /etc/rc.local so that the filesystem mounts at boot time.

When I was building an HPC cluster, I needed to mount this storage on all of my master and compute nodes. But whenever I rebooted a node, it failed to mount the Lustre filesystem. Ultimately, I wrote a script to solve this problem.

The Crux

There are many scenarios in which a service depends on another service, or on the network, before it can start. You can face this kind of issue when you’re working with multiple original equipment manufacturers (OEMs) simultaneously, where an application from one OEM depends on a product from another. These dependencies can make it hard to coordinate with all of the OEMs to resolve the issue.

In my scenario, I was facing exactly this problem. I was getting support neither from the storage vendor nor from my InfiniBand service provider. In these situations, system administrators at a system integrator need to resolve the issue themselves and get all services running properly.

Basically, the storage OEM adds the mount command to /etc/rc.local to mount the network filesystem at boot. For that command to succeed, the IB network must already be active, because the storage is mounted over the InfiniBand network.

In my case, the InfiniBand network was taking too much time to activate, and before that happened Lustre was trying to mount the filesystem.
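For illustration, the vendor-added entry in /etc/rc.local is just a bare mount command like the one below (these addresses and the mount point are the same ones my script uses later; your vendor's line will differ):

/bin/mount -o flock,defaults -t lustre 192.168.43.20@o2ib,192.168.43.23@o2ib1:192.168.43.26@o2ib,192.168.43.28@o2ib1:/lustreshare /storage

If that line executes before the IB link comes up, the mount simply fails, and nothing retries it.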

Creativity > Problem

To avoid this issue, I wrote a script and called it from my /etc/rc.local file to make my filesystem mount automatically at boot. I saved the script in a shared directory (in an HPC cluster, /home and /lscratch are NFS-shared among all nodes) and then executed it from /etc/rc.local or a cron job:

[root@master ~]# cat /lscratch/lustre-script.sh
#!/bin/bash

for i in {1..150}
do
    # Re-check the current status of InfiniBand on every pass.
    h=$(ibstat | grep State | awk '{print $2}')
    echo "Current status of InfiniBand: $h"

    if [[ "$h" == "Active" ]]
    then
        echo "IB is UP, mounting Lustre file system."
        /bin/mount -o flock,defaults -t lustre 192.168.43.20@o2ib,192.168.43.23@o2ib1:192.168.43.26@o2ib,192.168.43.28@o2ib1:/lustreshare /storage
        exit
    else
        echo "IB is down at check $i"
        echo "Retrying.."
        sleep 2
    fi
done

mail -s "Unable to mount Lustre file system on $(hostname)" admin@mydomain.com <<< "InfiniBand is not active, so the script did not try to mount the Lustre file system. Needs your attention."
[root@master ~]# chmod +x /lscratch/lustre-script.sh

Here, I’ve saved the script under the shared directory /lscratch and made it executable. The script breaks down as follows:

  1. The for loop tests the InfiniBand network’s status up to 150 times.
  2. On each pass, the script sets a variable from the output of ibstat, filters the State line, and prints the second column. It is looking for the word Active, which tells us that the InfiniBand network is up. (Setting the variable inside the loop matters: set once before the loop, the script would re-test the same stale value 150 times.)
  3. The script then prints the current state of the InfiniBand network, which is useful in the log.
  4. An if condition checks the current state for the word Active. If the condition is true, the script prints: IB is UP, mounting Lustre file system.
  5. After printing that message, the script mounts the Lustre filesystem and exits. If the condition fails, the script prints two messages, sleeps for two seconds, and begins the next iteration.

Using these steps, this script checks the InfiniBand state up to 150 times, therefore running for up to 300 seconds. If the script finds Active, then it mounts the Lustre filesystem and exits. I’ve also added one last line to the script: it sends an email to a specified address if the InfiniBand network does not become active within those 300 seconds.
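To see what that grep/awk pipeline extracts, you can run the check by hand. The output below is illustrative; the exact fields, and the number of State lines on multi-port adapters, vary with your HCA and driver:

[root@master ~]# ibstat | grep State
        State: Active
[root@master ~]# ibstat | grep State | awk '{print $2}'
Active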

Now, you can execute this script from /etc/rc.local. I’ve added the following entry to log each run of the script:

[root@master ~]#  tail -n 1 /etc/rc.local
sh /lscratch/lustre-script.sh >> /lscratch/lustre-script.log

These logs can help you monitor the script.
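One caveat, assuming a systemd-based distribution such as RHEL 7 or later: /etc/rc.d/rc.local must itself be executable, or the rc-local service will skip it at boot:

[root@master ~]# chmod +x /etc/rc.d/rc.local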

We can also add a cron job for our script. Here is an example that I’ve added in my setup. It runs after each reboot:

[root@master ~]# crontab -l
@reboot sh /lscratch/lustre-script.sh >> /lscratch/lustre-script.log

Using this script, my Lustre mount command waits until the InfiniBand network becomes Active. It’s a basic script, but you can use the same pattern to make any dependent service wait for an independent one to start first.
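As a sketch of that general pattern (the service names below are placeholders, not anything from my setup), the same poll-and-retry loop works for any dependency you can probe from the shell:

#!/bin/bash
# Generic wait-for-dependency sketch. Swap in your own check and action;
# the unit names here are hypothetical.
for i in {1..150}
do
    if systemctl is-active --quiet independent.service
    then
        systemctl start dependent.service
        exit
    fi
    echo "Dependency not ready at check $i, retrying.."
    sleep 2
done
echo "Dependency never became active on $(hostname)" >&2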


Amit Waghmare

I'm a techie guy with lots of love for Linux. I started my career as a Linux administrator on a US-based project. Later, I got an opportunity to work with HPC clusters, where I learned several other products.
