In this article, I explain how to find out why a Linux process is hanging. Most of the time, we system admins 🙂 simply restart an application when it hangs. We are forced to do this when we are firefighting, but it can backfire on us later, so finding the root cause is the best thing we can do. In most cases it is possible to find it. Let’s see how.
First of all, you need to find the PID of the process. You can do this with the ps command.
# ps -aux | grep process_name | awk '{print $2}'
32287
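One catch with the pipeline above is that the grep command can match itself and show up as an extra PID. A minimal sketch of an alternative, assuming pgrep (from the procps package) is installed; the function name find_pids is made up for illustration:

```shell
#!/bin/sh
# find_pids is a hypothetical helper name, not something from this article.
# pgrep -x matches the process name exactly, so the search command itself
# never appears in the results (unlike "ps | grep").
# Note: pgrep compares against the kernel's short process name, which is
# truncated to 15 characters, so very long names may need "pgrep -f" instead.
find_pids() {
    pgrep -x -- "$1"
}

# Usage: find_pids test_process
```

If more than one PID comes back, each one may need to be checked separately in the next steps.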
Once you have the PID, run strace on it. The strace command shows every system call the process makes.
strace -p PID
For example,
# strace -p 32287
Process 32287 attached - interrupt to quit
recvfrom(34,
If your process is really hanging, you will probably see the strace output stuck on a single system call too. In our example, the process is waiting for the recvfrom system call to complete. recvfrom is used to receive a message from a socket, so the process is blocked waiting for data from the other end of a connection. The strace output also shows the file descriptor (FD) number of the socket as the first argument of the call. We are going to use that FD number (34 in the example) with the lsof command to track down the particular connection that is hanging.
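As a small aside, the FD number can also be pulled out of that last strace line mechanically, which helps when scripting the check. This is only a sketch: the function name fd_from_strace_line is invented here, and it assumes the first argument of the blocked call is an FD (true for recvfrom, read, write and similar calls, but not for every system call).

```shell
#!/bin/sh
# fd_from_strace_line is a hypothetical helper name.
# Reads strace output on stdin and prints the first argument of each call
# when that argument is a plain decimal number followed by "," or ")".
# Caveat: for calls such as select() the first argument is a count, not an FD.
fd_from_strace_line() {
    sed -n 's/^[a-z_0-9]*(\([0-9][0-9]*\)[,)].*/\1/p'
}

echo 'recvfrom(34, ' | fd_from_strace_line   # prints 34
```

Lines that are not system calls, such as the "Process 32287 attached" banner, produce no output.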
lsof -p PID | grep FD
For example,
# lsof -p 32287 | grep 34
test_process 32287 root 34u IPv4 1497703330 0t0 TCP test1.box.net:47879->test2.box.net:http (ESTABLISHED)
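Note that grepping for 34 matches any line that happens to contain those digits, for example in a port number. lsof can do the filtering itself with lsof -a -p PID -d FD. If lsof is not installed, the same question can be answered from /proc on Linux. A minimal sketch, with fd_target being a name made up for this example:

```shell
#!/bin/sh
# fd_target is a hypothetical helper name.
# On Linux, /proc/PID/fd/FD is a symlink to whatever the descriptor refers to:
# a path for regular files, or a string of the form socket:[inode] for sockets.
# Reading another user's process usually requires root.
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Usage: fd_target 32287 34
```

For a socket this only tells you that the FD is a socket, so lsof is still the easier way to see the remote host and port.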
Now you don’t need to shoot in the dark. We have a clear idea of which connection the process is hanging on: in the example, an established HTTP connection from test1.box.net to test2.box.net. Now you can go to the developer and tell them to fix the code 😎 Just kidding, fixing the code is not always the answer.