How to check why a Linux process is hanging

In this article, I will explain how to check why a Linux process is hanging. Most of the time, we system admins 🙂 simply restart an application when it hangs. We are forced to do this while firefighting, but it can backfire on us later. Finding the root cause is the better approach, and in most cases it is possible. Let’s see how.

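First of all, you need to find the PID of the process. You can do this with the  ps  command. The brackets around the first letter of the pattern keep grep from matching its own entry in the process list.

# ps aux | grep '[p]rocess_name' | awk '{print $2}'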
32287
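A plain ps and grep pipeline matches anywhere on the line, so it can also pick up unrelated processes that merely mention the name in their arguments. As a rough sketch, the lookup can be made stricter by comparing only the command-name column. The helper name pids_by_name is made up for this example, and it assumes its input is "PID command" pairs such as those printed by ps -e -o pid= -o comm=.

```shell
# Hypothetical helper: read "PID COMMAND" pairs on stdin and print the
# PIDs whose command name is exactly equal to the first argument.
pids_by_name() {
    awk -v name="$1" '$2 == name { print $1 }'
}

# Typical use on a live system (on Linux, comm is truncated to 15 characters):
#   ps -e -o pid= -o comm= | pids_by_name test_process

# Demo on canned input so the behaviour is easy to see
printf '32287 test_process\n32301 grep\n' | pids_by_name test_process
```

Where it is installed, pgrep -x process_name does a similar exact-name match in one step.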

Once you have the PID, run  strace  on it. The strace command shows you the system calls made by the process.

strace -p PID

For example,

# strace -p 32287
Process 32287 attached - interrupt to quit
recvfrom(34,

If your process is really hanging, you will likely see the  strace  output stall on a single system call. In our example, the process is waiting for the  recvfrom  system call to complete. recvfrom receives a message from a socket, and its first argument is the file descriptor (FD) of that socket. We are going to use that FD number (34 in the example) to track down the particular connection that is hanging, with the help of the  lsof  command.
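If you want to pull the system call name and FD number out of the strace output in a script, something like the following may work. It is only a sketch: the helper name last_syscall is made up, and it assumes the first argument of the call is a plain numeric FD, which holds for recvfrom, read and similar calls but not for every system call.

```shell
# Hypothetical helper: read strace output on stdin and print the name and
# first numeric argument of the last system call seen, e.g. "recvfrom 34".
# Lines that do not look like "name(number..." are ignored.
last_syscall() {
    sed -n 's/^\([a-z_0-9]*\)(\([0-9][0-9]*\).*/\1 \2/p' | tail -n 1
}

# Demo on the output from the example above
printf 'Process 32287 attached - interrupt to quit\nrecvfrom(34,\n' | last_syscall
```

Note that strace writes its trace to stderr, so in a real pipeline you would redirect it with 2>&1 before piping it into the helper.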

lsof -p PID | grep FD

For example,

# lsof -p 32287 | grep 34
test_process 32287 root 34u IPv4 1497703330 0t0 TCP test1.box.net:47879->test2.box.net:http (ESTABLISHED)
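If lsof is not installed on the box, the Linux /proc filesystem exposes some of the same information. The sketch below uses two made-up helper names, fd_target and proc_state. It assumes /proc is mounted and that you are root or own the process, and it only shows what the descriptor points to (for a socket, a socket:[inode] string), not the remote address that lsof prints.

```shell
# Hypothetical helpers; both read the Linux /proc filesystem directly.

# Print what descriptor $2 of process $1 points to,
# e.g. a file path or "socket:[inode]".
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Print the state of process $1 as reported by the kernel,
# e.g. "S (sleeping)" or "D (disk sleep)".
proc_state() {
    awk '/^State:/ { print $2, $3 }' "/proc/$1/status"
}

# Demo on the current shell: open FD 9 on /dev/null, inspect it, close it
exec 9</dev/null
fd_target "$$" 9
exec 9<&-
```

For the example in this article, the equivalent calls would be fd_target 32287 34 and proc_state 32287.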

Now you don’t need to shoot in the dark. We have a clear idea of which connection the process is hanging on. Now you can go to the developer and ask them to fix the code 😎 Just kidding; fixing the code is not always the answer.

 

