This guide shows how to fix the hung_task_timeout_secs and "blocked for more than 120 seconds" problem in Linux.
A panic may take place as a result of a hardware failure or a software bug in the operating system.
Background
My server became unresponsive today (around 15:38hrs).
I've collected the following logs that show memory and CPU usage, and narrowed down /var/log/messages.
After doing a hard reboot, it came back online but I was unable to access it via VNC or SSH.
The VNC connection showed an error (many errors, but all contained "/proc/sys/kernel/hung_task_timeout_secs"):
INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Not tainted 2.6.32-431.11.2.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Step by step troubleshooting information and logs
Check Memory usage
The following log shows the server's memory usage:
someuser@servercore [/var/log]# sar -r
15:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
15:20:01 476604 1396772 74.56 110140 707116 1201652 30.64
15:30:02 526240 1347136 71.91 110412 710536 1165148 29.71
15:55:53 LINUX RESTART
16:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
16:10:01 517168 1356208 72.39 136040 588964 1196724 30.52
16:20:01 510580 1362796 72.75 137488 596560 1191664 30.39
As you can see, memory usage wasn't that high and there was plenty of free memory.
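As a quick sanity check (not from the original logs, just arithmetic): sar's %memused is simply kbmemused / (kbmemfree + kbmemused), and recomputing it for the 15:20:01 sample above gives the same figure.

```shell
# Recompute sar's %memused from the 15:20:01 sample above.
kbmemfree=476604
kbmemused=1396772
pct=$(awk -v used="$kbmemused" -v free="$kbmemfree" \
    'BEGIN { printf "%.2f", used * 100 / (used + free) }')
echo "recomputed %memused = $pct"   # matches sar's 74.56
```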
Check CPU usage
The following log shows CPU usage:
someuser@servercore [/var/log]# sar -u
15:00:01 CPU %user %nice %system %iowait %steal %idle
15:20:01 all 6.01 0.04 1.74 1.59 0.14 90.48
15:30:02 all 4.97 0.04 1.54 7.87 0.15 85.44
Average: all 7.20 0.06 2.19 2.69 0.26 87.60
15:55:53 LINUX RESTART
16:00:01 CPU %user %nice %system %iowait %steal %idle
16:10:01 all 9.13 0.04 2.78 6.98 0.31 80.76
16:20:01 all 4.21 0.04 1.39 3.49 0.15 90.73
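The column worth watching here is %iowait. A small sketch (not from the original post; it just pastes the pre-reboot capture above into a heredoc) pulls out the 15:30:02 value, which had jumped from 1.59 to 7.87 right before the hang:

```shell
# Extract the %iowait column (6th field) from the sar -u capture above.
iowait=$(awk '/^15:30:02/ { print $6 }' <<'EOF'
15:00:01 CPU %user %nice %system %iowait %steal %idle
15:20:01 all 6.01 0.04 1.74 1.59 0.14 90.48
15:30:02 all 4.97 0.04 1.54 7.87 0.15 85.44
EOF
)
echo "iowait at 15:30:02 = $iowait%"
```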
Again, CPU wasn't at 100%. This was getting annoying: I couldn't explain why I was in this s**tstorm for nothing.
Let's check /var/log/messages to find all the error logs related to this kernel panic.
Check Kernel Panic Logs
Now I am getting somewhere …
someuser@servercore [/var/log]# grep 'Aug 22 15' messages | grep -v Firewall | grep -v blackmore | grep -v operational | grep -v ec2
Aug 22 15:38:05 servercore kernel: INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Aug 22 15:38:05 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:05 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:05 servercore kernel: jbd2/vda3-8 D 0000000000000000 0 250 2 0x00000000
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: INFO: task rs:main Q:Reg:1035 blocked for more than 120 seconds.
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: rs:main Q:Reg D 0000000000000000 0 1035 1 0x00000080
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: INFO: task queueprocd - qu:1793 blocked for more than 120 seconds.
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: queueprocd - D 0000000000000000 0 1793 1 0x00000080
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: INFO: task httpd:30439 blocked for more than 120 seconds.
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:07 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:07 servercore kernel: httpd D 0000000000000000 0 30439 2223 0x00000080
Aug 22 15:38:07 servercore kernel: Call Trace:
Aug 22 15:38:11 servercore kernel: INFO: task httpd:30482 blocked for more than 120 seconds.
Aug 22 15:38:11 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:11 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:11 servercore kernel: httpd D 0000000000000000 0 30482 2223 0x00000080
Aug 22 15:38:11 servercore kernel: Call Trace:
Aug 22 15:39:54 servercore kernel: INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Aug 22 15:39:54 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:39:54 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:39:54 servercore kernel: jbd2/vda3-8 D 0000000000000000 0 250 2 0x00000000
Aug 22 15:39:54 servercore kernel: Call Trace:
Aug 22 15:39:54 servercore kernel: INFO: task flush-253:0:263 blocked for more than 120 seconds.
Aug 22 15:39:54 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:39:54 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:39:54 servercore kernel: flush-253:0 D 0000000000000000 0 263 2 0x00000000
Aug 22 15:39:54 servercore kernel: Call Trace:
Aug 22 15:39:56 servercore kernel: INFO: delineate of piece of work rs:main Q:Reg:1035 blocked for to a greater extent than than 120 seconds.
Aug 22 15:39:56 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:39:56 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:39:56 servercore kernel: rs:main Q:Reg D 0000000000000000 0 1035 1 0x00000080
Aug 22 15:39:56 servercore kernel: Call Trace:
Aug 22 15:42:11 servercore kernel: Clocksource tsc unstable (delta = -8589964877 ns)
15:55:53 LINUX RESTART
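Not part of the original post, but rather than eyeballing that wall of text, you can strip the blocked task names out of the "INFO: task …" lines with sed. A sketch run against two of the lines above:

```shell
# Extract task names from "INFO: task NAME:PID blocked ..." kernel lines.
tasks=$(sed -n 's/.*INFO: task \(.*\):[0-9]* blocked.*/\1/p' <<'EOF'
Aug 22 15:38:05 servercore kernel: INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Aug 22 15:39:54 servercore kernel: INFO: task flush-253:0:263 blocked for more than 120 seconds.
EOF
)
echo "$tasks"
```

On the real box you would feed it `grep 'INFO: task' /var/log/messages` instead of a heredoc.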
As you can see, all the errors contained "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message." and "blocked for more than 120 seconds" somewhere.
Now let's fix this problem once and for all.
Solution for hung_task_timeout_secs
Explanation
By default Linux uses up to 40% of the available memory for file system caching. Once this mark is reached, the file system flushes all outstanding data to disk, causing all following IOs to go synchronous. There is a time limit of 120 seconds by default for flushing this data to disk. In the case here, the IO subsystem was not fast enough to flush the data within 120 seconds. As the IO subsystem responded slowly and more requests kept arriving, system memory filled up, resulting in the error above and in HTTP requests no longer being served.
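To put rough numbers on that 40% figure: vm.dirty_ratio is a percentage of memory, so with the ~1.8 GB of RAM visible in the sar output above, the write-back thresholds work out as below. This is illustrative shell arithmetic, not output from the affected server:

```shell
# Rough dirty-page thresholds for ~1.8 GB of RAM (illustrative figures).
mem_kb=1873376                     # kbmemfree + kbmemused from sar above
old=$(( mem_kb * 40 / 100 ))       # the 40% cache mark mentioned above
new=$(( mem_kb * 10 / 100 ))       # tuned vm.dirty_ratio = 10
echo "hard limit before tuning: ${old} kB"
echo "hard limit after  tuning: ${new} kB"
```

With the lower ratio, write-back starts much earlier, so far less dirty data has to be flushed within the 120-second window.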
Testing
I tested this theory with the following:
Change vm.dirty_ratio and vm.dirty_background_ratio
someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_ratio=10
someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_background_ratio=5
Commit Change
someuser@servercore [/home/someuser]# sudo sysctl -p
Make it permanent
Once the server had seemed stable, with no kernel/swap/memory panics for a week, I edited the /etc/sysctl.conf file to make these settings permanent across reboots.
someuser@servercore [/home/someuser]$ sudo vi /etc/sysctl.conf
Add these two lines at the bottom:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
Save and exit.
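If you'd rather not open vi, the same two lines can be appended non-interactively. The sketch below writes to a scratch temp file so it is safe to try anywhere; on a real system you would point it at /etc/sysctl.conf with sudo:

```shell
# Append the two settings to a scratch copy of sysctl.conf (demo path).
conf=$(mktemp)
printf 'vm.dirty_background_ratio = 5\nvm.dirty_ratio = 10\n' >> "$conf"
grep -c '^vm\.dirty' "$conf"      # prints 2 when both lines landed
```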
someuser@servercore [/home/someuser]$ sudo reboot
That's it. I never had this issue ever again.
Hope someone finds this information useful.
