This guide shows how to fix the hung_task_timeout_secs and "blocked for more than 120 seconds" problem in Linux.
A panic may take place as a result of a hardware failure or a software bug in the operating system.
Background
My server became unresponsive today (around 15:38hrs).
I've collected the following logs that show memory and CPU usage, and narrowed down /var/log/messages.
After doing a hard reboot, it came back online but I was unable to access it via VNC or SSH.
The VNC connection showed an error (many errors, but all contained "/proc/sys/kernel/hung_task_timeout_secs"):
INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Not tainted 2.6.32-431.11.2.el6.x86_64 #1
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Step by step troubleshooting information and logs
Check Memory usage
The following log shows the server's memory usage:
someuser@servercore [/var/log]# sar -r
15:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
15:20:01 476604 1396772 74.56 110140 707116 1201652 30.64
15:30:02 526240 1347136 71.91 110412 710536 1165148 29.71
15:55:53 LINUX RESTART
16:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit
16:10:01 517168 1356208 72.39 136040 588964 1196724 30.52
16:20:01 510580 1362796 72.75 137488 596560 1191664 30.39
As you can see, memory usage wasn't that high and there was plenty of free memory.
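As a quick sanity check (not from the original logs, just arithmetic): sar's %memused is simply kbmemused / (kbmemfree + kbmemused), and recomputing it for the 15:20:01 sample above gives the same figure.

```shell
# Recompute sar's %memused from the 15:20:01 sample above.
kbmemfree=476604
kbmemused=1396772
pct=$(awk -v used="$kbmemused" -v free="$kbmemfree" \
    'BEGIN { printf "%.2f", used * 100 / (used + free) }')
echo "recomputed %memused = $pct"   # matches sar's 74.56
```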
Check CPU usage
The following log shows CPU usage:
someuser@servercore [/var/log]# sar -u
15:00:01 CPU %user %nice %system %iowait %steal %idle
15:20:01 all 6.01 0.04 1.74 1.59 0.14 90.48
15:30:02 all 4.97 0.04 1.54 7.87 0.15 85.44
Average: all 7.20 0.06 2.19 2.69 0.26 87.60
15:55:53 LINUX RESTART
16:00:01 CPU %user %nice %system %iowait %steal %idle
16:10:01 all 9.13 0.04 2.78 6.98 0.31 80.76
16:20:01 all 4.21 0.04 1.39 3.49 0.15 90.73
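The column worth watching here is %iowait. A small sketch (not from the original post; it just pastes the pre-reboot capture above into a heredoc) pulls out the 15:30:02 value, which had jumped from 1.59 to 7.87 right before the hang:

```shell
# Extract the %iowait column (6th field) from the sar -u capture above.
iowait=$(awk '/^15:30:02/ { print $6 }' <<'EOF'
15:00:01 CPU %user %nice %system %iowait %steal %idle
15:20:01 all 6.01 0.04 1.74 1.59 0.14 90.48
15:30:02 all 4.97 0.04 1.54 7.87 0.15 85.44
EOF
)
echo "iowait at 15:30:02 = $iowait%"
```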
Again, CPU wasn't at 100%. This was getting annoying: I couldn't explain why I was in this s**tstorm for nothing.
Let's check /var/log/messages to find all the error logs related to this kernel panic.
Check Kernel Panic Logs
Now I am getting somewhere …
someuser@servercore [/var/log]# grep 'Aug 22 15' messages | grep -v Firewall | grep -v blackmore | grep -v operational | grep -v ec2
Aug 22 15:38:05 servercore kernel: INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Aug 22 15:38:05 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:05 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:05 servercore kernel: jbd2/vda3-8 D 0000000000000000 0 250 2 0x00000000
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: INFO: task rs:main Q:Reg:1035 blocked for more than 120 seconds.
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: rs:main Q:Reg D 0000000000000000 0 1035 1 0x00000080
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: INFO: task queueprocd - qu:1793 blocked for more than 120 seconds.
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: queueprocd - D 0000000000000000 0 1793 1 0x00000080
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:06 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:06 servercore kernel: Call Trace:
Aug 22 15:38:06 servercore kernel: INFO: task httpd:30439 blocked for more than 120 seconds.
Aug 22 15:38:06 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:07 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:07 servercore kernel: httpd D 0000000000000000 0 30439 2223 0x00000080
Aug 22 15:38:07 servercore kernel: Call Trace:
Aug 22 15:38:11 servercore kernel: INFO: task httpd:30482 blocked for more than 120 seconds.
Aug 22 15:38:11 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:38:11 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:38:11 servercore kernel: httpd D 0000000000000000 0 30482 2223 0x00000080
Aug 22 15:38:11 servercore kernel: Call Trace:
Aug 22 15:39:54 servercore kernel: INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Aug 22 15:39:54 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:39:54 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:39:54 servercore kernel: jbd2/vda3-8 D 0000000000000000 0 250 2 0x00000000
Aug 22 15:39:54 servercore kernel: Call Trace:
Aug 22 15:39:54 servercore kernel: INFO: task flush-253:0:263 blocked for more than 120 seconds.
Aug 22 15:39:54 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:39:54 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:39:54 servercore kernel: flush-253:0 D 0000000000000000 0 263 2 0x00000000
Aug 22 15:39:54 servercore kernel: Call Trace:
Aug 22 15:39:56 servercore kernel: INFO: delineate of piece of work rs:main Q:Reg:1035 blocked for to a greater extent than than 120 seconds.
Aug 22 15:39:56 servercore kernel: Not tainted 2.6.32-431.11.2.el6.x86_64 #1
Aug 22 15:39:56 servercore kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 22 15:39:56 servercore kernel: rs:main Q:Reg D 0000000000000000 0 1035 1 0x00000080
Aug 22 15:39:56 servercore kernel: Call Trace:
Aug 22 15:42:11 servercore kernel: Clocksource tsc unstable (delta = -8589964877 ns)
15:55:53 LINUX RESTART
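Not part of the original post, but rather than eyeballing that wall of text, you can strip the blocked task names out of the "INFO: task …" lines with sed. A sketch run against two of the lines above:

```shell
# Extract task names from "INFO: task NAME:PID blocked ..." kernel lines.
tasks=$(sed -n 's/.*INFO: task \(.*\):[0-9]* blocked.*/\1/p' <<'EOF'
Aug 22 15:38:05 servercore kernel: INFO: task jbd2/vda3-8:250 blocked for more than 120 seconds.
Aug 22 15:39:54 servercore kernel: INFO: task flush-253:0:263 blocked for more than 120 seconds.
EOF
)
echo "$tasks"
```

On the real box you would feed it `grep 'INFO: task' /var/log/messages` instead of a heredoc.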
As you can see, all the errors contained "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message." and "blocked for more than 120 seconds" somewhere.
Now let's fix this problem once and for all.
Solution for hung_task_timeout_secs
Explanation
By default Linux uses up to 40% of the available memory for file system caching. Once this mark is reached, the file system flushes all outstanding data to disk, causing all following IOs to go synchronous. There is a time limit of 120 seconds by default for flushing this data to disk. In the case here, the IO subsystem was not fast enough to flush the data within 120 seconds. As the IO subsystem responded slowly and more requests kept arriving, system memory filled up, resulting in the error above and in HTTP requests no longer being served.
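To put rough numbers on that 40% figure: vm.dirty_ratio is a percentage of memory, so with the ~1.8 GB of RAM visible in the sar output above, the write-back thresholds work out as below. This is illustrative shell arithmetic, not output from the affected server:

```shell
# Rough dirty-page thresholds for ~1.8 GB of RAM (illustrative figures).
mem_kb=1873376                     # kbmemfree + kbmemused from sar above
old=$(( mem_kb * 40 / 100 ))       # the 40% cache mark mentioned above
new=$(( mem_kb * 10 / 100 ))       # tuned vm.dirty_ratio = 10
echo "hard limit before tuning: ${old} kB"
echo "hard limit after  tuning: ${new} kB"
```

With the lower ratio, write-back starts much earlier, so far less dirty data has to be flushed within the 120-second window.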
Testing
I tested this theory with the following:
Change vm.dirty_ratio and vm.dirty_background_ratio
someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_ratio=10
someuser@servercore [/home/someuser]$ sudo sysctl -w vm.dirty_background_ratio=5
Commit Change
someuser@servercore [/home/someuser]# sudo sysctl -p
Make it permanent
Once the server had seemed stable, with no kernel/swap/memory panics for a week, I edited the /etc/sysctl.conf file to make these settings permanent across reboots.
someuser@servercore [/home/someuser]$ sudo vi /etc/sysctl.conf
Add these two lines at the bottom:
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
Save and exit.
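If you'd rather not open vi, the same two lines can be appended non-interactively. The sketch below writes to a scratch temp file so it is safe to try anywhere; on a real system you would point it at /etc/sysctl.conf with sudo:

```shell
# Append the two settings to a scratch copy of sysctl.conf (demo path).
conf=$(mktemp)
printf 'vm.dirty_background_ratio = 5\nvm.dirty_ratio = 10\n' >> "$conf"
grep -c '^vm\.dirty' "$conf"      # prints 2 when both lines landed
```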
someuser@servercore [/home/someuser]$ sudo reboot
That's it. I never had this issue ever again.
Hope someone finds this information useful.
