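The OS is Red Flag Server DC 5.0 (also known as Asianux 2.0, "Asianux release 2.0"), kernel 2.6.9-11.19AXsmp.
The server runs an Oracle database and hangs at irregular intervals; this has happened several times already.
When it hangs, telnet no longer works, but the machine still answers ping.
Below is data captured by an automated script before the hang; the hang occurred at around 1 p.m.
Data from around 12:21: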
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 2 2096376 13844 856 717264 0 0 18549 166 1547 4012 5 20 2 73
0 1 2096376 36804 920 694616 45 0 2238 118 1224 3804 2 9 30 59
0 1 2096376 41604 756 690296 0 0 741 106 1115 3916 3 14 41 42
1 0 2096376 45332 608 686284 0 0 168 70 1074 3776 1 11 46 43
4 2 2096376 29708 584 698268 0 0 13518 119 1249 4436 4 28 19 48
1 2 2096376 14412 672 712768 0 0 25990 140 1369 4655 3 20 7 70
1 7 2096376 16716 740 705456 45 0 18592 90 1332 4822 3 16 6 75
0 1 2096376 43836 820 687436 0 0 2978 94 1180 3006 2 13 36 48
1 2 2096376 15564 820 711552 0 0 10266 108 1243 3290 3 20 24 53
2 6 2096376 13900 664 710212 45 0 17434 134 1298 4177 3 15 3 79
1 2 2096376 21324 760 705892 0 0 15685 146 1279 4145 4 19 3 74
1 4 2096376 15948 664 709368 0 0 13274 122 1223 4020 2 12 2 84
2 1 2096376 30860 736 700976 0 0 4119 111 1147 3015 3 11 38 49
2 2 2096376 15580 592 712560 0 0 5917 69 1147 3047 3 17 34 47
0 2 2096376 17284 708 709620 51 1 26287 226 1534 4657 4 26 4 66
2 11 2096376 14084 604 712316 0 0 18076 118 1251 4641 2 16 5 77
4 5 2096376 21460 652 698200 45 0 21050 120 1470 4861 6 24 4 66
1 5 2096376 14164 580 690212 70 3 19418 289 1692 4590 9 54 0 37
2 8 2096376 13772 520 706680 0 0 22174 175 1445 4743 7 31 1 60
4 10 2096376 13828 652 696900 45 0 16583 180 1422 4420 6 30 0 64
0 7 2096376 16580 684 703952 45 0 20326 155 1546 4362 5 39 1 56
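I/O wait is high: the wa column stays between 37% and 84% throughout this window.
Below is the data from 12:41, shortly before the hang. The system (sy) is using 100% of the CPU, and the machine hung after that.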
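To put a number on the I/O wait, here is a minimal Python sketch (not the capture script used in the post) that parses vmstat rows in the column order of the header above and averages the wa column. The names `parse_vmstat` and `summarize` are hypothetical, and the sample is four rows copied from the 12:21 capture.

```python
"""Summarize vmstat samples: average I/O wait and peak blocked processes.

Hypothetical helper; assumes the column order of the vmstat header in the post:
r b swpd free buff cache si so bi bo in cs us sy id wa
"""
from __future__ import annotations

COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa".split()


def parse_vmstat(text: str) -> list[dict[str, int]]:
    """Return one dict per numeric data row, skipping header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Header lines ("procs ---", "r b swpd ...") are not all-numeric; skip them.
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(COLUMNS, map(int, fields))))
    return rows


def summarize(rows: list[dict[str, int]]) -> dict[str, float]:
    """Average I/O wait (wa) and the largest number of blocked processes (b)."""
    n = len(rows)
    return {
        "samples": n,
        "avg_wa": sum(r["wa"] for r in rows) / n,
        "max_b": max(r["b"] for r in rows),
    }


# Four rows copied verbatim from the 12:21 capture.
SAMPLE = """\
r b swpd free buff cache si so bi bo in cs us sy id wa
1 2 2096376 13844 856 717264 0 0 18549 166 1547 4012 5 20 2 73
0 1 2096376 36804 920 694616 45 0 2238 118 1224 3804 2 9 30 59
2 11 2096376 14084 604 712316 0 0 18076 118 1251 4641 2 16 5 77
1 5 2096376 14164 580 690212 70 3 19418 289 1692 4590 9 54 0 37
"""

if __name__ == "__main__":
    print(summarize(parse_vmstat(SAMPLE)))
    # prints {'samples': 4, 'avg_wa': 61.5, 'max_b': 11}
```

The same functions can be fed the full 12:41 block to compare the sy and r columns between the two windows.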
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
50 39 2096376 14716 296 667124 12 0 14220 56 3336 8468 0 100 0 0
46 34 2096376 13948 308 668412 0 0 6383 13 2562 6113 0 100 0 0
47 26 2096376 14140 304 667896 0 0 6077 25 1605 3763 0 100 0 0
39 44 2096376 14396 288 668172 0 0 7291 22 1867 3881 0 100 0 0
50 39 2096376 14204 304 667896 0 0 5273 25 1442 3480 0 100 0 0
56 31 2096376 14076 308 667892 0 0 5129 14 1452 3318 0 100 0 0
56 25 2096376 14140 276 668184 0 0 5000 14 1354 3445 0 100 0 0
49 32 2096376 14140 276 668444 0 0 3952 4 1175 3082 0 100 0 0
60 26 2096376 13884 336 668384 0 0 6470 52 1646 2971 0 100 0 0
48 35 2096376 14332 300 667380 0 0 3469 21 1216 3019 0 100 0 0
57 33 2096376 14012 300 667900 0 0 7464 32 1993 4528 0 100 0 0
52 20 2096376 14716 308 667372 19 0 8706 20 2673 6086 0 100 0 0
62 18 2096376 14012 304 667896 0 0 4009 16 1170 2944 0 100 0 0
61 20 2096376 14460 320 667100 0 0 4906 25 1354 3388 0 100 0 0
55 36 2096376 13756 320 667880 0 0 14782 10 4066 9627 0 100 0 0
60 35 2096376 13820 300 668156 13 0 5979 4 2271 5400 0 100 0 0
70 24 2096376 14460 296 667644 6 0 6330 25 1572 3531 0 100 0 0
65 36 2096376 14780 288 667132 6 0 6655 26 1801 4232 0 100 0 0
61 32 2096376 14588 304 667376 0 0 4893 18 1294 3093 0 100 0 0
57 33 2096376 13820 304 668416 0 0 6950 14 2102 5183 0 98 0 1
69 19 2096376 14396 296 667644 0 0 4502 6 1262 2885 0 100 0 0
Below is the process list captured just before the hang:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 3524 216 ? S Apr30 4:53 init [3]
root 2 0.0 0.0 0 0 ? S Apr30 0:48 [migration/0]
root 3 0.0 0.0 0 0 ? SN Apr30 0:01 [ksoftirqd/0]
root 4 0.0 0.0 0 0 ? S Apr30 1:23 [migration/1]
root 5 0.0 0.0 0 0 ? SN Apr30 0:01 [ksoftirqd/1]
root 6 0.0 0.0 0 0 ? S< Apr30 0:00 [events/0]
root 7 0.0 0.0 0 0 ? S< Apr30 0:00 [events/1]
root 8 0.0 0.0 0 0 ? S< Apr30 0:00 [khelper]
root 9 0.0 0.0 0 0 ? S< Apr30 0:00 [kacpid]
root 100 0.0 0.0 0 0 ? S< Apr30 0:00 [kblockd/0]
root 101 0.0 0.0 0 0 ? S< Apr30 0:00 [kblockd/1]
root 102 0.0 0.0 0 0 ? S Apr30 0:00 [khubd]
root 112 0.0 0.0 0 0 ? S Apr30 1:49 [pdflush]
root 114 0.0 0.0 0 0 ? S< Apr30 0:00 [aio/0]
root 115 0.0 0.0 0 0 ? S< Apr30 0:00 [aio/1]
root 113 1.7 0.0 0 0 ? R Apr30 912:08 [kswapd0]
root 700 0.0 0.0 0 0 ? S Apr30 0:00 [kseriod]
root 771 0.0 0.0 0 0 ? S Apr30 0:00 [scsi_eh_0]
root 809 0.0 0.0 0 0 ? S Apr30 0:00 [scsi_eh_1]
root 819 0.0 0.0 0 0 ? S< Apr30 0:00 [kmirrord/0]
root 820 0.0 0.0 0 0 ? S< Apr30 0:00 [kmirrord/1]
root 826 0.0 0.0 0 0 ? S Apr30 5:45 [md4_raid1]
root 829 0.0 0.0 0 0 ? S Apr30 0:00 [md3_raid1]
root 831 0.0 0.0 0 0 ? S Apr30 2:13 [md2_raid1]
root 833 0.0 0.0 0 0 ? S Apr30 0:27 [md1_raid1]
root 834 0.0 0.0 0 0 ? S Apr30 0:00 [md0_raid1]
root 846 0.0 0.0 0 0 ? S Apr30 5:43 [kjournald]
root 3175 0.0 0.0 3404 136 ? S<s Apr30 0:00 udevd
root 3892 0.0 0.0 0 0 ? S Apr30 1:16 [kjournald]
root 3893 0.0 0.0 0 0 ? S Apr30 0:00 [kjournald]
root 3894 0.0 0.0 0 0 ? S Apr30 0:00 [kjournald]
root 3895 0.0 0.0 0 0 ? S Apr30 2:48 [kjournald]
root 3896 0.0 0.0 0 0 ? S Apr30 2:40 [kjournald]
root 3897 0.0 0.0 0 0 ? S Apr30 2:27 [kjournald]
root 3898 0.0 0.0 0 0 ? S Apr30 1:14 [kjournald]
root 3899 0.0 0.0 0 0 ? S Apr30 2:27 [kjournald]
root 3900 0.0 0.0 0 0 ? S Apr30 0:00 [kjournald]
root 4220 0.0 0.0 1716 252 ? Ss Apr30 1:46 cpuspeed -d -i 2
root 4397 0.0 0.0 1624 316 ? Ss Apr30 0:31 syslogd -m 0
root 4401 0.0 0.0 2760 208 ? Ss Apr30 0:00 klogd -x
root 4411 0.0 0.0 2760 264 ? Ss Apr30 0:58 irqbalance
root 4421 0.0 0.0 2084 216 ? Ss Apr30 0:10 mdadm --monitor --scan -f
root 4456 0.0 0.0 2980 168 ? Ss Apr30 0:00 /usr/sbin/acpid
root 4512 0.0 0.0 5556 276 ? Ss Apr30 0:02 /usr/sbin/sshd
root 4543 0.0 0.0 5152 276 ? Ss Apr30 0:06 crond
xfs 4576 0.0 0.0 3956 216 ? Ss Apr30 0:00 xfs -droppriv -daemon
daemon 4593 0.0 0.0 1832 224 ? Ss Apr30 0:01 /usr/sbin/atd
dbus 4602 0.0 0.0 2724 164 ? Ss Apr30 0:00 dbus-daemon-1 --system
root 4640 0.0 0.1 15152 3076 ? Ss Apr30 17:10 hald
root 4692 0.0 0.0 1796 140 tty2 Ss+ Apr30 0:00 /sbin/mingetty tty2
root 4693 0.0 0.0 3312 140 tty3 Ss+ Apr30 0:00 /sbin/mingetty tty3
root 4694 0.0 0.0 1580 140 tty4 Ss+ Apr30 0:00 /sbin/mingetty tty4
root 4695 0.0 0.0 2128 140 tty5 Ss+ Apr30 0:00 /sbin/mingetty tty5
root 4696 0.0 0.0 2396 140 tty6 Ss+ Apr30 0:00 /sbin/mingetty tty6
oracle 5339 0.0 0.0 1070436 1716 ? Ss Apr30 4:17 ora_pmon_DSJ49802
oracle 5341 0.0 0.0 1070864 1820 ? Ss Apr30 2:15 ora_dbw0_DSJ49802
oracle 5343 0.0 0.0 1075468 1016 ? Ss Apr30 10:51 ora_lgwr_DSJ49802
oracle 5345 0.0 0.1 1071436 2528 ? Ss Apr30 7:57 ora_ckpt_DSJ49802
oracle 5347 0.0 0.0 1070064 2048 ? Ss Apr30 0:56 ora_smon_DSJ49802
oracle 5349 0.0 0.0 1069756 660 ? Ss Apr30 0:00 ora_reco_DSJ49802
oracle 5351 0.0 0.0 1069744 1856 ? Ss Apr30 0:18 ora_cjq0_DSJ49802
oracle 5355 0.0 0.0 1070320 428 ? Ss Apr30 0:04 ora_s000_DSJ49802
oracle 5357 0.0 0.0 1070256 432 ? Ss Apr30 0:03 ora_d000_DSJ49802
qgtg 5362 0.0 0.0 14888 1852 ? S Apr30 43:17 /oracle/app/oracle/product/9.2.0/bin/tnslsnr LISTENER -inherit
root 5498 0.0 0.0 2460 228 ? Ss Apr30 0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
oracle 5502 0.0 0.2 1070464 4528 ? Ss Apr30 3:54 ora_qmn0_DSJ49802
explat 5544 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
oracle 5547 0.0 0.2 1070652 5912 ? S Apr30 4:44 oracleDSJ49802 (LOCAL=NO)
explat 5548 0.0 0.2 135772 5168 ? S Apr30 0:42 exchangeservice -s
explat 5549 0.0 0.2 135772 5168 ? S Apr30 6:55 exchangeservice -s
explat 5550 0.1 0.2 135772 5168 ? S Apr30 101:01 exchangeservice -s
explat 5551 0.0 0.2 135772 5168 ? S Apr30 1:53 exchangeservice -s
explat 5552 0.0 0.2 135772 5168 ? S Apr30 2:55 exchangeservice -s
explat 5553 0.0 0.2 135772 5168 ? S Apr30 2:58 exchangeservice -s
explat 5554 0.0 0.2 135772 5168 ? S Apr30 2:56 exchangeservice -s
explat 5555 0.0 0.2 135772 5168 ? S Apr30 3:04 exchangeservice -s
explat 5556 0.0 0.2 135772 5168 ? S Apr30 2:58 exchangeservice -s
explat 5557 0.0 0.2 135772 5168 ? S Apr30 2:58 exchangeservice -s
explat 5558 0.0 0.2 135772 5168 ? S Apr30 3:02 exchangeservice -s
explat 5559 0.0 0.2 135772 5168 ? S Apr30 3:01 exchangeservice -s
explat 5560 0.0 0.2 135772 5168 ? S Apr30 3:01 exchangeservice -s
explat 5561 0.0 0.2 135772 5168 ? S Apr30 2:59 exchangeservice -s
explat 5562 0.0 0.2 135772 5168 ? S Apr30 3:00 exchangeservice -s
explat 5563 0.0 0.2 135772 5168 ? S Apr30 2:53 exchangeservice -s
explat 5564 0.0 0.2 135772 5168 ? S Apr30 2:54 exchangeservice -s
explat 5565 0.0 0.2 135772 5168 ? S Apr30 3:02 exchangeservice -s
explat 5566 0.0 0.2 135772 5168 ? S Apr30 2:59 exchangeservice -s
explat 5567 0.0 0.2 135772 5168 ? S Apr30 3:02 exchangeservice -s
explat 5568 0.0 0.2 135772 5168 ? S Apr30 2:56 exchangeservice -s
explat 5569 0.0 0.2 135772 5168 ? S Apr30 2:58 exchangeservice -s
explat 5570 0.0 0.2 135772 5168 ? S Apr30 2:56 exchangeservice -s
explat 5571 0.0 0.2 135772 5168 ? S Apr30 2:49 exchangeservice -s
explat 5572 0.0 0.2 135772 5168 ? S Apr30 2:56 exchangeservice -s
explat 5573 0.0 0.2 135772 5168 ? S Apr30 2:53 exchangeservice -s
explat 5574 0.0 0.2 135772 5168 ? S Apr30 3:02 exchangeservice -s
explat 5575 0.0 0.2 135772 5168 ? S Apr30 2:54 exchangeservice -s
explat 5576 0.0 0.2 135772 5168 ? S Apr30 3:02 exchangeservice -s
explat 5577 0.0 0.2 135772 5168 ? S Apr30 2:51 exchangeservice -s
explat 5578 0.0 0.2 135772 5168 ? S Apr30 2:48 exchangeservice -s
explat 5579 0.0 0.2 135772 5168 ? S Apr30 2:50 exchangeservice -s
explat 5580 0.0 0.2 135772 5168 ? S Apr30 2:57 exchangeservice -s
explat 5581 0.0 0.2 135772 5168 ? S Apr30 2:51 exchangeservice -s
explat 5582 0.0 0.2 135772 5168 ? S Apr30 2:53 exchangeservice -s
explat 5583 0.0 0.2 135772 5168 ? S Apr30 3:01 exchangeservice -s
explat 5584 0.0 0.2 135772 5168 ? S Apr30 1:20 exchangeservice -s
explat 5585 0.3 0.2 135772 5168 ? S Apr30 167:33 exchangeservice -s
explat 5586 0.0 0.2 135772 5168 ? S Apr30 0:11 exchangeservice -s
explat 5587 0.3 0.2 135772 5168 ? S Apr30 185:05 exchangeservice -s
explat 5588 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5589 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5590 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5591 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5592 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5593 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5594 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5595 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5596 0.0 0.2 135772 5168 ? S Apr30 0:54 exchangeservice -s
explat 5597 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5598 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5599 0.0 0.2 135772 5168 ? S Apr30 0:55 exchangeservice -s
explat 5600 0.0 0.2 135772 5168 ? S Apr30 6:56 exchangeservice -s
explat 5601 0.0 0.2 135772 5168 ? S Apr30 1:00 exchangeservice -s
explat 5602 0.0 0.2 135772 5168 ? S Apr30 6:52 exchangeservice -s
explat 5603 0.0 0.2 135772 5168 ? S Apr30 6:51 exchangeservice -s
explat 5604 0.0 0.2 135772 5168 ? S Apr30 1:13 exchangeservice -s
explat 5612 0.2 0.7 40204 15480 ? S Apr30 135:18 timerservice -s
qgtg 5773 0.0 0.3 25528 7260 ? S Apr30 24:39 ictrl --daemon
root 5795 0.0 0.0 3452 192 ? Ss Apr30 0:00 login -- qgtg
qgtg 7568 0.0 0.0 5764 140 tty1 Ss Apr30 0:00 -bash
root 8233 0.0 0.0 6292 112 tty1 S Apr30 0:00 su -
root 8234 0.0 0.0 5884 140 tty1 S Apr30 0:00 -bash
root 18442 0.0 0.0 6144 140 tty1 S+ Apr30 0:00 /bin/sh /usr/X11R6/bin/startx
root 18455 0.0 0.0 3916 152 tty1 S+ Apr30 0:00 xinit /etc/X11/xinit/xinitrc --
root 18456 0.0 0.0 15776 1700 ? S< Apr30 19:09 X :0
root 18532 0.0 0.0 5784 140 tty1 S Apr30 0:00 /bin/sh /usr/bin/startkde
root 18559 0.0 0.0 4936 132 ? Ss Apr30 0:00 /usr/bin/ssh-agent -s
root 18562 0.0 0.0 2668 152 tty1 S Apr30 0:00 /usr/bin/dbus-launch --exit-with-session /etc/X11/xinit/Xclients
root 18563 0.0 0.0 2872 164 ? Ss Apr30 0:00 dbus-daemon-1 --fork --print-pid 8 --print-address 6 --session
root 18592 0.0 0.0 25568 828 ? Ss Apr30 0:00 kdeinit: Running...
root 18596 0.0 0.0 26876 1012 ? S Apr30 0:00 kdeinit: dcopserver --nosid
root 18598 0.0 0.0 28420 1116 ? S Apr30 0:00 kdeinit: klauncher
root 18601 0.0 0.0 29364 1348 ? S Apr30 4:58 kdeinit: kded
root 18605 0.0 0.0 30712 404 ? Ss Apr30 0:00 /usr/lib/scim-1.0/scim-launcher -d -c simple -e all -f socket --no-stay
root 18607 0.0 0.0 19764 244 ? Ss Apr30 9:56 /usr/lib/scim-1.0/scim-launcher -d -c socket -e socket -f x11
root 18619 0.0 0.0 7052 164 ? Ss Apr30 0:00 /usr/lib/scim-1.0/scim-helper-manager
root 18620 0.0 0.0 25284 1144 ? Ssl Apr30 4:06 /usr/lib/scim-1.0/scim-panel-gtk --display :0.0 -c socket -d --no-stay
root 18642 0.0 0.0 11640 992 ? S Apr30 3:39 artsd -F 10 -S 4096 -s 1 -m artsmessage -c drkonqi -l 3 -f
root 18650 0.0 0.1 36044 2788 ? S Apr30 19:28 kdeinit: knotify
root 18652 0.0 0.0 2388 96 tty1 S Apr30 1:07 kwrapper ksmserver
root 18654 0.0 0.0 28324 1216 ? S Apr30 0:00 kdeinit: ksmserver
root 18661 0.0 0.0 30084 1636 ? S Apr30 0:00 kdeinit: kwin
root 18662 0.1 0.0 5904 396 ? S Apr30 53:22 /usr/bin/autorun --interval=1000
root 18665 0.0 0.0 30764 1896 ? S Apr30 3:19 kdeinit: kdesktop
root 18673 0.0 0.1 32812 2444 ? S Apr30 3:08 kdeinit: kicker
root 18674 0.0 0.0 26284 904 ? S Apr30 0:00 kdeinit: kio_file file /tmp/ksocket-root/klauncherMr11ca.slave-socket /tmp/ksocket-root/kdesktopzyaC6a.slave-socket
root 19846 0.0 60.2 3101880 1249636 ? Sl Apr30 14:07 rfnetstat
oracle 13936 0.0 0.2 1070616 4192 ? S May30 7:16 oracleDSJ49802 (LOCAL=NO)
root 1097 0.0 0.0 0 0 ? S 11:36 0:01 [pdflush]
root 24414 0.0 0.0 5716 436 ? S 12:21 0:00 crond
root 24415 0.0 0.0 2420 348 ? Ss 12:21 0:00 sh /tmp/sh/auto.sh
root 24416 0.0 0.0 2624 648 ? S 12:21 0:00 sh /tmp/sh/performance.sh
explat 2557 0.2 0.2 16024 5836 ? S 12:40 0:00 triggertoxml
oracle 2577 10.5 1.5 1070088 32032 ? R 12:40 0:01 oracleDSJ49802 (LOCAL=NO)
explat 2647 2.0 0.2 15784 5032 ? S 12:41 0:00 xmltodb_dk
qgtg 2669 0.0 0.0 44056 4 ? D 12:41 0:00 [oracle]
root 2670 0.0 0.0 3880 796 ? R 12:41 0:00 ps aux
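A similar hypothetical sketch for the process snapshot: it splits each `ps aux` row into the 11 standard columns (keeping spaces inside COMMAND), finds the process with the largest RSS, and lists processes in state D (uninterruptible sleep). The sample rows are copied from the listing above; `parse_ps` and `Proc` are illustrative names only.

```python
"""Rank `ps aux` rows by resident memory and list uninterruptible (D) tasks.

Hypothetical helper for a saved `ps aux` snapshot; assumes the standard layout
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND (COMMAND may contain spaces).
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Proc:
    user: str
    pid: int
    mem_pct: float
    rss_kb: int
    stat: str
    command: str


def parse_ps(text: str) -> list[Proc]:
    procs = []
    for line in text.splitlines():
        # Split into at most 11 fields so COMMAND keeps its internal spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header and malformed lines
        procs.append(Proc(fields[0], int(fields[1]), float(fields[3]),
                          int(fields[5]), fields[7], fields[10]))
    return procs


# Rows copied verbatim from the snapshot in the post.
SAMPLE = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 113 1.7 0.0 0 0 ? R Apr30 912:08 [kswapd0]
root 19846 0.0 60.2 3101880 1249636 ? Sl Apr30 14:07 rfnetstat
explat 5612 0.2 0.7 40204 15480 ? S Apr30 135:18 timerservice -s
qgtg 2669 0.0 0.0 44056 4 ? D 12:41 0:00 [oracle]
"""

if __name__ == "__main__":
    procs = parse_ps(SAMPLE)
    top = max(procs, key=lambda p: p.rss_kb)
    print(top.command, top.rss_kb, top.mem_pct)  # prints: rfnetstat 1249636 60.2
    print([p.pid for p in procs if p.stat.startswith("D")])  # prints: [2669]
```

On this snapshot the sketch picks out rfnetstat (RSS 1249636 KB, 60.2% MEM) as the largest process and PID 2669 as the only task in state D.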
Has anyone run into something similar? Thanks for any help.