Red Hat Knowledgebase

Article Reference

Article ID: 7662
Last update: 02-05-06
Issue:
My Red Hat Enterprise Linux 3 (or 4) system had a kernel panic, an oops message, or is freezing for no apparent reason. How can I find out what is causing this?
Resolution:
Resolving a kernel panic or a kernel oops is not a simple task. For Red Hat to determine the cause, the complete panic or oops message must be available. Below is a "Profiling" document that lists the information Red Hat requires to troubleshoot a kernel panic or kernel oops related to a system crash.

It is recommended that the system run the latest kernel available for its release version and that the system be completely updated.

To further debug this problem the following information is needed:
  • The output of the sysreport command

Note: sysreport is an application that may not be installed on the system. If it is not installed, please install the sysreport RPM in one of the following ways:

  • Run up2date sysreport if the system is registered with RHN; this will download and install the package.
  • Locate the sysreport package on the installation CDs and install the package with: rpm -ivh sysreport-version#.rpm - where version# will match the file's version number on the installation CD.
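
A quick check before collecting data can be sketched as follows. The helper name have_cmd is hypothetical, and the sketch assumes sysreport is on the PATH of the user running it (typically root); the install hint only repeats the up2date command given above.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: it succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd sysreport; then
    echo "sysreport is installed"
else
    echo "sysreport not found; install it with: up2date sysreport"
fi
```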

If possible, run these commands while the slowdown is occurring, or as close as possible to a reproducible crash. Red Hat recognizes that this is not always possible, but the information is still needed.


  • OOPS messages:

    If the machine crashes with an OOPS message, similar to the following:
    Unable to handle kernel NULL pointer dereference at virtual address
    00000018
    *pde = 0f992001
    Oops: 0000
    CPU:    1
    EIP:    0010:[]    Not tainted
    Using defaults from ksymoops -t elf32-i386 -a i386
    EFLAGS: 00010207
    eax: 00000000   ebx: c87a1ed0   ecx: c02de5e0   edx: f3de3b00
    esi: c87a1eb4   edi: 00000000   ebp: 00000007   esp: c3f5bfa0
    ds: 0018   es: 0018   ss: 0018
    Process kswapd (pid: 11, stackpage=c3f5b000)
    Stack: 00000000 fffffe5d 00000245 00085992 00000001 00000000 000000c0 000000c0
           0008e000 c0136c51 000000c0 00000000 c3f5a000 00000006 c0136ce5 000000c0
           00000000 00010f00 c3ff1fb8 c0105000 c0105866 00000000 c0136c90 c02f5fc0
    Call Trace: [] do_try_to_free_pages [kernel] 0x11
    [] kswapd [kernel] 0x55
    [] stext [kernel] 0x0
    [] kernel_thread [kernel] 0x26
    [] kswapd [kernel] 0x0
    Code: f7 40 18 06 00 00 00 75 f0 8b 40 28 39 d0 75 f0 31 d2 85 d2
    
    
    >>EIP; c0136177    <=====
    
    Trace; c0136c51
    Trace; c0136ce5
    Trace; c0105000 <_stext+0/0>
    Trace; c0105866
    Trace; c0136c90
    Code;  c0136177
    00000000 <_EIP>:
    Code;  c0136177    <=====
       0:   f7 40 18 06 00 00 00      testl  $0x6,0x18(%eax)   <=====
    Code;  c013617e
       7:   75 f0                     jne    fffffff9 <_EIP+0xfffffff9>
    c0136170
    Code;  c0136180
       9:   8b 40 28                  mov    0x28(%eax),%eax
    Code;  c0136183
       c:   39 d0                     cmp    %edx,%eax
    Code;  c0136185
       e:   75 f0                     jne    0 <_EIP>
    Code;  c0136187
      10:   31 d2                     xor    %edx,%edx
    Code;  c0136189
      12:   85 d2                     test   %edx,%edx
    


    The full output from the OOPS message will be required. It can be obtained in one of the following ways:

    • Copy it down by hand (or from a digital picture). The complete message is needed, and this may sometimes be the only way to get the oops message.
    • Set up a serial console to capture the message. Connect a null modem cable to the serial port of the machine and add:

      console=ttyS0,115200 console=tty0

      to either the kernel line in grub.conf or an "append=" statement in lilo.conf. Once this is done, run a terminal emulator such as minicom (Linux) or HyperTerminal (Windows) on the machine attached to the other end of the null modem cable.
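
After rebooting with the new boot arguments, it can help to confirm that the kernel actually received them. The following is a minimal sketch that reads /proc/cmdline; the helper name has_serial_console is hypothetical, and the check only looks for the ttyS0 port used in the example above.

```shell
#!/bin/sh
# has_serial_console CMDLINE: hypothetical helper that succeeds if the
# given kernel command line contains a console=ttyS0 argument.
has_serial_console() {
    case " $1 " in
        *" console=ttyS0,"*|*" console=ttyS0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the arguments the running kernel was booted with.
cmdline=$(cat /proc/cmdline 2>/dev/null)
if has_serial_console "$cmdline"; then
    echo "serial console on ttyS0 is configured"
else
    echo "console=ttyS0 not found on the kernel command line"
fi
```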

  • Mysterious Hangs, Freezes and Slowdowns:

    For hangs and freezes, capture some information by enabling the sysrq key. To enable it, edit the file /etc/sysctl.conf and change the kernel.sysrq line to read:

    kernel.sysrq = 1

    Enable it immediately by saving the file and running:

    # sysctl -p
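
The edit to /etc/sysctl.conf can also be scripted. This is a sketch under stated assumptions: the function name enable_sysrq_in is hypothetical, and it assumes that appending the setting is acceptable when the file has no uncommented kernel.sysrq line.

```shell
#!/bin/sh
# enable_sysrq_in FILE: hypothetical helper that makes FILE contain
# "kernel.sysrq = 1", rewriting an existing kernel.sysrq line or appending one.
enable_sysrq_in() {
    conf=$1
    if grep -q '^[[:space:]]*kernel\.sysrq' "$conf"; then
        # Rewrite through a temporary file, then copy back so the
        # original file keeps its ownership and permissions.
        sed 's/^[[:space:]]*kernel\.sysrq.*/kernel.sysrq = 1/' "$conf" > "$conf.tmp" &&
            cat "$conf.tmp" > "$conf" && rm -f "$conf.tmp"
    else
        echo 'kernel.sysrq = 1' >> "$conf"
    fi
}

# Example (as root): enable_sysrq_in /etc/sysctl.conf && sysctl -p
# Afterwards, "cat /proc/sys/kernel/sysrq" should print 1.
```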

    Once this is enabled, the output from the following key combinations is necessary:
    • alt-sysrq-t
    • alt-sysrq-p
    • alt-sysrq-m
    * Please note that sysrq is the PrintScreen key.

    Please run alt-sysrq-p multiple times to capture output from all CPUs on the machine. Also, run alt-sysrq-m last, as it has a possibility of locking the box up harder than it already is. Alternatively, a serial console can be used to capture the information. Ensure that there is at least one alt-sysrq-p from each CPU, denoted by a CPU: # line in the output. Note that the first CPU is number 0.
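
To check that every CPU appears in the captured output, the CPU numbers can be pulled out of the log. This is a minimal sketch: the helper name cpus_seen is hypothetical, and it assumes the capture was saved to a text file in which each register dump carries a "CPU: N" marker, as in the oops sample earlier.

```shell
#!/bin/sh
# cpus_seen FILE: hypothetical helper that prints each distinct CPU number
# found after a "CPU:" marker in FILE, one per line, in ascending order.
cpus_seen() {
    sed -n 's/.*CPU:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | sort -n -u
}

# Usage: sh cpus-seen.sh /path/to/captured-console.log
if [ -n "${1:-}" ]; then
    cpus_seen "$1"
fi
```

On a two-CPU machine the list should contain both 0 and 1; if a number is missing, press alt-sysrq-p again until that CPU shows up.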

  • Slowdowns:

    For general slowdowns the following is needed:
    • What kind of load is the box under?
    • What processes or programs are running to produce this load?
    • If these processes or programs stop running, does the slowness immediately go away?

      Next, follow these steps to gather some data:

      1. Enable kernel profiling by turning on nmi_watchdog and allocating the kernel profile buffer. For example, add the following two items to the "kernel" line of /boot/grub/grub.conf (using grub):

            profile=2 nmi_watchdog=1

      as in the following example:

            kernel /vmlinuz-2.4.9-e.27smp ro profile=2 nmi_watchdog=1

      If using LILO, add the following to the global section (before the first image= line) of lilo.conf:
            append="profile=2 nmi_watchdog=1"
      and run lilo -v as root.
      Now reboot.

      2. Create a shell script containing the following lines:
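      #!/bin/sh
      # Every 5 seconds, print the top 15 entries of the kernel profile.
      while /bin/true; do
        echo; date
        # Sort numerically, descending, on the third field ("-k3" replaces the obsolete "+2" syntax).
        /usr/sbin/readprofile -v -m /boot/System.map | sort -nr -k3 | head -15
        # Reset the profiling buffer so the next sample covers only the next interval.
        /usr/sbin/readprofile -r
        sleep 5
      done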
      
      3. Make the system demonstrate the aberrant behavior.

      4. Run the following three commands simultaneously:
            • Execute the readprofile shell script above, redirecting its output to a file.
            • Execute vmstat 5 and redirect its output to a second file.
            • Execute top -d5 and redirect its output to a third file.

      5. Attach the output files (preferably in gzipped tar file format) to a web ticket.
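
The three collectors can be started together from one wrapper script. The following is only a sketch: OUTDIR, PROFILE_SCRIPT, the output file names and the archive name are hypothetical, and it assumes the readprofile loop above was saved as readprofile-loop.sh. It also adds -b to top, which runs it in batch mode so the redirected file contains plain text; that flag is not part of the original instructions.

```shell
#!/bin/sh
# Sketch only. OUTDIR, PROFILE_SCRIPT and the output file names are
# hypothetical; adjust them to match where the readprofile script was saved.
OUTDIR=${OUTDIR:-/tmp/slowdown-data}
PROFILE_SCRIPT=${PROFILE_SCRIPT:-./readprofile-loop.sh}

# collect NAME COMMAND...: run COMMAND in the background, writing its
# output to $OUTDIR/NAME.out and recording its PID for stop_all.
collect() {
    name=$1; shift
    "$@" > "$OUTDIR/$name.out" 2>&1 &
    echo $! >> "$OUTDIR/pids"
}

# stop_all: stop every collector started by collect.
stop_all() {
    while read -r pid; do
        kill "$pid" 2>/dev/null || true
    done < "$OUTDIR/pids"
    rm -f "$OUTDIR/pids"
}

# Run as "sh collect.sh run" while the slowdown is happening.
if [ "${1:-}" = "run" ]; then
    mkdir -p "$OUTDIR"
    collect readprofile sh "$PROFILE_SCRIPT"
    collect vmstat vmstat 5
    collect top top -b -d 5    # -b: batch mode, plain text output
    echo "Collecting in $OUTDIR; press Enter to stop."
    read -r answer
    stop_all
    # Bundle the three output files into one gzipped tar file.
    tar czf slowdown-data.tar.gz -C "$OUTDIR" .
fi
```

Starting the collectors from one script keeps their timestamps close together, which makes it easier to line up the readprofile, vmstat and top samples afterwards.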


    A web ticket can be opened with Red Hat support by logging into the Red Hat Support section and selecting the Web Support button located under the "Active Support Entitlements" section.

Note: For alternative solutions such as setting up a netdump server to capture this information, please refer to the recommended FAQs.

