Dual CPU Comark systems hanging in e831...

yocum@fnal.gov
Fri, 02 Apr 1999 12:01:17 -0600

Any advice, folks?

___________________________________________________________________________
Dan Yocum                       | Phone: (630) 840-8525
Linux/Unix System Administrator | Fax:   (630) 840-6345
Computing Division OSS/FSS      | email: yocum@fnal.gov
Fermi National Accelerator Lab  | WWW:   www-oss.fnal.gov/~yocum/
P.O. Box 500                    |
Batavia, IL 60510               | "TANSTAAFL"
________________________________|__________________________________________

------- Forwarded Message

Return-Path: cheung@fnal.gov
Received: from FNAL.FNAL.Gov (fnal.fnal.gov [131.225.9.8])
by sapphire.fnal.gov (8.8.7/8.8.7) with ESMTP id LAA07538
for <yocum@sapphire.fnal.gov>; Fri, 2 Apr 1999 11:40:13 -0600
Received: from fnpxsr.fnal.gov ("port 14548"@fnpxsr.fnal.gov)
by FNAL.FNAL.GOV (PMDF V5.1-12 #3998)
with SMTP id <01J9K1HJND1I0004KK@FNAL.FNAL.GOV> for yocum@sapphire.fnal.gov;
Fri, 2 Apr 1999 11:40:11 -0600 CDT
Received: from fnpx27.fnal.gov.fnal.gov by fnpxsr.fnal.gov via ESMTP
(951211.SGI.8.6.12.PATCH1502/940406.SGI.AUTO)
for <yocum@fnal.gov> id LAA13368; Fri, 02 Apr 1999 11:40:11 -0600
Received: by fnpx27.fnal.gov.fnal.gov id LAA21509; Fri,
02 Apr 1999 11:40:11 -0600
Date: Fri, 02 Apr 1999 11:40:10 -0600
From: "Harry W. K. Cheung" <cheung@fnal.gov>
Subject: system hangs on cheung,butler,stenson
To: yocum@fnal.gov
Message-id: <Pine.SGI.4.05.9904021131370.21459-100000@fnpx27.fnal.gov>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset=US-ASCII

hi Dan,
We have had a lot of system hangs on all three systems. This appears
to be due to one of the SCSI disks having some sort of continuous SCSI
activity, which hangs the system. The only way to clear it seems to be to
switch the PC off and on, which of course takes forever because of the fsck of
all the disks. Have you had any reports of this from other people? If I
don't use the system much everything is fine but when I run job(s) that
access a (SCSI) disk at a fairly continuous throughput the system will
sometimes hang. I thought it was the length of SCSI cables so I shortened
them on cheung and stenson but hangs still occurred. Then I thought maybe
it was because these two machines had other SCSI devices but butler hung
today too with the same problem, so that's not it. I'm thinking of going back
to the non-SMP kernel to see if this has any effect. I certainly ran lots of
jobs without problems before on the SMP kernel, but they were jobs that did not
access the disk very often. Anyway, I would appreciate any help you can
offer. I don't know how to look at the system logs to see what's happening.
Thanks,
Harry.

============================================================================
Harry W.K. Cheung TEL: (630) 840-8628
Fermilab, PPD/EPP, MS 122, FAX: (630) 840-3867
Wilson Road, P.O. Box 500, EMAIL: cheung@fnal.gov
Batavia, IL 60510-0500. EXPT: Experiments FOCUS & E687
============================================================================

------- End of Forwarded Message