I'm asking anyone who has information on Comark systems. Our 3 dual
Pentium II 400 systems are still hanging with the SCSI bus stuck in some
way. I'm running Fermilab Linux 5.0.2 with no mods. The hangs occur with
the standard 2.0.35 single-processor kernel as well as with the SMP (dual
processor) 2.0.36 kernel that Dan Yocum installed.
I'm told by Heidi Schellman and Dan Yocum that this is due to heat
problems in Comark systems, but I do not have any details or particulars
to back this up. Unfortunately the Comark field tech, Wayne, says that the
only heat problems he knew about involved Seagate Barracuda 18GB drives,
which he claims are known to run hot. He says he installed fans in a
rack mounted 19 PC system at FCC, which were the only systems he thought
needed them due to inadequate ventilation. He says that Fermilab requested
extra fans (i.e. paid extra?) in other desktop systems even though he thought
they were not needed, and that these were "hard drive fans" installed to blow
on internal Seagate Barracuda 18GB drives.
I'd like to know if anyone has information I can give him about
heat-related problems that might not have gotten back to Comark (or
otherwise). Their other senior field tech (William Antony?) was out of the
office. If you have some information on this, could you please send it to
me? Also, if you have seen problems similar to ours (described below),
please let me know.
Thanks,
Harry.
--------------------
The problem is that the PC hangs with the SCSI activity light of an
external SCSI disk stuck on. One of the systems, which has an internal
SCSI 8mm Eliant 820 tape drive, has also hung with the SCSI activity light
lit on the tape drive. In this case I can get into a console and log on as
root to the IDE drive. However, I cannot shut down the system: it tries to
do a SCSI bus reset, fails, aborts, and keeps repeating that cycle. In
other hangs with the external drive the PC does not respond at all.
All three systems ran okay for a long time with just Monte Carlo jobs that
do not access the disk much. Currently we are running data analysis jobs
that access the disk more, across the network (NFS) and locally. Sometimes
we spool data off tape using the tape drive across the network. There
don't have to be many jobs running, nor is the disk access that heavy, when
the systems hang. It's random in that I cannot predictably cause them to
hang. Two of the systems each hung once today (so far!), so it does happen
often.
I tried changing SCSI cables to much shorter ones in two of the systems.
The third system has only one external SCSI disk on a 2 foot cable. All
systems still hung as before. In case it helps, the system configurations
are given below:
all three systems have:
BXN440BX Nightshade motherboard with dual 400 MHz PII
supposedly including heatsink and active fan, though I haven't looked inside
80mm DC fan
st38641A internal Seagate IDE drive
qm309100td-s internal Quantum SCSI drive
xm6201b-s internal SCSI CDROM drive
bt-958 Buslogic wide SCSI controller
3c900-com-in 3COM 3c900 combo ethernet card
one of the systems has an internal 8mm SCSI tape drive plus 3 external SCSI disks
the second system has an external SCSI disk and a SCSI (narrow) scanner
and the third system has an external SCSI disk only.
============================================================================
Harry W.K. Cheung TEL: (630) 840-8628
Fermilab, PPD/EPP, MS 122, FAX: (630) 840-3867
Wilson Road, P.O. Box 500, EMAIL: cheung@fnal.gov
Batavia, IL 60510-0500. EXPT: Experiments FOCUS & E687
============================================================================
On Mon, 5 Apr 1999, Heidi Schellman wrote:
> NuTeV had problems with systems hanging and traced it to high temperatures.
> If you have not added lots of extra fans, Comark machines with many internal
> disks are not happy campers.
>
> Heidi
>