Diamond Dave wrote:
> The software problem you're referring to was the infamous crash of the
> AT&T long distance network in 1990 when a software upgrade was applied
> to the over 100 AT&T/Western Electric #4ESS long-haul tandems in the
> US. It brought down most if not all the #4ESS switches to a screeching
> halt for the better part of a day.
No, that's not the one I'm thinking of.
I'm quite sure it was a non-Western Electric switch. It was made in
Plano, TX (I forgot the maker's name) and it was used for local calls.
An odd sequence of errors would create a condition that was not checked,
and the switch would go into a loop and freeze up. This happened at
the same time in a number of cities -- apparently the triggering
circumstances were common at a certain time of day.
I found it in the archives. Here it is:
Newsgroups: comp.dcom.telecom
Date: 3 Jul 91 13:37:22 GMT
Local: Wed,Jul 3 1991 9:37 am
Subject: Service Outages Across the Nation
The (Newark, NJ) {Star-Ledger}, Wednesday, 3 Jul 91, p. 59
"Telephone sleuths are on the trail of mysterious service interruptions"
Washington Post Wire Service
WASHINGTON - East Coast and West Coast, the pattern has been the
same: At about 11 a.m., an entire region's telephone system collapses.
For the past six days, solving the mystery of the failing phones has
become an obsession for the nation's service-conscious telephone
companies. Yet despite recurring similarities and clues in the
half-dozen failures to date, which have struck Washington, Los
Angeles, Pittsburgh and San Francisco, the detective work remains
mired in unanswered questions.
Yesterday, telephones in Pittsburgh were disabled for about two hours
for the second day running, underlining the phone systems'
vulnerability. The basic pattern was the same -- an unexplained deluge of
electronic messages shutting down a computer built by DSC
Communications Corp. of Plano, Texas.
The telephone companies know that the failure is in complex electronic
systems that route calls. But they cannot say why the systems are
failing, why the failures are occurring within days of each other and
why they all begin at the same time of day. They cannot explain why
the failures occur in computers that are not linked electronically and
use different versions of software, the coded instructions that tell
computers how to operate.
...
Each of the afflicted machines has for some reason generated millions
of maintenance messages, which normally help a computer keep track of
its internal operations and communicate with others in the
network. These messages generally have priority over messages that are
routing calls. Too many maintenance messages meant there was no room
for routing calls, and the DSC machines ceased to function. The key
question, said John W. Seazholtz, Bell Atlantic vice president for
technology and information services, is "why is their (DSC's) system
going into overload every time we get a little rinky-dink issue that
should have been automatically dealt with? The software obviously has
a major problem."
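
For anyone who wants to see the overload mechanism in miniature: the
article says maintenance messages take priority over call-routing
messages, so a flood of them leaves no room for routing. Below is a
rough Python sketch of that idea only. It is not DSC's software (the
article gives no such detail); the strict-priority queue, the
per-tick message rates, and the `simulate` function are all
assumptions made up for illustration.

```python
import heapq
from itertools import count

# Assumed priority classes: lower number is served first (strict priority).
MAINTENANCE, ROUTING = 0, 1


def simulate(maintenance_per_tick, routing_per_tick, capacity_per_tick, ticks):
    """Return how many routing messages get processed over `ticks` ticks.

    Hypothetical model: each tick, new messages arrive, then the switch
    processes up to `capacity_per_tick` messages, always taking
    maintenance messages before routing messages.
    """
    queue = []
    seq = count()  # tie-breaker keeps arrival order within a class
    routed = 0
    for _tick in range(ticks):
        for _i in range(maintenance_per_tick):
            heapq.heappush(queue, (MAINTENANCE, next(seq)))
        for _i in range(routing_per_tick):
            heapq.heappush(queue, (ROUTING, next(seq)))
        for _i in range(capacity_per_tick):
            if not queue:
                break
            priority, _arrival = heapq.heappop(queue)
            if priority == ROUTING:
                routed += 1
    return routed


if __name__ == "__main__":
    # Light maintenance traffic: every call-routing message gets through.
    print(simulate(2, 5, 10, 100))   # prints 500
    # Maintenance flood exceeds capacity: routing is starved completely.
    print(simulate(50, 5, 10, 100))  # prints 0
```

With the made-up numbers above, once maintenance arrivals alone exceed
what the switch can process per tick, no routing message is ever
served, which matches the behavior the article describes.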