TELECOM Digest OnLine - Sorted: From Our Archives: The Great AT&T Outage, January, 1990


From Our Archives: The Great AT&T Outage, January, 1990


TELECOM Digest Editor (ptownson@massis.lcs.mit.edu)
Sun, 7 Jan 2007 00:49:48 -0500 (EST)

One thing has caught my attention about news reports of Monday's AT&T
outage, whether on radio, on television, or in print: invariably
promotions for upcoming news about it and the first few sentences of
the item itself have talked about "problems for long-distance callers"
or "long-distance troubles." It's presented as a problem with
long-distance calling and then it segues to "AT&T spokespeople are
saying" or "according to AT&T" as if the two were one and the same.

Longer discussions of it get around to bringing up MCI and Sprint's
situations (being overloaded because AT&T customers were seeking
alternatives, for example), but most do not. Moreover, none
introduced the item as an AT&T-only problem, nor even as an AT&T
problem. It is called a long-distance problem with little or no
acknowledgment that "long distance" and "AT&T Long Distance" are not
synonymous these days.

David Tamkin PO Box 813 Rosemont IL 60018-0813 708-518-6769 312-693-0591
dattier@chinet.chi.il.us BIX: dattier GEnie: D.W.TAMKIN CIS: 73720,1570

[Moderator's Note: The {Chicago Sun-Times} had as their headline in
Tuesday's paper, "Calls Waiting!" and part of the human-interest side
of the story were interviews with business people -- particularly
telemarketing organizations -- who were pretty well out of action
Monday. The airline and hotel reservation people with their 800
numbers were also pretty hard hit by the events of the day. The
{Chicago Tribune} noted that AT&T spokespeople had *not* ruled out 'a
"computer virus" or act of sabotage by a phreak unknown...' as the
source of their problem. PT]

------------------------------

From: Al Donaldson <vrdxhq!escom.com!al@uunet.uu.net>
Subject: Reach Out and Touch Someone?
Date: 16 Jan 90 04:05:48 GMT
Organization: ESCOM Corp., Oakton, VA

Word tonight that AT&T is having computer problems affecting
phone service nationwide. I can just see it now:

"Hello, Phoenix?"
"No, this is Fiji..."

Maybe they should spend more money on systems and less on advertising.

Al Donaldson
(ATT customer)

------------------------------

Date: Tue, 16 Jan 90 10:33:37 EST
From: Bill Berbenich <bill@shannon>
Subject: Re: Nationwide Long Distance Outage

Does anyone know how AT&T is handling their 800 WATS customers
who are inaccessible as a result of this outage? I recall a television
ad which said something like 'if you are an AT&T 800 WATS
customer and there is an outage, we GUARANTEE that your service
will be restored within an hour.'

--Bill Berbenich

------------------------------

Date: Tuesday, 16 Jan 1990 18:40:07 EST
From: John McHarry <m21198@mwvm.mitre.org>
Subject: Re: Who's Using Whom?

John Higdon wrote in V10 #29 that US Sprint was inaccessible during
the AT&T outage yesterday due to their leasing facilities from AT&T.

I don't know the specific access arrangements in his area, but I
believe the following to be generally true. Carriers do lease trunks
to one another; however, these are non-switched services. I don't
think AT&T has a tariff for switched access carriage for other IECs.
(Not too sure on that one.) If that is the case, unless there was
indeed a cable cut, the common mode failure lies elsewhere. Of
course, this leasing of trunks doesn't obviate US Sprint's claim
regarding an ALL fiber optic network if they lease only fiber optic
trunks. There doesn't seem to be any claim that other networks don't
have some, or even lots, of fiber trunks.

What may be interesting here is the possibility of a shared BOC-AT&T
switch being in the common path, e.g., the access tandem. Unless I
misread an old copy of Notes on the BOC Intra LATA Networks, or things
have changed in the meantime, there are some switches that are either
BOC owned and used by AT&T or (the interesting case) AT&T owned and
used by the BOC. These are an artifact of the pre-1984 state of
affairs, and represent cases where the split could not be neatly made
on one side or the other of the switch. If Mr. Higdon's LATA is such
a case, then US Sprint could be receiving service from the LEC, but
with an AT&T owned and operated switch in the middle. In this case it
is the LEC that is providing service by leasing switch capacity from
AT&T. US Sprint might well be using all their own trunks to the point
of presence. Beyond that, they have no choice or control.

Of course, to the end user, this is cold comfort. If there is only
one access tandem, you have no protection from a failure affecting it.
I suppose large users could use direct trunks to two or more IECs,
but, in most cases, that sounds like overkill, especially given the
probability of the failure being guarded against vs the probability
of backhoe fade knocking down both trunk groups.

These are only my own speculations, of course, and don't necessarily
reflect the views of anyone else. If I have erred, I am sure I'll be
corrected. On second thought, omit the if clause.
***************************************************************
* John McHarry (703)883-6100 McHarry@MITRE.ORG *
***************************************************************

------------------------------

Reply-To: John Higdon <john@bovine.ati.com>
Subject: Re: Who's Using Whom?
Date: 16 Jan 90 01:57:09 PST (Tue)
From: John Higdon <john@bovine.ati.com>

After writing:

> AT&T long distance has been severely disrupted today in the Bay Area
> due to a major cable cut, according to an AT&T operator I talked to.

The Telecom Moderator wrote:

> [Moderator's Note: I really think the operator you interviewed spoke
> without full knowledge of the circumstances of the outage;

That's, of course, an understatement. But it will be interesting to see
over the next few days and weeks how the AT&T PR department will
handle this one. It should also be fascinating to find out what the
*real* problem was, if it ever is to be known by the public.

John Higdon | P. O. Box 7648 | +1 408 723 1395
john@bovine.ati.com | San Jose, CA 95150 | M o o !

------------------------------

Date: Wed, 17 Jan 90 22:16:01 CST
From: TELECOM Moderator <telecom@eecs.nwu.edu>
To: telecom@eecs.nwu.edu
Subject: TELECOM Digest V10 #31
Message-ID: <9001172216.aa08352@delta.eecs.nwu.edu>

TELECOM Digest Wed, 17 Jan 90 22:15:10 CST Volume 10 : Issue 31

Today's Topics: Moderator: Patrick Townson

Questions and Answers on Network Service (AT&T Public Relations Department)
Bulletin to Employees, re: Outage (AT&T Public Relations Department)
AT&T Operator Policy During Outage (Ken Jongsma)
The AT&T Problem (Ole J. Jacobsen)

----------------------------------------------------------------------

From: AT&T Public Relations via TELECOM Moderator <telecom@eecs.nwu.edu>
Date: Wed, 17 Jan 90 14:12 EST
Subject: Questions and Answers on Network Service

[Moderator's Note: AT&T has provided the following questions and answers
regarding the outage. Another source of PR is 1-800-2ATT-NOW. PT]

QUESTIONS AND ANSWERS ON NETWORK SERVICE ... The following should
provide answers to any additional questions employees have, and
also may be useful for salespeople in responding to customers'
concerns:

Q. How does this outage compare with others AT&T has
experienced?
A. This was the first event in which all network switches were
affected. Previous outages have been local or regional in nature,
caused by cable cuts, problems with individual offices, or natural
disasters.

Q. Could this happen to MCI or Sprint?
A. AT&T believes all carriers are potentially vulnerable to
software problems in their networks, and have acknowledged such
problems at one time or another.

Q. How does this outage compare with MCI's recent 800 service
outage?
A. Since AT&T and its competitors do not ordinarily share such
information, there is no way of comparing the two events.

Q. How did this outage affect customers?
A. There was a significant impact on customers nationwide on
their regular long-distance service, as well as business services
such as 800 and Software Defined Network services, which use
AT&T's public switched network. Private-line services were not
affected.

Q. Was AT&T able to honor customers' requests to have their 800
or other services terminated on another carrier's lines?
A. A few requests like this came in. However, AT&T was unable to
switch these customers because restoring the entire network to
normal operation was being given highest priority.

Q. Is it accurate that AT&T operators refused to give callers
access codes for other carriers?
A. That was true during the early part of the day. However,
authorization to give out codes was given later in the day in the
spirit of doing whatever was necessary to help customers complete
their calls.

Q. Was AT&T able to meet its service guarantees?
A. AT&T will honor its service assurance commitments on 800
service, even though the warranty doesn't cover this kind of
network event.

Q. Will AT&T adjust bills to help compensate for any
inconvenience customers may have experienced?
A. AT&T plans to file an emergency tariff with the Federal
Communications Commission that will permit the company to have a
special day of discounted calling, which will provide some
compensation for customers. The exact offer and date have not yet
been determined.

Q. Does AT&T have a liability to compensate customers for losses
sustained during the network problem?
A. No, but the company will honor the 800 service assurance
guarantee, and will look for other ways to demonstrate to
customers that it recognizes service expectations held by
customers and the company were not met during the problem.

Q. Does that mean AT&T may compensate individual customers for
their losses?
A. Something like that has to be determined on a case-by-case
basis.

Q. What was done to restore service on Monday?
A. A software override was used to stabilize the network, and
that restored full service by 11:30 p.m. EST on Monday. The fix
is working fine and enabling the network to handle full business-
day volume.

Q. What is being done to prevent this from happening again?
A. AT&T's most urgent priority is to assure that all AT&T
customers receive the world's most reliable telecommunications
service. Every technical resource available, including Bell Labs
scientists and engineers, has been devoted to assuring it will not
occur again. The chances of a recurrence are small--a problem of
this magnitude never occurred before. AT&T's engineers have
collected an enormous amount of data and are extensively analyzing
it.

Q. Does the outage put the lie to AT&T's claims of having the
world's most reliable network?
A. Not at all. Despite the fact that AT&T experienced an
unprecedented, nationwide service problem, millions of calls on
the network still went through. All switches continued to
function, and AT&T's software experts were able to put in fixes
that brought the network back to normal operation before the day
was out. AT&T is confident it has the technological and human
resources to meet unexpected contingencies.

Q. Are there plans for a promotion or advertising campaign to
reinforce the company's reliability image?
A. While something like that may be contemplated in the future,
the priority now is to ensure full service for all customers, and
to make sure the problem doesn't occur again.

Q. How many calls were completed on the day of the outage?
A. On a typical business day, 110 million calls are handled on
the network, with 80 million to 85 million completed. ("Handled"
means calls that receive busy signals, that are blocked, that the
caller decides in mid-call not to complete, etc.). On Jan. 15,
148 million calls were handled and 83 million of them were
completed--a call completion percentage of 56 percent. Some 35
million of the 83 million calls were completed during the outage
period.
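
[Editor's sketch: the percentages implied by the answer above can be
checked with a few lines of Python. This is only a worked check of the
arithmetic, not anything AT&T supplied. The variable and function names
are made up for illustration, and every figure is taken directly from
the answer, in millions of calls.]

```python
# Figures quoted in the AT&T answer above, in millions of calls.
TYPICAL_HANDLED = 110
TYPICAL_COMPLETED_RANGE = (80, 85)
OUTAGE_DAY_HANDLED = 148
OUTAGE_DAY_COMPLETED = 83
COMPLETED_DURING_OUTAGE = 35


def completion_pct(completed, handled):
    """Return completed calls as a percentage of handled calls."""
    return 100.0 * completed / handled


# Completion rate on a typical business day (low and high ends of the range).
typical_low, typical_high = (
    completion_pct(c, TYPICAL_HANDLED) for c in TYPICAL_COMPLETED_RANGE
)

# Completion rate for the whole of Jan. 15, as quoted by AT&T.
outage_day = completion_pct(OUTAGE_DAY_COMPLETED, OUTAGE_DAY_HANDLED)

# Calls completed on Jan. 15 outside the outage period.
completed_outside_outage = OUTAGE_DAY_COMPLETED - COMPLETED_DURING_OUTAGE

print(f"Typical day: {typical_low:.1f}% to {typical_high:.1f}% completed")
print(f"Jan. 15:     {outage_day:.1f}% completed")
print(f"Completed outside the outage period: {completed_outside_outage} million")
```

[The answer does not say how many of the 148 million handled calls fell
within the outage period, so a completion rate for the outage hours
alone cannot be derived from these figures.]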

------------------------------

From: AT&T Public Relations via TELECOM Moderator <telecom@eecs.nwu.edu>
Subject: Bulletin to Employees, re: Outage
Date: Wed, 17 Jan 90 14:00:00 EST


[Moderator's Note: Following is the full text of an all-employee
bulletin distributed Tuesday. PT]


AT&T NETWORK RESTORED AFTER TEMPORARY OUTAGE ... AT&T's public
switched network is functioning normally again after a suspected
signalling system problem cut call completion rates across the
country to slightly more than 50 percent yesterday.

AT&T Chairman Bob Allen and Network Services Division Senior
Vice President Ken Garrett held a press conference today from the
Network Operations Center in Bedminster, N.J., to explain the
situation. "Even though it was a one-time 'hit' to the network,
it was certainly the most far-reaching service problem we've ever
experienced," said Allen.

"We didn't live up to our customers' standards of quality,"
he said. "It's as simple as that. That's not acceptable."
Preliminary indications are that a software problem developed
about 2:25 p.m. EST yesterday in a processor connected to a 4 ESS
switch in New York City, part of the new Signalling System 7
network that carries such call completion data as originating and
destination phone number separate from the call itself. The
problem spread rapidly through the network, affecting the regular
long-distance network, 800 service and the Software Defined
Network (SDN). Private lines and special government networks were
not affected.

After eliminating a number of suspected causes, software
overrides applied about 10 p.m. last night finally restored normal
network capabilities over the next couple hours. Allen said
people at AT&T Bell Laboratories and in the Network Engineering
organization are studying the volumes of data accumulated. "We
are confident the root cause will be identified, at which time we
will take appropriate steps to make certain it doesn't happen
again." While he did not want to speculate until all the analysis
is in, Allen said there is a "growing level of confidence" that no
computer "virus" was involved.

Allen said AT&T is talking to major customers affected by the
outage to explain what happened, and to detail the company's
response. And, he said, AT&T will file an emergency petition with
the Federal Communications Commission calling for a special day of
discount calling to help compensate both residence and business
customers who were inconvenienced. Garrett added that call
attempts were near normal levels yesterday, despite the holiday,
and that "there are no indications of any problems today."

------------------------------

From: ken@cup.portal.com
Subject: AT&T Operator Policy During Outage
Date: Wed, 17-Jan-90 09:57:57 PST

While it will be interesting to find out the actual reason for the
failure (NBC implied it was related to a new software release that
failed under load in New York and propagated through the rest of the
country), it is even more interesting to hear about AT&T "policy".

That is, AT&T operators would not give instructions for using
alternative carriers or even hint that it might be possible to get
through other carriers.

Now, when I go to a Hilton and they are full, they will check with
Ramada and any other area hotels to see if there are rooms available.
I understand the reasons AT&T won't do likewise! Once a customer learns
how to use an alternative carrier, they may not go back...

Ken Jongsma
ken@cup.portal.com

------------------------------

Date: Wed 17 Jan 90 13:44:37-PST
From: "Ole J. Jacobsen" <OLE@csli.stanford.edu>
Subject: The AT&T Problem

What amazed me the most about the AT&T outage the other day was
people's inability to live with the situation and use 10xxx dialling.
One business guy interviewed on CNN said he "lost several hours and
lots of money" because of the long distance problems. I have never
figured out why the RBOCs have been so unwilling to teach the public
about 10xxx dialling; there must be at least a dozen carriers
available to the average customer (all right, maybe 6, but still....)

Ole

[TELECOM Digest Editor's Note: And there you have a selection of the
messages which ran in the Digest seventeen years ago on the occasion
of the great AT&T service outage, in mid-January, 1990. Like the great
fiasco in November, 1948 where the Chicago Tribune reported in its
first edition headline that "Dewey Defeats Truman", some major fiascos
live on and on in infamy, as I suspect will happen with the AT&T
outage in January, 1990. PAT]
