I’ve Been ERAMmed

Posted by The ATC Freq on March 12, 2010 at 12:19 pm. One comment

In case people are visiting to see more updates on En Route Automation Modernization (ERAM), I’ll try to make this short.

ERAM, the replacement computer/display system for enroute center controllers, is still experiencing problems and the FAA continues to use it on live air traffic.  In other words, not much has changed.

The most critical ERAM bugs that I’m aware of (outside of outright display failures/lockups – big red “X”s) involve data block/tracking issues, some of which haven’t been corrected even though they’ve been known about for some time.  The absolute worst tracking bug is when a data block drops off a target and the accompanying flight plan is simultaneously deleted.  Other critical tracking bugs involve data blocks tracking on the wrong target.

One thing that has changed since I last wrote about ERAM is that I’ve since had the “opportunity” to work with it on live traffic.  Needless to say the experience didn’t change my opinion about the program.

We were told in advance to not “experiment” with ERAM while using it (presumably because of the myriad problems we might encounter).  Apparently it’s true that we’re really not testing the software any more right now; we’re just using it, while trying to not use it so much that we break it.  (Maybe at this juncture the FAA figures there are enough known bugs already and they don’t want to find more.)

Many of those involved with the ERAM project continue to believe, or at least give the outward appearance of believing, that things are going fine in spite of the persistent problems.  In fact, controllers are hearing less and less about the bugs in the software as time goes by, which has the added effect of making it appear that things are going better than they really are.

I guess it’s the old, “No news is good news” theory…

During the Initial Operational Capability (IOC) or first run on live traffic at Minneapolis Center (ZMP) they provided controllers with a list of known major bugs in ERAM, but they’re now no longer providing that information before live runs.  (Perhaps the list is too long…)  Either way it’s obvious that the FAA doesn’t think controllers need to know about the deficiencies of the system they’re supposed to use to keep airplanes separated (or doesn’t want to advertise them anyway).

ZMP had two operational runs with ERAM the first week of March, and not surprisingly we experienced repeats of some of the most significant ERAM bugs.

Mind you, most of the bugs we experienced weren’t new bugs; they were existing bugs they already knew about.

Instead of fixing the known bugs before running ERAM on more live traffic, the FAA keeps running software it knows is faulty, and running it more often.

“Idealists” like me would like to believe that the FAA and Lockheed Martin would want to fix those known bugs once they were discovered before running the same versions on more live traffic.

But that’s not how the ERAM program is progressing at all.  That’s because the FAA and Lockheed Martin have agendas that don’t prioritize the safety of the flying public.

First, fixing all the bugs before running it more would take more time and cause the program to fall further behind schedule.

Another reason the FAA is now running software builds with known bugs on live traffic at the various key sites is to avoid refresher training for controllers as per the Memorandum of Understanding (MOU) between the FAA and the air traffic controllers’ union, NATCA.

The MOU says that:

“Basic and supplemental ERAM training shall be provided to all BUEs prior to implementation of ERAM at each facility.  If more than forty-five (45) days elapses between the time BUEs complete ERAM training and the actual implementation of ERAM at the facility, ERAM refresher training shall be provided prior to implementation.”

Apparently IOC equates to “implementation” in the MOU (although I don’t see that definition in the MOU).

So there you have it:  the FAA is now rushing ERAM into use at least in part so that it doesn’t have to re-train its controller workforce per the MOU.
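Read literally, the quoted MOU rule is a simple date check.  Here is a minimal sketch in Python, assuming “implementation” is a single known date (which, as noted above, the MOU doesn’t define).  The function name and the example dates are hypothetical; only the 45-day threshold comes from the quoted text.

```python
from datetime import date

# Threshold taken from the MOU text quoted above.
REFRESHER_THRESHOLD_DAYS = 45


def refresher_required(training_completed: date, implementation: date) -> bool:
    """Return True if more than 45 days elapse between the end of
    training and implementation, per a literal reading of the MOU quote."""
    elapsed_days = (implementation - training_completed).days
    return elapsed_days > REFRESHER_THRESHOLD_DAYS


# Hypothetical dates, for illustration only.
print(refresher_required(date(2010, 1, 1), date(2010, 2, 15)))  # 45 days elapsed: False
print(refresher_required(date(2010, 1, 1), date(2010, 2, 16)))  # 46 days elapsed: True
```

Under this reading, the earlier the date that counts as “implementation”, the less likely the 45-day window is to expire.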

Now that the Winter Olympics are over, Seattle Center (ZSE) is back in the ERAM game as a key site.  ZSE ran an extended live run less than a week ago that predictably had its share of problems, simply because it was running the same software build everyone knew had existing bugs.

In spite of all the problems, the FAA continues to press towards an In Service Decision (ISD), after which the non-key sites (the rest of the enroute centers) can move towards their own IOCs, even though the longest any of the key sites has run ERAM is 8 days (and that under very controlled conditions).

So we’ve established that the FAA isn’t fixing significant/critical known bugs before running the ERAM software more often.  They’re telling controllers to tip-toe around ERAM while using it to create the illusion that things are going well, in spite of the many known bugs.  They’re withholding more and more information as time goes by and the project continues to go badly.

It’s obvious that the FAA is sticking with their approach that prioritizes the deployment schedule (which for the record continues to slip further and further behind), saving money on training, and pretty much anything else, over safety.

It’s just business as usual for the FAA…

Complacency: Laziness, or Learned?

Posted by The ATC Freq on March 5, 2010 at 9:57 pm. One comment

After a recent incident gained media attention, there were accusations that the FAA and its air traffic controllers had grown complacent with regard to safety.

The latest incident involved a veteran controller at New York’s JFK airport, who had his children relay some air traffic clearances on the radio frequency.

The JFK incident was the third in a string of recent air traffic control related incidents that made the headlines, including last summer’s mid-air collision near Teterboro airport in New Jersey of two VFR aircraft, as well as the incident last fall where Northwest 188 lost contact with air traffic control and eventually overflew its destination.

Always ready to put on the proper face to the media, the top levels of FAA management reacted to the latest incident with shock and outrage:

“This lapse in judgment not only violated FAA’s own policies but common sense standards for professional conduct. These kinds of distractions are totally unacceptable,” administrator Randy Babbitt said in the statement.

“…(violations) of FAA’s own policies…”?  “…standards for professional conduct…”?  Really, Mr. Babbitt?!

Let’s examine some facts about the three incidents:

In the case of the mid-air collision the supervisor was on the clock but out of the facility running personal errands, which were apparently more important than his job.  In the case of Northwest 188, several managers decided to simply ignore orders.  And in the latest case, one or more supervisors apparently allowed an employee on two different days to let his children talk to airplanes on the radios.

In each and every incident, there was an FAA manager involved who wasn’t following the rules.  Do you think that’s just a coincidence?

The managers are the people supposed to be ensuring that the workers are following the rules, and what are they doing?  They’re breaking the rules themselves!

Does anyone remember this video?

The conduct of some of those managers is a violation of FAA Human Resources Policy which states in part (my emphasis):

An employee’s conduct on the job has a direct bearing on the proper and effective accomplishment of official duties and responsibilities. Employees are expected to approach their duties in a professional and business like manner and maintain such an attitude throughout the workday. It is also expected that employees will maintain a professional decorum at all times while in a temporary duty travel status or otherwise away from their regularly assigned post of duty, such as telecommuting, whether at home or at a telecommuting site, or attending training.

So much for following the rules and the higher standard FAA managers are allegedly held to.

Do you think a controller or two might have noticed managers in those cases intentionally violating FAA policies and acting unprofessionally?  Do you think they didn’t notice that nothing happened to any of them for doing so?

Isn’t this called, “setting an example”?

Last year I wrote about the FAA’s “dumb luck” approach to safety, including the “customer service initiative” that ultimately led to two FAA safety inspectors turning into whistle-blowers when FAA managers ignored their concerns about problems with Southwest Airlines’ maintenance.

It was clear then that the FAA was only concerned about safety when the problems hit the headlines.

Then the FAA decided to reclassify air traffic control errors, turning many errors into non-events (and making it appear to the flying public we were having fewer errors).

They created a safety program (ATSAP) that allows controllers to anonymously report errors without fear of punishment, but which in turn also masks and allows the FAA to ignore many systemic problems.

Currently the FAA is testing its ERAM software, even with its many known bugs, on live air traffic.

And almost every time something makes it into the news that involves the FAA, an FAA spokesman quickly says, “Safety was never compromised.”

The FAA claims it’s an organization that’s passionate about safety, but there’s little to indicate it’s actually doing much to improve safety at all.  If anything it’s degrading safety more often than not.  It says one thing but does another.

So between managers not following FAA rules, and the many changes to FAA policies and procedures regarding air traffic safety and error reporting, should it really be a surprise that controllers may have gotten complacent?

And if they are complacent, aren’t they really just following direction and examples from the FAA management team?

More Significant ERAM Problems

Posted by The ATC Freq on February 24, 2010 at 3:39 pm. 15 comments

Salt Lake Center (ZLC) reverted to the HOST computer system last night due to major problems after starting an ERAM run last week that was supposed to be permanent.

I’m sure the FAA and the contractor Lockheed Martin will write it off as just another “glitch” (i.e. part of the development cycle), but it’s another glaring demonstration of how unreliable the ERAM software still is, even though the FAA continues to test it on live traffic, expecting air traffic controllers to simply work around its many problems and keep aircraft safely separated nonetheless.

ZLC started running ERAM on what was supposed to be a permanent basis on the morning of Wednesday, February 17.

They had previously completed an eight-day test that ended the first week of February, followed by a two-week delay in which Lockheed Martin was supposed to correct the (known) bugs in the software before ZLC began using the new version permanently.

The latest failure shows that, in spite of the software updates, ERAM obviously still has a long way to go before it’s fit to use on live traffic 24/7.

Notably the event marking the first enroute center to transition to ERAM full time came and went quietly.  Instead of calling in the media and having a press release (and having sheet cake), the FAA barely noted the occasion.

The complete lack of fanfare noting the first enroute center to start running ERAM full time shows that the FAA knows full well how unreliable/unstable the ERAM software still is.  At this point it’s clear they’re making deliberate efforts to not call any attention to the ERAM project.

After lots of boastful press from the FAA over ERAM early last year, including statements of how the program was on budget and ahead of schedule (even though it wasn’t), the FAA abruptly stopped talking about ERAM after significant problems running it at ZLC in a test last fall.

The FAA apparently learned its lesson then and now isn’t going to mention ERAM at all, instead choosing to continue testing and deploying ERAM quietly and keeping its fingers crossed that it won’t cause a news event.

Every time the FAA and Lockheed Martin complete another test without significant problems they seem to convince themselves the project is doing just fine.  After the eight-day ZLC test they were convinced the software was ready for permanent use after just a little “tweaking”, even though it’s now clear that was far from the truth.

Last fall one of the problems that resulted in the aborted ZLC test was that datablocks (the tags that display the aircraft call sign and altitude as well as other information) wouldn’t track properly and sometimes ended up tagging up on the wrong target.

Guess what?  That problem still exists many months (and many updates) later.

The data block/tracking functionality is fundamental to an air traffic display system and is thereby safety-critical.  It’s disturbing that at this stage this basic functionality is still so unreliable in ERAM.

This may not be simply due to software bugs either; there may be some significant problems with the software tracking algorithms within ERAM, which from what I’ve heard are radically different from those used in the HOST computer system.

Here’s a list of some of the latest bigger problems with ERAM (and note that some of them, especially the tracking problems, aren’t new):

Interim Flight Plans – If a controller starts an interim flight plan (datablock only, no beacon code or routing) ERAM aggressively searches for the first target of opportunity to track. It may be a primary, or a beacon belonging to another aircraft.
Track Un-Pairing – Arbitrarily the datablock will disassociate from the beacon target. We are unable to determine what seems to cause it. We looked at RADAR sort boxes and ASR terminal RADAR feeds, and who knows what else. ERAM will not automatically re-pair the datablock and the target like HOST does. We see this happen frequently around SLC where limited datablocks create a bright large yellow spot over the airport. You can’t shut them off and it is easy for the un-paired datablock to disappear into the blob.
Track Swap – We had some instances of departures where ERAM switched datablocks on aircraft on completely different routes and entering different sectors.
Bogus Beacon Codes – Frequently ERAM will flash in the third line a bogus beacon code (like the aircraft is squawking an incorrect code) for one sweep and then it disappears.
Track Pairing – If ERAM associates a full datablock with an incorrect beacon, you have to track the datablock at least 32 miles away from the incorrect beacon for ERAM to accept the disassociation. Approximately 30 seconds has to pass before you can pair it with the correct beacon target.
Bogus Alerts – We see significant numbers of bogus alerts; MSAW, conflict probe in EDST (URET replacement), aircraft working in SUAs.
Inter Facility Handoffs to Vertically Stratified Sectors – If an aircraft changes altitude 30 minutes prior to exiting the facility, and the new altitude causes the aircraft to enter a different sector in the receiving facility, ERAM will hand the aircraft to the incorrect sector if you use the auto addressed handoff option (single alpha character followed by CID). You have to manually address the handoff to the correct sector.
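
To make the Track Pairing item above concrete, here is a minimal sketch in Python of the workaround as it’s described in the list.  It models only the reported behavior, not ERAM’s actual logic, which isn’t public here.  The function names are hypothetical, the list doesn’t say whether “miles” means nautical miles, and the “approximately 30 seconds” is treated as a hard threshold purely for illustration.

```python
# Figures taken from the Track Pairing item in the list above.
MIN_DRAG_DISTANCE_MILES = 32.0   # units as reported; nautical vs. statute not stated
REPAIR_DELAY_SECONDS = 30.0      # "approximately" 30 seconds in the report


def disassociation_accepted(miles_from_wrong_beacon: float) -> bool:
    """The datablock must be moved at least 32 miles from the incorrect
    beacon before the system is reported to accept the disassociation."""
    return miles_from_wrong_beacon >= MIN_DRAG_DISTANCE_MILES


def can_pair_with_correct_target(miles_from_wrong_beacon: float,
                                 seconds_since_disassociation: float) -> bool:
    """Both reported conditions must hold before the controller can
    pair the datablock with the correct beacon target."""
    return (disassociation_accepted(miles_from_wrong_beacon)
            and seconds_since_disassociation >= REPAIR_DELAY_SECONDS)


# Hypothetical scenarios, for illustration only.
print(can_pair_with_correct_target(20.0, 60.0))  # not far enough: False
print(can_pair_with_correct_target(35.0, 10.0))  # far enough, too soon: False
print(can_pair_with_correct_target(35.0, 30.0))  # both conditions met: True
```

Either condition alone blocks the fix, which is why correcting a single mis-paired datablock takes a controller’s attention for as long as it does.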

Apparently the latest software version yet to be put into use isn’t intended to fix many of the aforementioned problems either; instead it addresses other bugs.

It will be interesting to see how the latest episode affects the entire ERAM project.

One way or the other it’s going to result in the project falling further behind schedule.

But I doubt very much that it will convince the FAA to stop testing the software on live traffic.