NSA Spying

Unless you've been living in an igloo for the past several weeks, you've heard by now that the National Security Agency (NSA) has reportedly been conducting surveillance of American citizens without a warrant, in apparent violation of FISA, the post-Nixon law that regulates such activities.

Ignoring for the moment the question of whether this activity is legal (I doubt it is) or advisable (they could have asked Congress for permission), I'd like to think about what exactly the NSA might have been doing. I don't have a security clearance and have never (to my knowledge) come into possession of any classified information, so this is purely a thought experiment.

(And if I happen to be right about a few things, please re-read the part about this being a thought experiment before you come knocking on my door, Mr. FBI Agent.)

Interception Capabilities

It is fair to assume that the NSA has the technical capability to eavesdrop on international communications anywhere in the world. Any phone call, e-mail, web connection, fax transmission, etc. that crosses an international border can probably be intercepted if there's enough interest (and budget) to do so. All long-distance electronic communication travels over one of three media: copper, fiber optic, or wireless. All three media are interceptable, and the "Echelon" program was probably created for the express purpose of intercepting nearly all international communications.

Old-fashioned copper cables are easy to tap. All you need is an induction coil. Wireless is similarly easy: an antenna in the right place will do the job (and the NSA has been known to rent offices which happen to be in useful locations for intercepting signals from, say, the Russian embassy).

For a long time, many people assumed that fiber optic cables were secure, but that's not true of course. Optical fibers will "leak" light if they are bent at a radius which is just a little too tight, making it possible to tap a fiber without disrupting the signal or cutting the cable. It has been reported for many years that the U.S. operates one or more submarines built just for the purpose of tapping undersea optical cables.

The biggest challenge is probably free-space optics: using a laser to send signals through the air. Free-space optical systems have a limited range because of atmospheric problems (dust, birds, etc.), so they aren't used very often. Even so, free-space optics can be intercepted (with difficulty) because the laser will scatter off dust in the atmosphere, and a well-placed receiver may be able to pick out the signal.

The NSA probably also has the capability to intercept some in-country communications in foreign countries of interest, places like Russia, Iran, and the U.K. It's much harder to be comprehensive with in-country communications, since fewer hops mean fewer places to tap the signal. International communications also tend to be funneled along a limited number of international trunks, while in-country networks are more meshed. Finally, international transmissions are much more likely to be interesting to spies, so there isn't as much incentive to intercept purely domestic communications. So the NSA's capability for in-country interception is probably focused on useful and/or easy targets like government agencies and mobile phone networks.

Finally, the NSA probably has one big hole in its technical capabilities: there is probably little or no ability to intercept purely domestic communications in the U.S. This is because wiretapping inside the U.S. is outside the NSA's traditional mission, so there would be no need to develop the capability (the FBI, on the other hand, can and does wiretap calls inside the U.S., but the FBI has a very different mission and structure than the NSA).

So, to summarize: the NSA probably intercepts a very high percentage of communications that cross international borders, and a lesser percentage of in-country communications in foreign countries, likely focused on mobile phone networks, government agencies, and other groups or individuals we have a specific interest in. There's probably very little interception of U.S. domestic communications, except to the extent that they happen to cross international borders (this blog, for example, is actually hosted in Canada; and for a time many domestic long distance calls in the U.S. were being routed through Canada).

What to Do With the Data?

It is absurd to think that the NSA has the ability to actively monitor all the communications it probably intercepts. There's simply too much data, and not enough people to analyze it all.

But you can learn a lot without knowing anything at all about what a message says, just who the sender and recipient are (aka Traffic Analysis). Every time one person calls another person, or sends an e-mail or fax, or visits a web site, you can infer a relationship between the two people or organizations involved. With enough data and time, you can construct the social network of everyone whose communications you're tracking.
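To make the idea concrete, here is a minimal sketch of traffic analysis in Python. It assumes the only data available is a list of (sender, recipient) pairs; the record format, the names, and the function `build_contact_graph` are all made up for illustration and say nothing about how any real system works.

```python
from collections import defaultdict

def build_contact_graph(records):
    """Build an undirected contact graph from (sender, recipient) pairs.

    Message content is never examined; only who talked to whom.
    """
    graph = defaultdict(set)
    for sender, recipient in records:
        if sender == recipient:
            continue  # a message to yourself implies no relationship
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return dict(graph)

# Hypothetical metadata records (illustrative only).
records = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "bob"),   # repeat contacts collapse into a single edge
    ("dave", "carol"),
]

graph = build_contact_graph(records)
for person in sorted(graph):
    print(person, "->", sorted(graph[person]))
```

A real system would presumably also weight edges by frequency and recency, but even this bare version recovers who is connected to whom.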

When you consider that the NSA is likely doing traffic analysis on a significant fraction of the electronic communications outside the U.S., that means that they have likely built up a database of the social networks of a big chunk of the developed world's population. This is an extremely useful thing to have if you're in the espionage business, since it lets you immediately connect a suspected spy or terrorist to all his or her friends and associates. If you know two or three members of a terrorist cell, you can probably identify the rest of the cell simply by looking for common associates.
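The "common associates" trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contact graph is a plain dictionary mapping each person to the set of people they have communicated with, and the names, the function `rank_common_associates`, and the `min_links` threshold are all hypothetical.

```python
from collections import Counter

def rank_common_associates(graph, known_members, min_links=2):
    """Rank outsiders by how many known members they are in contact with."""
    known = set(known_members)
    counts = Counter()
    for member in known:
        for contact in graph.get(member, ()):
            if contact not in known:
                counts[contact] += 1
    # Keep only people linked to at least `min_links` known members.
    return [(name, n) for name, n in counts.most_common() if n >= min_links]

# Hypothetical contact graph: person -> set of contacts (illustrative only).
graph = {
    "m1": {"m2", "x", "baker"},
    "m2": {"m1", "x", "y"},
    "m3": {"x", "y"},
    "x": {"m1", "m2", "m3"},
    "y": {"m2", "m3"},
    "baker": {"m1"},
}

suspects = rank_common_associates(graph, ["m1", "m2", "m3"])
print(suspects)
```

Here "x" talks to all three known members and "y" to two, while "baker" talks to only one and drops out. The threshold is a judgment call: set it too low and every pizza delivery place becomes a suspect.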

Going beyond simple traffic analysis, there have been persistent rumors in the speech recognition community that the NSA has an unusual interest in the technology. The civilian state of the art (which probably isn't so different from the classified technology) allows you to do fairly powerful word spotting in multiple languages, but accurate machine transcription of a phone call is still difficult. Speech recognition is computationally intensive, so it probably isn't used on all phone calls, but on a subset of calls which excludes obviously uninteresting ones. It's not clear how useful this actually is, since (presumably) most criminals, spies, and terrorists know enough to speak in code over the phone. Simple word substitution, like saying "double cheeseburger" in place of "bomb," will foil automated speech recognition, since computers aren't smart enough to know that plotting to hide a double cheeseburger on an airplane makes no sense.
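A toy example shows why word substitution works. The sketch below assumes the call has already been turned into text and that "word spotting" amounts to matching against a fixed watchlist; the watchlist, the function `spot_words`, and the sample sentences are invented for illustration, and real word spotting operates on audio rather than clean text.

```python
import re

# Hypothetical watchlist (illustrative only).
WATCHLIST = {"bomb", "detonator"}

def spot_words(transcript, watchlist=WATCHLIST):
    """Return the watchlist words that appear in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return sorted(set(words) & set(watchlist))

plain = "Hide the bomb on the airplane."
coded = "Hide the double cheeseburger on the airplane."

print(spot_words(plain))  # flags the call
print(spot_words(coded))  # nothing to flag
```

The coded sentence sails through because the matcher has no notion of what makes sense, only of which words are on its list.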

It also makes sense for the NSA to keep archives of communications which might be of future interest. The main limitation is storage space, and a few billion dollars a year will buy you one heck of a disk array. The advantage is that if you identify new enemies (for example, when they fly airplanes into office towers) an archive will let you go back and trace how the plot took shape, who else might have been involved, and what other plots had been considered and might still be active. So there is probably a giant disk array somewhere which holds a significant fraction of the world's international communications.

Limits

All this technology (amazing as it is) can't really do anything more than identify potentially interesting relationships, where "interesting" is defined by the people running the system. If you're only looking for Islamic bomb plots, you're not likely to catch the Japanese naval attack on Pearl Harbor. And vice versa.

But the real bane of any large-scale data mining operation is the number of false positives you generate: things that look relevant to the computer, but are in fact completely useless. Software designed to search for a needle in a haystack will inevitably supply far more needle-like bits of hay than actual needles. An article in today's New York Times suggests that the value of the data the NSA was able to supply to the FBI was fairly limited, consisting mostly of names, e-mail addresses, or phone numbers of people with connections to suspected terrorists. Since the NSA identified thousands of "leads" and often couldn't provide any context (because such information would be highly classified), there wasn't much the FBI could do.
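The needle-and-hay problem is just base-rate arithmetic, and a quick calculation shows how bad it gets. Every number below is an assumption picked for illustration (population size, number of real targets, detection rate, false alarm rate); none of them comes from any report about the actual program.

```python
def lead_quality(population, true_targets, hit_rate, false_alarm_rate):
    """Return (fraction of leads that are real, true positives, false positives)."""
    true_pos = true_targets * hit_rate
    false_pos = (population - true_targets) * false_alarm_rate
    return true_pos / (true_pos + false_pos), true_pos, false_pos

# Hypothetical inputs: 100 million people monitored, 1,000 real targets,
# a system that catches 99% of targets and wrongly flags 0.1% of everyone else.
fraction_real, true_pos, false_pos = lead_quality(
    population=100_000_000,
    true_targets=1_000,
    hit_rate=0.99,
    false_alarm_rate=0.001,
)

print(f"real leads:  {true_pos:,.0f}")
print(f"false leads: {false_pos:,.0f}")
print(f"fraction of leads that are real: {fraction_real:.2%}")
```

Under these assumptions, roughly one lead in a hundred is real even though the system is "99% accurate," which fits the picture of investigators handed thousands of leads with little they could act on.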
