Accelerating cryptography with FPGA clusters

Accelerating cryptography with FPGA clusters

The most computationally intensive parts of code cracking and key recovery algorithms are relatively simple operations that must be performed iteratively, and that can be parallelized for high performance. When deployed on FPGAs, these algorithms can use available FPGA resources with extreme efficiency because of the simple nature of the tasks being performed.

4There is a quiet, international battle underway, a battle that impacts every data consumer and producer. On one side of this battle are the cryptographers who work to protect our national security and the privacy of our personal information through increasingly strong methods of encryption, and through penetration testing. On the other side are the hackers, criminals, and unfriendly governments who use increasingly sophisticated code cracking methods to access seemingly secure data. The weapons in this war include increasingly sophisticated, massively parallel computers that are built using FPGA devices.

Cryptography – which includes code creation, analysis, and breaking – has been one of the most important drivers of computing technology for more than six decades. In fact, the development of modern computers owes much to cryptography research performed during and immediately after World War II, by computing pioneers such as Alan Turing and Claude Shannon.

In the modern era of ubiquitous wired and wireless communications, cryptography research is once again leading to dramatic advances in computing. As more of our personal, commercial, military, and national security data are transmitted and managed online, opportunities for mischief have become increasingly available, and stronger methods of encryption are therefore critical. Research into improved encryption requires analyzing and auditing systems in widespread use today, and systems of the future.

The following discussion focuses on how cryptographic algorithms, and code cracking in particular, can be accelerated using clusters of widely available Field Programmable Gate Arrays (). We first explain the challenges behind cryptographic computing, then present a case study on how -accelerated methods can speed the decryption of widely used data security methods including DES, WPA, WEP, and GSM and we compare the FPGA-based approach to more traditional, software-oriented methods. And finally, we demonstrate that a cluster of commodity FPGAs, contained in a single 4U chassis and consuming less than 1,400 W, provides the computational equivalent of more than 1,000 dual-core processors.

White hats versus black hats

Code cracking is a controversial activity, with significant legal perils for those who pursue it to access otherwise off-limits information. Likewise, there are significant risks of financial loss or sensitive data breaches for those who are the targets of such intrusions. All digital data users have cause to be worried if the systems they rely on are vulnerable to unauthorized access. For this reason, many organizations and governments make use of cryptography experts for security auditing. These professional hackers engage in legitimate attempts to break into systems, with permission of their owners, testing commercial and public data network security.

When auditing a presumably secure system, a “white hat” hacker will deploy much the same software algorithms and hardware as their “black hat” counterparts. This activity, which is referred to as , is aimed at finding and reporting vulnerabilities in the data security strategy. Are inadequate or obsolete encryption methods being used? Are weak passwords commonly chosen by system users? Are there operating system or application-layer vulnerabilities?

Another legitimate reason for such cracking is to recover lost and forgotten passwords. Data locked up and unavailable because of password mistakes results in enormous amounts of lost productivity each year. Confidential documents and entire databases can be rendered useless, even to their rightful owners. Password recovery software tools based on black hat cracking methods are therefore used daily in government, industry, and academia to regain access to critical data.

The challenge of cryptographic computing

The algorithms that form the basis for these security cracking and password recovery tools require large amounts of computing power and can require very long runtimes to complete. In fact, while widely available software tools running on a single computer can crack a dictionary-based password in minutes, recovering a relatively strong password might require years of iterative computations.

In recent years, computer clusters have been augmented by the addition of GPU and FPGA accelerators. These devices, by virtue of their parallel structures, can provide substantial acceleration for many computer-intensive algorithms including password recovery.

Because of their roots in graphics processing for standard, off-the-shelf computer systems, GPUs are well known to most programmers. However, although they can provide a significant performance boost for code cracking algorithms[1], GPUs are not well optimized for this application. In particular, bitwise encryption algorithms such as the Digital Encryption Standard (DES) and SAFER (as used in Bluetooth) do not lend themselves to efficient implementations on either CPUs or GPUs. GPUs also need large amounts of power relative to the computations being performed. GPUs provide a high level of acceleration, but at a significant cost in power consumption. Instead, FPGAs offer a viable alternative.

Case study: FPGAs for high-performance password cracking

To demonstrate how FPGAs can address the need for faster password recovery, we selected a representative set of password-cracking problems and deployed highly parallel recovery algorithms on FPGA clusters similar to the one shown in Figure 1.

21
Figure 1: FPGA cluster populated with Xilinx Spartan FPGAs
(Click graphic to zoom by 1.9x)

The systems used in this demonstration included a Pico SC3 cluster containing 77 Xilinx Virtex-5 FPGA devices, an SC4 cluster containing as many as 176 Spartan-3 FPGAs, and an SC5 cluster containing 72 larger-capacity Spartan-6 FPGAs. In all cases, these clusters were housed in a single 4U chassis drawing under 1,400 W. Using FPGA methods and tools, we successfully deployed recovery algorithms for the following encryption methods:

FileVault

FileVault is based on the Advanced Encryption Standard (AES) and provides an encrypted file system for the Apple Macintosh operating system. Recovering FileVault passwords requires hashing a possible password with the SHA-1 hash function thousands of times. This derivation method, known as a Password-Based Key Derivation Function, or PBKDF2, is used iteratively in an attempt to decrypt the FileVault image. Using the 72-FPGA SC5 cluster, we were able to speed the FileVault key recovery application by a factor of 498 times when compared to the original software implementation running on an Intel Core i7 processor at 2.93 GHz. The result was a reduction in runtime from 21 hours to just 2.5 minutes (see Figure 2).

22
Figure 2: FileVault cracking is accelerated 498x over the CPU, using an FPGA cluster
(Click graphic to zoom by 1.9x)

Protected Access (WPA)

WPA is used to secure wireless networks and is implemented via a temporal key integrity protocol. WPA and WPA2 are intended to replace the previous-generation Wired Equivalent Privacy (WEP) protocol. WPA recovery is characterized by performing a PBKDF2 SHA-1 algorithm to the candidate passwords for the network and then comparing authentication hashes to determine if the password is correct. When run on our FPGA-based cluster, this algorithm showed acceleration of 498 times as compared to the Intel Core i7 processor software runtimes.

WEP

WEP[2] (or Wired Equivalent Privacy, as mentioned previously) was first introduced in 1997, and is still commonly used in homes and businesses to protect wireless access points from intrusion. WEP is significantly less secure than WPA; hence for organizations requiring high levels of security, including government and military, WEP is now being replaced by WPA and other more secure systems. Breaking WEP involves trying every possible key formed from a pseudo-random stream of bits that has been combined (using an exclusive-OR operation) with known plaintext and checksums present in each WEP packet. Statistical approaches can speed up this attack by reducing the size of the key space and the corresponding number of keys to be tested. When cracking WEP, we brought the time needed for brute-forcing the entire 40-bit keyspace from an estimated 42 days on the IntelCore i7 processor down to just 4.7 minutes on our FPGA cluster. This performance comes from generating 3.8 billion keys per second in the FPGA cluster, representing an astonishing 12,933x increase in performance as illustrated in Figure 3.

23
Figure 3: WEP cracking is accelerated 12,933x over the CPU, using an FPGA cluster
(Click graphic to zoom by 1.9x)

The vulnerability of digital walls

FPGA-based methods can be used to crack many schemes that once appeared to be strong. Using a single FPGA cluster equipped with 176 FPGA devices, we recently achieved the highest-known benchmark speeds for 56-bit DES decryption using a single, FPGA-accelerated 4U server, with throughput exceeding 280 billion keys per second.

The parallel DES cracking algorithm in this demonstration used brute-force methods to analyze the entire DES 56-bit keyspace, iteratively decrypting fixed-size blocks of data to find keys that decrypted some portion of the data into sequences of ASCII numbers. This technique is often used for recovering the keys of encrypted files containing known types of data, for example encrypted documents that include financial or military information. DES cracking software running on current-generation CPU cores can process approximately 16 million DES key operations per second. A modern GPU card such as the NVIDIA Tesla can handle more than 10x that number, or approximately 250 million DES operations per second.

When using an FPGA cluster based on a single, off-the-shelf motherboard, however, each FPGA was able to perform 1.6 billion DES operations per second. This means that a key recovery that would take years to perform on a PC, even with GPU acceleration, could be accomplished in less than three days on a 4U FPGA cluster. As additional FPGAs are deployed in ever-larger clusters, the time required to decrypt data decreases in a near-linear manner.

Although DES is now considered obsolete because of its demonstrated vulnerability to attack, there are other encryption methods in common use today that are also at risk. Consider Global System for Mobile Communications (GSM), a standard that carries the overwhelming majority of cellular phone calls worldwide. At the Chaos Communication Congress held in Berlin in December of 2009, cryptographic researcher Karsten Nohl announced that his team, a group of hackers working collaboratively to create a distributed computing cluster, had cracked GSM encryption by creating an enormous, 2 TB “rainbow table” of hash values[3]. In simplistic terms, the rainbow table provides a cracking program with a reverse-lookup scheme that can quickly decrypt the wireless voice data.

During his talk, Nohl stated that a person or group wanting to eavesdrop on GSM calls would currently need to spend around $100,000 on hardware to crack an A5/1 encrypted call in one second or less. This research illustrates the importance of security auditing to expose and correct vulnerabilities in widely used data security standards. FPGA accelerators can play an important role in such auditing.

Cracking the code faster, thanks to FPGAs

“Black hat” and “white hat” hackers alike have discovered the power of FPGA computing for cryptography. As in the earliest years of computing, researchers in other application domains including such diverse areas as genomics, physics, and finance are also realizing the benefits of FPGA-based parallel platforms.

The most computationally intensive parts of code cracking and key recovery algorithms are relatively simple operations that must be performed iteratively, and that can be parallelized for high performance. When deployed on FPGAs, these algorithms can use available FPGA resources with extreme efficiency because of the simple nature of the tasks being performed. Not all software algorithms fit so neatly into FPGA clusters, but many complex computing problems can be attacked in the same way.

References:

  1. “Distributed Password Recovery: How GPU Acceleration Works,” www.elcomsoft.com/distributed_password_recovery.html
  2. “WEP key wireless cracking made easy,” John Leyden, The Register, April 4, 2007, www.theregister.co.uk/2007/04/04/wireless_code_cracking/
  3. “Security fear as mobile phone code is cracked,” Maija Palmer, Financial Times, January 2, 2010, www.ft.com/cms/s/0/ee35fcdc-f70c-11de-9fb5-00144feab49a.html

David Hulton is Director of Security Applications for and an acknowledged expert on FPGA-based code cracking. He is a past chairman of the ToorCon security conference and has spoken at events that include Black Hat, Def con, and many others. He can be contacted at dhulton@picocomputing.com.

David Pellerin, Director of Strategic Marketing for Pico Computing, has more than 25 years of experience with programmable logic. He is the author of four books on FPGA-related topics, most recently “Practical FPGA Programming in C.” He can be contacted at dpellerin@picocomputing.com.

Pico Computing 206-283-2178 www.picocomputing.com