Parallel Programming and Raspberry Pi Clusters

Published On Apr 13, 2018

“Many shall run to and fro, and knowledge shall be increased.” (Daniel 12:4)

“Two are better than one; because they have a good reward for their labor.” (Ecclesiastes 4:9)

The Raspberry Pi miniature computer has sparked the imagination of aspiring computer science professionals and Internet of Things hobbyists since its introduction in February 2012. Its myriad uses, ranging from an inexpensive computing platform to a robot power source, are well documented on thousands of websites. At GCU, a computer science research team is tackling a particular area of computing: clustering multiple Raspberry Pis in order to build a parallel computer. The idea itself is not new, but the (successful) endeavor epitomizes the type of activities computer science majors pursue.

How powerful is the Raspberry Pi? How much faster (if at all) is it compared to the powerhouses of the dawn of the parallel computing era?

Supercomputers of the Past

In 1976, Seymour Cray introduced the Cray-1, the fastest supercomputer the world had seen to date. It packed an amazing 8 MB of RAM and revolutionized the field of parallel and vector computing with a performance of 160 MFLOPS, that is, 160 million floating-point operations per second. Its cost was $5 million (equivalent to $20 million today).

It was succeeded in 1985 by the Cray X-MP parallel computer, capable of 800 MFLOPS. When President Ronald Reagan was promoting the Strategic Defense Initiative (nicknamed “Star Wars”), and scientists were pursuing weather modeling, stock market prediction and defense research, computations were conducted on a $15 million Cray ($33 million in today’s money) with 117 MHz processors and a peak performance of 942 MFLOPS.

The Raspberry Pi Cluster

Roughly 30 years later, the Raspberry Pi 3 sits humbly on my desk with its 1200 MHz processor (with 4 cores) and a peak performance ranging from 200 to 1500 MFLOPS for such tasks as image compression or large vector calculations. Up to roughly twice as powerful as the Cray X-MP, it costs only $35. But what if we connect, say, 20 of them and build a parallel computer?
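To put those figures side by side, here is a small back-of-the-envelope sketch in Python. It uses only the numbers quoted in this post (the Cray's $15 million price and 942 MFLOPS peak, the Pi's $35 price and 200 to 1500 MFLOPS range). The constant and function names are illustrative, and peak MFLOPS figures are a rough yardstick rather than a rigorous benchmark.

```python
# Figures quoted in this post; names are illustrative only.
CRAY_XMP_COST_USD = 15_000_000      # original price, not inflation-adjusted
CRAY_XMP_PEAK_MFLOPS = 942
PI3_COST_USD = 35
PI3_MFLOPS_RANGE = (200, 1500)      # low and high ends of the quoted range


def usd_per_mflops(cost_usd, mflops):
    """Return the purchase cost in dollars per MFLOPS of peak performance."""
    return cost_usd / mflops


cray_cost = usd_per_mflops(CRAY_XMP_COST_USD, CRAY_XMP_PEAK_MFLOPS)
pi_cost_worst = usd_per_mflops(PI3_COST_USD, PI3_MFLOPS_RANGE[0])
pi_cost_best = usd_per_mflops(PI3_COST_USD, PI3_MFLOPS_RANGE[1])

print(f"Cray X-MP:      ${cray_cost:,.2f} per MFLOPS")
print(f"Raspberry Pi 3: ${pi_cost_best:.3f} to ${pi_cost_worst:.3f} per MFLOPS")
```

Even at the low end of the Pi's range, the cost per MFLOPS comes out several orders of magnitude below the Cray's.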

Part of the S.M.U.R.F research team led by Dr. Isac Artzi, two junior computer science students, Jacob Slaton and Kona Wunsch, architected just that. With support from fellow teammates Ryan Fitzgerald and Frank Leyva, Jacob and Kona created a Beowulf cluster with 20 CPUs (80 cores) and 20 GB of RAM. Stay tuned for amazing performance benchmarks in a follow-up blog post.

The Beowulf Cluster

The Beowulf cluster architecture enables multiple computers to connect via a switch, using the familiar TCP/IP network protocol, and act as one computer. A task like multiplying 100,000 pairs of numbers, for example, can be divided across the cluster, with each of the 20 FPUs (floating-point units) performing roughly 5,000 operations, thus (theoretically) slashing the computation time 20-fold. This high-level estimate of the computational benefit is a good approximation, but only adequate for casual conversation. Computer science professionals use more accurate methodologies to calculate the expected performance of complex computer architectures, especially parallel and clustered ones.
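The divide-and-combine idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a thread pool on a single machine stands in for the 20 networked nodes, and the helper names (`split_into_chunks`, `multiply_pairs`, `cluster_multiply`) are hypothetical. A real Beowulf cluster would distribute the chunks over the network, typically with a message-passing library, and this sketch makes no claim about how the team's cluster was programmed.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 20  # one worker per Raspberry Pi in the cluster described above


def split_into_chunks(items, num_chunks):
    """Split items into num_chunks contiguous slices of near-equal size."""
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def multiply_pairs(pairs):
    """The work a single node performs on its share of the data."""
    return [a * b for a, b in pairs]


def cluster_multiply(pairs, num_nodes=NUM_NODES):
    """Scatter the pairs to the workers, then gather the results in order."""
    chunks = split_into_chunks(pairs, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_results = list(pool.map(multiply_pairs, chunks))
    return [product for part in partial_results for product in part]


pairs = [(i, i + 1) for i in range(100_000)]
chunks = split_into_chunks(pairs, NUM_NODES)
products = cluster_multiply(pairs)
print(len(chunks), "chunks of", len(chunks[0]), "pairs each")
```

Each worker receives 5,000 pairs, matching the estimate above. Note that Python threads on one machine will not actually run this arithmetic 20 times faster; the sketch shows how the work is partitioned, not the speedup.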

In theory, doubling the number of processors should halve the computation time needed to perform a task. In reality, computational overhead, processor communication overhead, data preparation overhead and other technical factors limit the performance gain as the number of processors increases.
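A toy model makes the effect of overhead visible. The sketch below assumes, purely for illustration, that the ideal compute time is divided evenly among the processors while a fixed coordination cost is added for every processor in the cluster. The model and its numbers (100 seconds of work, 0.05 seconds of overhead per processor) are hypothetical and were not measured on any real cluster.

```python
def run_time(num_processors, compute_seconds=100.0, overhead_per_processor=0.05):
    """Toy model: evenly divided compute time plus overhead that grows with cluster size."""
    return compute_seconds / num_processors + overhead_per_processor * num_processors


def speedup(num_processors, **model):
    """Speedup relative to running the same model on a single processor."""
    return run_time(1, **model) / run_time(num_processors, **model)


for n in (1, 2, 20, 45, 200):
    print(f"{n:>3} processors: {run_time(n):7.2f} s, speedup {speedup(n):5.2f}x")
```

Under these assumptions the speedup with 20 processors falls short of 20, and past a certain cluster size adding processors makes the task slower rather than faster.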

Amdahl’s Law

In addition, not all computational tasks lend themselves to efficient parallelization. As early as 1967, Gene Amdahl presented a calculation of the speedup in latency, now known in computer science as Amdahl’s Law. Amdahl’s Law takes into consideration the computational time of those portions of a task that cannot be parallelized. For example, suppose a computer is tasked with processing 1,000 data items; a naïve assumption might lead one to believe that 10 processors can process 100 data items each, thus reducing the computing time by 90%. However, the subtask of initially reading the data cannot always be parallelized. The subtask of assembling and displaying the results of the computation might not be suitable for parallelization either. Consequently, only a fraction (albeit a large one) of the task can be distributed across multiple processors.

When taking into account the portions of the task that benefit from parallelization and those that do not, Amdahl’s Law proposes the following terminology and relationship:

S_latency: the theoretical speedup of the execution of the whole task
s: the speedup of the part of the task that benefits from an increased number of processors and more memory
p: the proportion of the whole task’s execution time that the parallelizable part occupied before parallelization

Given the above terminology, Amdahl’s Law defines the following relationship:

$$S_{\text{latency}}(s) = \frac{1}{(1 - p) + \frac{p}{s}}$$

Since $S_{\text{latency}}(s) < \frac{1}{1 - p}$ for any finite $s$ (provided $p > 0$), even a small part of the program that cannot be parallelized limits the overall speedup available from parallelization.
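As a worked illustration, the Python sketch below evaluates the standard form of Amdahl’s Law, S_latency(s) = 1 / ((1 - p) + p / s). The function names are illustrative, and the parallel fraction p = 0.95 is an assumed value chosen for the example, not a measurement from the cluster described in this post.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the run time is sped up by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def speedup_limit(p):
    """Upper bound on the speedup as s grows without limit: 1 / (1 - p)."""
    return float("inf") if p == 1.0 else 1.0 / (1.0 - p)


ASSUMED_PARALLEL_FRACTION = 0.95  # hypothetical: 95% of the task parallelizes

for processors in (10, 20, 80):
    result = amdahl_speedup(ASSUMED_PARALLEL_FRACTION, processors)
    print(f"{processors:>2} processors: speedup {result:.2f}x")

print(f"Limit with unlimited processors: {speedup_limit(ASSUMED_PARALLEL_FRACTION):.1f}x")
```

Under this assumption, 20 processors yield a speedup of only about 10, and no number of processors can push it past 20, because the serial 5% of the task always remains.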

Computer science is a fascinating field. As a computer science professional, one has the opportunity to invent, discover, make history and affect the daily lives of billions. All that is needed is an appreciation for the beauty of mathematics, the perseverance to spend days and nights solving difficult problems and a keen affinity for the scientific mindset. Will you join us in building the next Raspberry Pi cluster with 256 CPUs? What about one with 8 GPUs (graphics cards)? Would you like to multiply 1,000,000 numbers in microseconds? How difficult is it to search for and detect a face in a crowd of 100,000 fans attending a football game? What should be the initial trajectory of a rocket bound to intercept an asteroid? Can your self-driving car distinguish between two motorcycles and an oncoming truck at night before attempting to drive through?

Computer science education can provide you with the knowledge, tools and experience to collaborate with scientists and engineers who tackle all of these fascinating problems.

Written By

Isac Artzi, PhD
Faculty, College of Science, Engineering and Technology