ARCHIVED PAGE - This page is archived and being provided for historical reference.

Cluster Computing at the MSU Engineering Research Center

Many people think that cluster computing originated with Thomas Sterling and Donald Becker's work on the Beowulf Project in 1994. This project is certainly one of the most important events in the history of cluster computing. Its use of the Linux operating system on inexpensive PCs revolutionized the high performance computing community and created a whole new class of systems (known as Beowulf clusters). However, it was not the first time that a cluster, or what had previously often been called a "multicomputer", had been built. Many others, including Mississippi State University, had been working on the subject for several years before that event. This page gives a brief history of cluster computing research at Mississippi State University.

Mississippi State University has been involved in what is now called cluster computing since at least 1987. In that year, DARPA funded an MSU project called MADEM (Mapped Array Differential Equation Machine). MADEM was a distributed-memory MIMD system based on the Sun 4/110 workstation.

By 1992, research had moved to an 8-node system based on SPARCstation 2 workstations interconnected with communications cards developed by MSU researchers. The system included custom-built performance monitoring capabilities as well as a midplane with wormhole router chips. It was known as the MSPARC/8, a name indicating that it was the second generation of the MADEM system, that it was based on the new Sun SPARC architecture, and that it had 8 processors. As with the MADEM system, the MSPARC/8 motherboards were removed from their original chassis and mounted in a custom chassis with direct interconnects to the midplane.

In June 1993, the first components were purchased for what would become known as the SuperMSPARC, the third generation of this project. The SuperMSPARC consists of 8 Sun SPARCstation 10 workstations. Each node has four 90 MHz HyperSPARC processor modules and 288 MB of RAM. Sun had originally intended to release a quad-processor SuperSPARC-based SPARCstation 10, but eventually released it with HyperSPARC processors instead because of heat issues. Unfortunately, the project had already been named SuperMSPARC by that time. The nodes have been interconnected via the built-in 10 Mb/s Ethernet, 155 Mb/s (OC3) ATM, and Myrinet. The system also has a custom-built midplane and SBUS cards used for monitoring interprocess communications. Unlike its predecessors, the SuperMSPARC systems were left in their original chassis and connected via cables from their SBUS ports to the custom midplane. This project has been so successful that, as of June 2002, nine years after its construction began, it is still in service as a tool for teaching parallel computing techniques.

In December 1999, the fourth generation of this project began. The UltraMSPARC is a 16-node system. Each node has four 400 MHz UltraSPARC II processors and 2 GB of RAM. The nodes are connected via Myrinet as well as 100 Mb/s Ethernet. Research continues on this system, using custom-built Global Positioning System (GPS) cards in the nodes to synchronize their system clocks very accurately with similar systems in a remote location. MSU is now experimenting with clustering techniques in which the physical location of the nodes no longer matters. Unlike previous generations, the UltraMSPARC was designed from the outset to be primarily a production-level system. The clustering research on this system is secondary to its main function as a center-wide computational resource.

The experience gained through more than a decade of cluster computing led the MSU Engineering Research Center to embark on the large-scale production system that became known as EMPIRE (ERC's Massively Parallel Initiative for Research and Engineering). EMPIRE is currently a 1038-processor (519-node) cluster based on Intel Pentium III processors running the Linux operating system. Each node contains dual Pentium III processors running at either 1 GHz or 1.266 GHz, and 1 GB of RAM. It is the first cluster built at the ERC based on the Intel/Linux architecture instead of the Sun/SunOS/Solaris architecture. EMPIRE is built primarily with IBM eSeries x330 rack-mountable systems connected via 100 Mb/s Ethernet, with interswitch communications via Gigabit Ethernet.