HPC MSU

Publication Abstract

A Look at Efficiently Predicting Variable Rankings by Relative Importance

Pape, P. R., Ivancic, C., & Hamilton, J. (2016). A Look at Efficiently Predicting Variable Rankings by Relative Importance. National Cyber Summit '16. Huntsville, AL.

Abstract

This paper describes a work-in-progress software framework for identifying the highest-priority variables in a software sample based on a relative importance metric. The framework uses a combination of static and dynamic analysis to gather features for each variable in the relevant functions of a software sample and then predicts the priority ranking of each variable in that sample. This ranking is based on the likelihood that the variable will cause a fail state in the software when it holds a faulty or unexpected data value, and on the magnitude of that failure. The magnitude of the failure is determined by how far-reaching the impact of the data fault is and how long the fault persists in the software sample. An initial experiment is presented in which two open-source software samples are used to gather initial data on the effectiveness of the framework. The samples are used in two ways: a training/test method and a cross-validation method. These two methods test the learning algorithms used in the experiment against unseen data and against familiar data, respectively. The data indicate strong potential for this line of research, and once the framework is automated, a much larger sample set will be collected and evaluated. The key goal of the research at this stage is to determine whether the features extracted from a software sample can be used to accurately predict, within a reasonable range, the ranking trend of the variables in the top ten to thirty percent. To reduce the time needed to bring an open-source software component to an acceptable level of reliability and security, only the most important variables are flagged for follow-up with error-handling and recovery techniques. Any variable that falls below the ranking threshold, usually the top ten to twenty percent, has too low an importance ranking to cause a lasting, far-reaching failure.
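The rank-and-threshold step described in the abstract can be sketched roughly as follows. This is an illustrative assumption, not the paper's implementation: the feature names (`reach`, `persistence`, `uses`), the weights, and the simple weighted-sum scorer here stand in for whatever features and learning algorithms the framework actually uses.

```python
# Illustrative sketch only: rank variables by a predicted relative-importance
# score and keep the top fraction for error-handling follow-up.
# All feature names and weights below are hypothetical.

def importance_score(features, weights):
    """Weighted sum standing in for the framework's learned predictor."""
    return sum(weights[name] * value for name, value in features.items())

def rank_variables(variables, weights, top_fraction=0.2):
    """Return the top fraction of variables, ranked by predicted importance."""
    scored = sorted(variables.items(),
                    key=lambda item: importance_score(item[1], weights),
                    reverse=True)
    cutoff = max(1, round(len(scored) * top_fraction))
    return [name for name, _ in scored[:cutoff]]

# Hypothetical per-variable features: how widely a faulty value propagates
# (reach), how long it persists, and how often the variable is used.
weights = {"reach": 0.5, "persistence": 0.3, "uses": 0.2}
variables = {
    "buf_len":   {"reach": 0.9, "persistence": 0.8, "uses": 0.7},
    "tmp":       {"reach": 0.1, "persistence": 0.2, "uses": 0.3},
    "conf_path": {"reach": 0.6, "persistence": 0.9, "uses": 0.4},
    "i":         {"reach": 0.2, "persistence": 0.1, "uses": 0.9},
    "sock_fd":   {"reach": 0.7, "persistence": 0.5, "uses": 0.6},
}

# Top 20% of five variables -> the single highest-ranked variable.
print(rank_variables(variables, weights))  # ['buf_len']
```

Variables below the threshold are simply dropped from the follow-up list, mirroring the abstract's claim that low-ranked variables are unlikely to cause a lasting, far-reaching failure.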