Simple Statistics

12 Nov

The field of statistics is MUCH bigger than most people realize. However, most of the time people are looking for the basics: min, max, and average (mean). These values are easily computed by any novice programmer with a loop and some simple math… but why waste your time when someone else has already done the work?

Commons-Math3 to the Rescue

The fine folks over at Apache have created the commons-math3 library. (Why the funny name? Because they created version 3 and wanted to move its location in Maven from commons-math:commons-math to the more standard org.apache.commons:commons-math3.) It is probably the most extensive of the commons libraries. (Personal note: I think commons-math3 should be moved to a top-level project simply because of how thorough it is!)
 
There are a handful of classes that relate to simple statistics that everyone should be familiar with: StatUtils, SummaryStatistics, DescriptiveStatistics, and Frequency. Below is an example of using these classes.

Beyond using the stats classes in commons-math3, there are a few other “tricks” going on here, including: FastMath (always use this class over Math), Guava to convert from int[] to double[], and the good old String class to format our results.
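A minimal sketch of such an example, assuming commons-math3 and Guava are both on the classpath. The class name SimpleStatsSketch, the helper toDoubles, and the sample data are made up for illustration and are not taken from the source on GitHub.

```java
import com.google.common.primitives.Doubles;
import com.google.common.primitives.Ints;
import org.apache.commons.math3.stat.Frequency;
import org.apache.commons.math3.stat.StatUtils;
import org.apache.commons.math3.util.FastMath;

public class SimpleStatsSketch {

    // Guava: view the int[] as a List<Integer>, then copy it into a double[].
    // (Hypothetical helper name, used only in this sketch.)
    static double[] toDoubles(int[] values) {
        return Doubles.toArray(Ints.asList(values));
    }

    public static void main(String[] args) {
        int[] raw = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up sample data
        double[] data = toDoubles(raw);

        // StatUtils: static helpers that work directly on a double[].
        double min = StatUtils.min(data);
        double max = StatUtils.max(data);
        double mean = StatUtils.mean(data);
        // StatUtils.variance is the bias-corrected (n - 1) sample variance.
        double stdDev = FastMath.sqrt(StatUtils.variance(data));

        System.out.println(String.format(
                "min=%.1f max=%.1f mean=%.2f stddev=%.3f", min, max, mean, stdDev));

        // Frequency: counts how often each value occurs.
        Frequency frequency = new Frequency();
        for (int value : raw) {
            frequency.addValue(value);
        }
        System.out.println(String.format(
                "4 occurs %d times (%.1f%% of the sample)",
                frequency.getCount(4), frequency.getPct(4) * 100));
    }
}
```

StatUtils is the quickest route when the data is already sitting in an array; the other classes are a better fit when values arrive one at a time.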

SummaryStatistics Vs. DescriptiveStatistics

At first glance these two classes might look like they do the same things, but their implementations are VERY different. SummaryStatistics does NOT store the individual values, whereas DescriptiveStatistics does. This is important for two reasons: memory efficiency and the types of statistics you can generate. Because DescriptiveStatistics stores all of the values, it uses more memory than SummaryStatistics does. In return, DescriptiveStatistics can provide the following stats that SummaryStatistics cannot: percentiles, skewness, kurtosis, and the median.
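To make the difference concrete, here is a small sketch that feeds the same values to both classes. The class name SummaryVsDescriptiveSketch, the helper methods, and the data are assumptions made for illustration; the library calls (addValue, getMean, getPercentile, getSkewness, getKurtosis) are part of commons-math3.

```java
import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;

public class SummaryVsDescriptiveSketch {

    // Streams the values through; nothing is retained except running totals.
    static SummaryStatistics summarize(double[] values) {
        SummaryStatistics summary = new SummaryStatistics();
        for (double v : values) {
            summary.addValue(v);
        }
        return summary;
    }

    // Keeps a copy of every value, which is what makes percentiles possible.
    static DescriptiveStatistics describe(double[] values) {
        return new DescriptiveStatistics(values);
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 100}; // made-up data with one outlier

        SummaryStatistics summary = summarize(values);
        System.out.println(String.format("n=%d mean=%.2f stddev=%.2f min=%.1f max=%.1f",
                summary.getN(), summary.getMean(), summary.getStandardDeviation(),
                summary.getMin(), summary.getMax()));

        DescriptiveStatistics descriptive = describe(values);
        // The median is simply the 50th percentile.
        System.out.println(String.format("median=%.1f p90=%.1f skewness=%.3f kurtosis=%.3f",
                descriptive.getPercentile(50), descriptive.getPercentile(90),
                descriptive.getSkewness(), descriptive.getKurtosis()));
    }
}
```

If all you need is the count, mean, min, max, and standard deviation of a large or unbounded stream, SummaryStatistics is probably the safer default; reach for DescriptiveStatistics when you need order-based statistics such as the median.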

Thread-safe Versions

If you’re working with multiple threads pushing data into either SummaryStatistics or DescriptiveStatistics, you will need to use the synchronized versions: SynchronizedSummaryStatistics and SynchronizedDescriptiveStatistics.
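As a rough sketch of what that looks like, the following pushes values into one shared SynchronizedSummaryStatistics from several worker threads. The class name ThreadSafeStatsSketch, the collect helper, the thread count, and the 30-second timeout are all assumptions chosen for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.commons.math3.stat.descriptive.SynchronizedSummaryStatistics;

public class ThreadSafeStatsSketch {

    // Each worker adds the values 1..valuesPerThread to the same shared instance.
    static SynchronizedSummaryStatistics collect(int threads, int valuesPerThread) {
        SynchronizedSummaryStatistics stats = new SynchronizedSummaryStatistics();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 1; i <= valuesPerThread; i++) {
                    stats.addValue(i); // synchronized inside the library class
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for the workers before reading any results.
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return stats;
    }

    public static void main(String[] args) {
        SynchronizedSummaryStatistics stats = collect(4, 1000);
        System.out.println(String.format("n=%d mean=%.1f min=%.1f max=%.1f",
                stats.getN(), stats.getMean(), stats.getMin(), stats.getMax()));
    }
}
```

With the plain SummaryStatistics in the same setup, concurrent addValue calls could interleave and leave the count or the running totals inconsistent.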

Conclusion

commons-math3 is packed with VERY useful statistics classes beyond the ones described here. Their documentation is also GREAT. Poke around and explore; there is a ton of stuff there. As always, you can find the source for this post on GitHub.

 

If you have any questions, please post them in the comments. If you find and fix any bugs or have improvements, please fork and make a pull request.

Thanks for reading…

 
 
 
 
 
 
