Skip to main content
U.S. flag

An official website of the United States government

This site is currently in beta, and your feedback is helping shape its ongoing development.

Mining Distance-Based Outliers in Near Linear Time

Published by Dashlink | National Aeronautics and Space Administration | Metadata Last Checked: August 04, 2025 | Last Modified: 2025-03-31
Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.

Find Related Datasets

Click any tag below to search for similar datasets

data.gov

An official website of the GSA's Technology Transformation Services

Looking for U.S. government information and services?
Visit USA.gov