A Local Scalable Distributed Expectation Maximization Algorithm for Large Peer-to-Peer Networks

Published by Dashlink | National Aeronautics and Space Administration | Metadata Last Checked: August 04, 2025 | Last Modified: 2025-03-31

This paper describes a local, distributed expectation maximization (EM) algorithm for learning the parameters of Gaussian mixture models (GMMs) in large peer-to-peer (P2P) environments. The algorithm can be used for well-known distributed data mining tasks such as clustering, anomaly detection, target tracking, and density estimation, which many emerging P2P applications in bioinformatics, web mining, and sensor networks require. Centralizing all or some of the data to build global models is impractical in such P2P environments because of the large number of data sources, the asynchronous nature of P2P networks, and the dynamic nature of the data and the network. The proposed algorithm takes a two-step approach. In the monitoring phase, an efficient local algorithm checks whether the model ‘quality’ is acceptable. The result of this check then serves as feedback: when the model is outdated, the algorithm samples data from the network and rebuilds the GMM. We present thorough experimental results to verify our theoretical claims.
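
The monitor-then-rebuild loop described above can be illustrated with a minimal single-machine sketch. It is not the paper's local P2P algorithm: there is no network, no peer messaging, and no sampling protocol here. As assumptions made only for this illustration, the data are one-dimensional, model 'quality' is taken to be the average log-likelihood of recent data, and the names `fit_gmm`, `log_likelihood`, `model_is_outdated`, and the `tolerance` parameter are hypothetical.

```python
import numpy as np


def _weighted_log_pdf(x, weights, means, variances):
    """Per-point, per-component log(weight * Gaussian density); shape (n, k)."""
    x = np.asarray(x, dtype=float)[:, None]
    log_pdf = -0.5 * (np.log(2 * np.pi * variances) + (x - means) ** 2 / variances)
    return log_pdf + np.log(weights)


def log_likelihood(x, weights, means, variances):
    """Average log-likelihood of 1-D data under the mixture (assumed quality score)."""
    w = _weighted_log_pdf(x, weights, means, variances)
    m = w.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return float(np.mean(m[:, 0] + np.log(np.exp(w - m).sum(axis=1))))


def fit_gmm(x, k, n_iter=100):
    """Plain centralized EM for a 1-D Gaussian mixture with k components."""
    x = np.asarray(x, dtype=float)
    weights = np.full(k, 1.0 / k)
    # Deterministic initialization: spread the means over the data quantiles.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var() + 1e-6)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        w = _weighted_log_pdf(x, weights, means, variances)
        w -= w.max(axis=1, keepdims=True)
        resp = np.exp(w)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances


def model_is_outdated(x, params, baseline, tolerance=0.5):
    """Monitoring check: has the quality score dropped below the baseline?"""
    return log_likelihood(x, *params) < baseline - tolerance


# Usage: fit once, then monitor incoming data and rebuild only when needed.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])
params = fit_gmm(data, k=2)
baseline = log_likelihood(data, *params)

drifted = data + 10.0  # simulated change in the underlying distribution
if model_is_outdated(drifted, params, baseline):
    params = fit_gmm(drifted, k=2)  # rebuild the GMM on fresh data
    baseline = log_likelihood(drifted, *params)
```

The sketch separates the cheap check from the expensive rebuild, so the model is refit only when the monitored score degrades. The threshold rule used here is a stand-in chosen for brevity, not the quality test from the paper.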

Complete Metadata

@type	dcat:Dataset
accessLevel	public
accrualPeriodicity	irregular
bureauCode	[ "026:00" ]
contactPoint	{ "fn": "Kanishka Bhaduri", "@type": "vcard:Contact", "hasEmail": "mailto:kanishka.bhaduri-1@nasa.gov" }
description	This paper describes a local and distributed expectation maximization algorithm for learning parameters of Gaussian mixture models (GMM) in large peer-to-peer (P2P) environments. The algorithm can be used for a variety of well-known data mining tasks in distributed environments such as clustering, anomaly detection, target tracking, and density estimation to name a few, necessary for many emerging P2P applications in bioinformatics, webmining and sensor networks. Centralizing all or some of the data to build global models is impractical in such P2P environments because of the large number of data sources, the asynchronous nature of the P2P networks, and dynamic nature of the data/network. The proposed algorithm takes a two-step approach. In the monitoring phase, the algorithm checks if the model ‘quality’ is acceptable by using an efficient local algorithm. This is then used as a feedback loop to sample data from the network and rebuild the GMM when it is outdated. We present thorough experimental results to verify our theoretical claims.
distribution	[ { "@type": "dcat:Distribution", "title": "P2P_EM.pdf", "format": "PDF", "mediaType": "application/pdf", "description": "P2P_EM.pdf", "downloadURL": "https://c3.nasa.gov/dashlink/static/media/publication/P2P_EM_2.pdf" } ]
identifier	DASHLINK_258
issued	2010-11-17
keyword	[ "ames", "dashlink", "nasa" ]
landingPage	https://c3.nasa.gov/dashlink/resources/258/
modified	2025-03-31
programCode	[ "026:029" ]
publisher	{ "name": "Dashlink", "@type": "org:Organization" }
title	A Local Scalable Distributed Expectation Maximization Algorithm for Large Peer-to-Peer Networks

1 resource available