The k-nearest neighbor method, widely applied in machine learning, is a useful tool for statistical inference about the underlying distribution of multi-dimensional data. However, relatively little is known about how to choose an optimal order k. This paper proposes an asymptotic distribution for the nearest neighbor statistic. Under some conditions, we find an optimal unbiased density estimate based on a linear combination of nearest neighbors, which leads to an optimal choice of the order for the k-nearest neighbor method.
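For orientation, the following is a minimal sketch of the classical k-nearest neighbor density estimate, f_hat(x) = k / (n * V_d * R_k(x)^d), where R_k(x) is the distance from x to its k-th nearest sample point and V_d is the volume of the unit ball in d dimensions. It is not the linear-combination estimator proposed in the paper, which the abstract does not specify; the function names `unit_ball_volume`, `knn_density`, and `demo` are hypothetical and chosen only for illustration.

```python
import math
import random


def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def knn_density(x, sample, k):
    """Classical k-NN density estimate at the point x.

    Assumption: this is the textbook estimator, not the paper's
    linear combination of nearest neighbors.
    """
    n = len(sample)
    d = len(x)
    # Sorted Euclidean distances from x to every sample point.
    dists = sorted(math.dist(x, p) for p in sample)
    r_k = dists[k - 1]  # distance to the k-th nearest neighbor
    return k / (n * unit_ball_volume(d) * r_k ** d)


def demo(seed=0, n=2000):
    """Estimate a standard bivariate normal density at the origin for several k."""
    rng = random.Random(seed)
    sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    true_density = 1 / (2 * math.pi)  # exact density at the origin
    estimates = {k: knn_density((0.0, 0.0), sample, k) for k in (5, 50, 500)}
    return estimates, true_density


if __name__ == "__main__":
    estimates, truth = demo()
    print("true density:", truth)
    for k, est in estimates.items():
        print("k =", k, "estimate =", est)
```

Running the demo with different values of k illustrates the trade-off the abstract refers to: a small k tends to give a noisy estimate, while a large k smooths over a wider neighborhood and can introduce bias.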
Date: 2012-10
Relation: Statistics and Probability Letters. 2012 Oct;82(10):1786-1791.