# Sign-Full Random Projections

@article{Li2019SignFullRP, title={Sign-Full Random Projections}, author={Ping Li}, journal={ArXiv}, year={2019}, volume={abs/1805.00533} }

The method of 1-bit (“sign-sign”) random projections has been a popular tool for efficient search and machine learning on large datasets. Given two D-dim data vectors u, v ∈ ℝD, one can generate x = ∑i=1D uiri, and y = ∑i=1D viri, where ri ∼ N(0, 1) iid. Then one can estimate the cosine similarity ρ from sgn(x) and sgn(y). In this paper, we study a series of estimators for “sign-full” random projections. First we prove E(sgn(x)y) = √2/πρ, which provides an estimator for ρ. Interestingly this… Expand

#### Figures, Tables, and Topics from this paper

#### 3 Citations

Random Projections with Asymmetric Quantization

- Computer Science
- NeurIPS
- 2019

This paper thoroughly analyzes the biases and variances of a series of estimators including the basic simple estimators, their normalized versions, and their debiased versions and shows that the expectation of proposed estimators increases with the true cosine similarity, on a broader family of stair-shaped quantizers. Expand

Improving Random Projections With Extra Vectors to Approximate Inner Products

- Computer Science, Mathematics
- IEEE Access
- 2020

This research shows how the variance of estimated inner products is reduced as more vectors are added, how variance reduction is related to the geometry of the dataset and moreover, the asymptotic behaviour of the variance as the number of extra vectors added goes to infinity. Expand

Breaking the waves: asymmetric random periodic features for low-bitrate kernel machines

- Computer Science, Engineering
- ArXiv
- 2020

The take-home message is that when the features of only one of the two signals are quantized, the original kernel is recovered without distortion; its practical interest appears in several cases where the kernel evaluations are asymmetric by nature, such as a client-server scheme. Expand

#### References

SHOWING 1-10 OF 15 REFERENCES

Similarity estimation techniques from rounding algorithms

- Mathematics, Computer Science
- STOC '02
- 2002

It is shown that rounding algorithms for LPs and SDPs used in the context of approximation algorithms can be viewed as locality sensitive hashing schemes for several interesting collections of objects. Expand

Locality-sensitive hashing scheme based on p-stable distributions

- Computer Science, Mathematics
- SCG '04
- 2004

A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions that improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case p<1. Expand

Improved Approximation Algorithms for Maximum Graph Partitioning Problems

- Mathematics, Computer Science
- J. Comb. Optim.
- 2005

It is proved that a sub-optimal choice of these parameters can be obtained by the solution of a linear program which leads to the desired improvement of the approximation factors. Expand

Finding near-duplicate web pages: a large-scale evaluation of algorithms

- Computer Science
- SIGIR
- 2006

A combined algorithm is presented which achieves precision 0.79 with 79% of the recall of the other algorithms, and since Charikar's algorithm finds more near-duplicate pairs on different sites, it achieves a better precision overall than Broder et al.'s algorithm. Expand

Improving Random Projections Using Marginal Information

- Mathematics, Computer Science
- COLT
- 2006

We present an improved version of random projections that takes advantage of marginal norms. Using a maximum likelihood estimator (MLE), margin-constrained random projections can improve estimation… Expand

Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming

- Mathematics, Computer Science
- JACM
- 1995

This algorithm gives the first substantial progress in approximating MAX CUT in nearly twenty years, and represents the first use of semidefinite programming in the design of approximation algorithms. Expand

Detecting near-duplicates for web crawling

- Computer Science
- WWW '07
- 2007

This work demonstrates that Charikar's fingerprinting technique is appropriate for near-duplicate detection and presents an algorithmic technique for identifying existing f-bit fingerprints that differ from a given fingerprint in at most k bit-positions, for small k. Expand

An Introduction to Multivariate Statistical Analysis

- Mathematics
- 1959

Preface to the Third Edition.Preface to the Second Edition.Preface to the First Edition.1. Introduction.2. The Multivariate Normal Distribution.3. Estimation of the Mean Vector and the Covariance… Expand

Learning mixtures of Gaussians

- Mathematics, Computer Science
- 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039)
- 1999

This work presents the first provably correct algorithm for learning a mixture of Gaussians, which returns the true centers of the Gaussian to within the precision specified by the user with high probability. Expand

Latent semantic indexing: a probabilistic analysis

- Computer Science
- PODS '98
- 1998

It is proved that under certain conditions LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. Expand