Flattened one-bit stochastic gradient descent: compressed distributed optimization with controlled variance

Abstract: We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Our gradient compression technique, named flattened one-bit stochastic gradient descent (FO-SGD), relies on two simple algorithmic ideas: (i) a one-bit quantization procedure leveraging the technique of dithering, and (ii) a randomized fast Walsh-Hadamard transform to flatten the stochastic gradient before quantization.