Input images are processed in forward order (top stream) and backward order (bottom stream) using local and context Siamese networks, yielding per-pixel descriptors. We then match points on a regular grid in the reference image to every pixel in the other image. Matching costs are smoothed using discrete MAP inference in a pairwise Markov random field. A forward-backward consistency check removes outliers.
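As an illustration of the final stage of this pipeline, the forward-backward consistency check can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the function name `forward_backward_mask`, the nearest-pixel lookup, and the one-pixel `threshold` are assumptions made for the example.

```python
import numpy as np

def forward_backward_mask(flow_fw, flow_bw, threshold=1.0):
    """Return a boolean mask of pixels whose forward and backward flows agree.

    flow_fw, flow_bw: arrays of shape (H, W, 2) holding (dx, dy) per pixel.
    threshold: hypothetical tolerance in pixels (not a value from the paper).
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest pixel in the other image that each reference pixel maps to.
    xt = np.rint(xs + flow_fw[..., 0]).astype(int)
    yt = np.rint(ys + flow_fw[..., 1]).astype(int)
    # Matches that leave the image cannot be verified and are rejected.
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    xt_c = np.clip(xt, 0, w - 1)
    yt_c = np.clip(yt, 0, h - 1)
    # A consistent match is carried back to its start: fw + bw(target) ~ 0.
    residual = flow_fw + flow_bw[yt_c, xt_c]
    error = np.linalg.norm(residual, axis=-1)
    return inside & (error <= threshold)
```

Pixels where the mask is `False` would be treated as outliers and discarded; how the remaining sparse matches are used afterwards is outside the scope of this sketch.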
Motivated by the success of deep learning techniques in matching problems, we present a method for learning context-aware features for solving optical flow using discrete optimization. Towards this goal, we propose an efficient way of training a context network with a large receptive field on top of a local network using dilated convolutions on patches. We perform feature matching by comparing each pixel in the reference image to every pixel in the target image, utilizing fast GPU matrix multiplication. The matching cost volume computed from the network’s output forms the data term for discrete MAP inference in a pairwise Markov random field. We provide an extensive empirical investigation of network architectures and model parameters. At the time of submission, our method ranks second on the challenging MPI Sintel test set.
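To make the exhaustive matching step concrete, the sketch below shows how a single matrix product can compare every reference descriptor with every target descriptor. It is a CPU illustration in NumPy under stated assumptions: the descriptors are L2-normalised and the cost is the negative dot product, and the names `matching_cost_volume` and `best_matches` are hypothetical. The abstract does not specify the exact similarity measure.

```python
import numpy as np

def matching_cost_volume(feat_ref, feat_tgt, eps=1e-8):
    """Cost between every reference descriptor and every target descriptor.

    feat_ref: (N, D) descriptors of the reference pixels (e.g. grid points).
    feat_tgt: (M, D) descriptors of all target pixels, flattened row-major.
    Returns an (N, M) array; lower cost means a better match.
    """
    # Assumption: descriptors are compared after L2 normalisation.
    ref = feat_ref / (np.linalg.norm(feat_ref, axis=1, keepdims=True) + eps)
    tgt = feat_tgt / (np.linalg.norm(feat_tgt, axis=1, keepdims=True) + eps)
    # One matrix product yields all N * M similarities at once.
    similarity = ref @ tgt.T
    return -similarity

def best_matches(cost, width):
    """Winner-take-all (x, y) target position for each reference descriptor."""
    idx = np.argmin(cost, axis=1)
    return np.stack([idx % width, idx // width], axis=1)
```

In the method described here, the cost volume is not resolved by winner-take-all as in `best_matches`; it serves as the data term of the Markov random field, so the `argmin` above only illustrates what the unsmoothed costs would select.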