Mark Sheinin, Yoav Y. Schechner and Kiriakos N. Kutulakos
Computational Imaging on the Electric Grid
Best Student Paper Award CVPR'17.
Night beats with alternating current (AC) illumination. By passively sensing this beat, we reveal new scene information which includes: the type of bulbs in the scene, the phases of the electric grid up to city scale, and the light transport matrix. This information yields unmixing of reflections and semi-reflections, nocturnal high dynamic range, and scene rendering with bulbs not observed during acquisition. The latter is facilitated by a database of bulb response functions for a range of sources, which we collected and provide. To do all this, we built a novel coded exposure high-dynamic-range imaging technique, specifically designed to operate on the grid’s AC lighting.
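As a rough illustration of the "beat" being sensed, the sketch below (not the authors' pipeline; the sampling rate and mains frequency are assumptions for illustration) fits a sinusoid at twice the mains frequency to a single pixel's time samples, recovering the flicker amplitude and phase that underlie the bulb identification and grid-phase mapping described above.

```python
# A minimal sketch, not the authors' implementation: least-squares fit of a
# sinusoid at twice the mains frequency (lights flicker at 100 Hz on a 50 Hz grid).
import numpy as np

def fit_flicker(samples, fs=1000.0, mains_hz=50.0):
    """Return (dc, amplitude, phase) of the 2*mains_hz flicker in one pixel's time series."""
    t = np.arange(len(samples)) / fs
    w = 2.0 * np.pi * (2.0 * mains_hz)            # flicker frequency in rad/s
    A = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    dc, c, s = np.linalg.lstsq(A, samples, rcond=None)[0]
    return dc, np.hypot(c, s), np.arctan2(-s, c)  # amplitude and phase of the AC beat
```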
Hadar Averbuch-Elor, Daniel Cohen-Or, Johannes Kopf, Michael F. Cohen
Bringing Portraits to Life
We present a technique to automatically animate a still portrait, making it possible for the subject in the photo to come to life and express various emotions. We use a driving video (of a different subject) and develop means to transfer the expressiveness of the subject in the driving video to the target portrait. In contrast to previous work that requires an input video of the target face to reenact a facial performance, our technique uses only a single target image. We animate the target image through 2D warps that imitate the facial transformations in the driving video. As warps alone do not carry the full expressiveness of the face, we add fine-scale dynamic details which are commonly associated with facial expressions such as creases and wrinkles. Furthermore, we hallucinate regions that are hidden in the input target face, most notably in the inner mouth. Our technique gives rise to reactive profiles, where people in still images can automatically interact with their viewers. We demonstrate our technique operating on numerous still portraits from the internet.
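As a rough illustration of the warping step only (not the paper's method; landmark detection is assumed given, and thin-plate RBF interpolation is used here purely for the sketch), the following densifies sparse landmark displacements taken from the driving frame into a dense field and backward-warps the target portrait.

```python
# A minimal sketch of a landmark-driven 2D warp; fine-scale details and mouth
# hallucination from the paper are not included.
import numpy as np
import cv2
from scipy.interpolate import Rbf

def warp_by_landmarks(target_img, target_pts, driven_pts):
    """target_pts, driven_pts: (N, 2) arrays of (x, y) landmark positions."""
    h, w = target_img.shape[:2]
    disp = driven_pts - target_pts                # where each landmark should move to
    # Interpolate the sparse displacements to every pixel with radial basis functions.
    rbf_dx = Rbf(target_pts[:, 0], target_pts[:, 1], disp[:, 0], function='thin_plate')
    rbf_dy = Rbf(target_pts[:, 0], target_pts[:, 1], disp[:, 1], function='thin_plate')
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    dx = rbf_dx(gx, gy).astype(np.float32)
    dy = rbf_dy(gx, gy).astype(np.float32)
    # Approximate backward warp: each output pixel samples the target at (x - dx, y - dy).
    map_x = (gx - dx).astype(np.float32)
    map_y = (gy - dy).astype(np.float32)
    return cv2.remap(target_img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```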
A. Gabbay, A. Ephrat, A. Shamir, T. Halperin, S. Peleg
Seeing Through Noise: Visual Speech Enhancement
Visual speech enhancement can be used in videos shot in noisy environments, when the speaker is also visible in the video. The voice of the visible speaker is enhanced by removing sounds that do not correspond to the visible mouth movements. An audio-visual neural network model is presented, generating an improved sound by separating the corresponding speech from the background noise. We will also present visual speech enhancement methods that avoid the need to train on all possible combinations of speech and noise.
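To make the audio-visual idea concrete, here is a minimal sketch (not the paper's architecture; all layer sizes and input shapes are assumed) in which a mouth-crop video encoder and a noisy-spectrogram encoder are fused to predict a mask over the noisy spectrogram.

```python
# A minimal audio-visual enhancement sketch: fuse video and audio features,
# predict a multiplicative mask over the noisy spectrogram.
import torch
import torch.nn as nn

class AVEnhancer(nn.Module):
    def __init__(self, n_freq=80, n_frames=5):
        super().__init__()
        # Mouth-region crops: (B, 1, n_frames, 64, 64) -> feature vector.
        self.video_enc = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=(1, 2, 2), padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        # Noisy spectrogram: (B, 1, n_freq, n_frames) -> feature vector.
        self.audio_enc = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.decoder = nn.Sequential(
            nn.Linear(32 + 32, 256), nn.ReLU(),
            nn.Linear(256, n_freq * n_frames), nn.Sigmoid())
        self.n_freq, self.n_frames = n_freq, n_frames

    def forward(self, video, noisy_spec):
        fused = torch.cat([self.video_enc(video), self.audio_enc(noisy_spec)], dim=1)
        mask = self.decoder(fused).view(-1, 1, self.n_freq, self.n_frames)
        return mask * noisy_spec   # enhanced spectrogram estimate
```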
Tali Dekel, Michael Rubinstein, Ce Liu, and William T. Freeman
Securing Visible Watermarks
Whether you are a photographer, a marketing manager, or a regular Internet user, chances are you have encountered visible watermarks many times. Visible watermarks are those semi-transparent logos often overlaid on digital images, and they are the most common mechanism for protecting the copyrights of the hundreds of millions of photographs and stock images offered online daily. Yet this protection suffers from an inherent security flaw: watermarks are typically added in a consistent manner to many images. We show that this consistency makes it possible to get past the protection and remove watermarks automatically, giving users unobstructed access to the clean images the watermarks are intended to protect. Specifically, we present a generalized multi-image matting algorithm that takes a watermarked image collection as input and automatically estimates the "foreground" (watermark), its alpha matte, and the "background" (original) images. Since such an attack relies on the consistency of watermarks across the image collection, we explore and evaluate how it is affected by various types of inconsistencies in the watermark embedding that could potentially be used to make watermarking more secure. I'll show how watermarks can be removed from the imagery of real stock photography companies, and how those companies ended up deploying our protection. Finally, I'll discuss future applications of our algorithm within Google.
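A minimal sketch of the consistency idea only (not the full matting algorithm): because the same watermark is embedded in every image, the per-pixel median of image gradients across the collection suppresses the varying backgrounds and leaves an estimate of the watermark's edges.

```python
import numpy as np

def watermark_gradient_estimate(images):
    """images: list of aligned grayscale arrays (H, W), each containing the watermark."""
    gx, gy = [], []
    for img in images:
        dy, dx = np.gradient(img.astype(np.float64))
        gx.append(dx)
        gy.append(dy)
    # Background gradients are uncorrelated across images and median out;
    # the watermark's gradients are consistent and survive.
    return np.median(np.stack(gx), axis=0), np.median(np.stack(gy), axis=0)
```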
Roey Mechrez, Eli Shechtman and Lihi Zelnik-Manor
Photorealistic image synthesis and manipulation
Recent work has shown impressive success in automatically synthesizing new images with desired properties such as transferring painterly style, modifying facial expressions or manipulating the center of attention of the image. These approaches, however, fall short of producing photorealistic outputs and often exhibit distortions reminiscent of a painting or use unrealistic colors. In this talk we will describe our efforts in making the process of image synthesis and manipulation more realistic. Our approach takes as input a stylized image and makes it more photorealistic. It relies on the Screened Poisson Equation, maintaining the fidelity of the stylized image while constraining the gradients to those of the original input image. Our method is efficient, fast and simple. We demonstrate it on two different image manipulation tasks: manipulating images in order to control their saliency maps, and photorealistic style transfer. We show that our method can be used in a post-processing manner after the image manipulation or embedded into an iterative patch-based synthesis scheme. Comparing our method to previous ones shows significant improvement in the realistic appearance of the resulting images.
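A minimal sketch of the Screened Poisson step, assuming periodic boundaries and a single channel: find u minimizing ||u - S||^2 + lam * ||grad(u) - grad(I)||^2, i.e. stay close to the stylized image S while pulling its gradients toward those of the original image I; lam trades fidelity against realism.

```python
import numpy as np

def screened_poisson(S, I, lam=5.0):
    """S (stylized), I (original): float arrays of shape (H, W)."""
    h, w = S.shape
    wy = np.fft.fftfreq(h).reshape(-1, 1)
    wx = np.fft.fftfreq(w).reshape(1, -1)
    # Transfer function of the discrete 5-point Laplacian (non-positive everywhere).
    L = 2.0 * np.cos(2.0 * np.pi * wy) + 2.0 * np.cos(2.0 * np.pi * wx) - 4.0
    # Euler-Lagrange equation u - lam*Lap(u) = S - lam*Lap(I), solved per frequency.
    U = (np.fft.fft2(S) - lam * L * np.fft.fft2(I)) / (1.0 - lam * L)
    return np.real(np.fft.ifft2(U))
```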
Roy Jevnisek and Shai Avidan
Co-Occurrence Filter
Co-occurrence Filter (CoF) is a boundary preserving filter. It is based on the Bilateral Filter (BF) but instead of using a Gaussian on the range values to preserve edges it relies on a co-occurrence matrix. Pixel values that co-occur frequently in the image (i.e., inside textured regions) will have a high weight in the co-occurrence matrix. This, in turn, means that such pixel pairs will be averaged and hence smoothed, regardless of their intensity differences. On the other hand, pixel values that rarely co-occur (i.e., across texture boundaries) will have a low weight in the co-occurrence matrix. As a result, they will not be averaged and the boundary between them will be preserved. The CoF therefore extends the BF to deal with boundaries, not just edges. It learns co-occurrences directly from the image. We can achieve various filtering results by directing it to learn the co-occurrence matrix from a part of the image, or a different image. We give the definition of the filter, discuss how to use it with color images and show several use cases.
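A minimal, brute-force sketch of the idea for a grayscale image (not the paper's optimized implementation; the number of quantization levels and window size are assumed): pixel values that co-occur often get a high mutual weight and are smoothed together, while rarely co-occurring values keep their boundary.

```python
import numpy as np

def cooccurrence_filter(img, levels=32, radius=5, sigma_s=3.0):
    """img: grayscale uint8 array (H, W); returns the filtered image as float."""
    q = np.clip((img.astype(np.float64) / 256.0 * levels).astype(int), 0, levels - 1)
    h, w = q.shape
    # Count how often pairs of quantized values appear as neighbors.
    M = np.zeros((levels, levels))
    for a, b in [(q[:, :-1], q[:, 1:]), (q[:-1, :], q[1:, :])]:
        np.add.at(M, (a.ravel(), b.ravel()), 1.0)
        np.add.at(M, (b.ravel(), a.ravel()), 1.0)
    # Normalize by the marginals so frequent values are not trivially favored.
    M = M / (M.sum(axis=1, keepdims=True) * M.sum(axis=0, keepdims=True) + 1e-12)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys ** 2 + xs ** 2) / (2.0 * sigma_s ** 2))
    pad = np.pad(img.astype(np.float64), radius, mode='reflect')
    qpad = np.pad(q, radius, mode='edge')
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            win = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            qwin = qpad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            wgt = spatial * M[q[y, x], qwin]   # co-occurrence replaces the range kernel
            out[y, x] = (wgt * win).sum() / (wgt.sum() + 1e-12)
    return out
```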
Yuval Bahat, Netalee Efrat, Michal Irani
Non-Uniform Blind Deblurring by Reblurring
We present an approach for blind image deblurring, which handles non-uniform blurs. Our algorithm has two main components: (i) a new method for recovering the unknown blur-field directly from the blurry image, and (ii) a method for deblurring the image given the recovered non-uniform blur-field. Our blur-field estimation is based on analyzing the spectral content of blurry image patches by re-blurring them. Being unrestricted by any training data, it can handle a large variety of blur sizes, yielding superior blur-field estimation results compared to training-based deep-learning methods. Our non-uniform deblurring algorithm is based on the internal image-specific patch recurrence prior. It attempts to recover a sharp image which, on the one hand, results in the blurry image under our estimated blur-field and, on the other hand, maximizes the internal recurrence of patches within and across scales of the recovered sharp image. The combination of these two components gives rise to a blind-deblurring algorithm, which exceeds the performance of state-of-the-art CNN-based blind-deblurring by a significant margin, without the need for any training data.
D. Kaufman, G. Levi, T. Hassner, L. Wolf
Temporal Tessellation: A Unified Approach for Video Analysis
We present a general approach to video understanding, inspired by semantic transfer techniques that have been successfully used for 2D image analysis. Our method considers a video to be a 1D sequence of clips, each one associated with its own semantics. The nature of these semantics – natural language captions or other labels – depends on the task at hand. A test video is processed by forming correspondences between its clips and the clips of reference videos with known semantics, following which, reference semantics can be transferred to the test video. We describe two matching methods, both designed to ensure that (a) reference clips appear similar to test clips and (b), taken together, the semantics of the selected reference clips is consistent and maintains temporal coherence. We use our method for video captioning on the LSMDC’16 benchmark, video summarization on the SumMe and TVSum benchmarks, temporal action detection on the THUMOS 2014 benchmark, and sound prediction on the Greatest Hits benchmark. Our method not only surpasses the state of the art in four out of five benchmarks but, importantly, is the only single method we know of that was successfully applied to such a diverse range of tasks.
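As a sketch of the matching step (not the paper's exact model; the similarity and coherence matrices are assumed precomputed), the trade-off between appearance similarity and semantic coherence of consecutive choices can be solved exactly with Viterbi-style dynamic programming:

```python
import numpy as np

def tessellate(sim, coh):
    """
    sim: (T, R) similarity between each of T test clips and R reference clips.
    coh: (R, R) semantic coherence between consecutive reference clips.
    Returns the chosen reference clip index for each test clip.
    """
    T, R = sim.shape
    score = np.full((T, R), -np.inf)
    back = np.zeros((T, R), dtype=int)
    score[0] = sim[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + coh        # (R_prev, R_curr)
        back[t] = cand.argmax(axis=0)
        score[t] = sim[t] + cand.max(axis=0)
    path = [int(score[-1].argmax())]              # backtrack the best sequence
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```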
Ehud Barnea, Ohad Ben-Shahar
On the Utility of Context (or the Lack Thereof) for Object Detection
The recurring context in which objects appear holds valuable information that can be employed to predict their existence. This intuitive observation indeed led many researchers to endow appearance-based detectors with explicit reasoning about context. The underlying thesis suggests that the stronger the contextual relations, the greater the improvement in detection capacity one can expect from such a combined approach. In practice, however, the observed improvement in many cases is modest at best, and often only marginal. In this work we seek to understand this phenomenon better, in part by pursuing an opposite approach. Instead of going from context to detection score, we formulate the score as a function of standard detector results and contextual relations, an approach that allows us to treat the utility of context as an optimization problem, in order to obtain the largest gain possible from considering context in the first place.
Analyzing different contextual relations reveals the most helpful ones and shows that in many cases including context can help while in other cases a significant improvement is simply impossible or impractical. To better understand these results we then analyze the ability of context to handle different types of false detections, revealing that contextual information
cannot ameliorate localization errors, which in turn also diminish the observed improvement obtained by correcting other types of errors. These insights provide further explanations and better understanding regarding the success or failure of utilizing context for object detection.
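As a minimal sketch of the direction described above (not the paper's formulation; the contextual features here are purely illustrative), the final detection score can be expressed as a learned function of the base detector score and simple contextual features, fitted on held-out labeled detections:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rescore_with_context(det_scores, context_feats, labels):
    """
    det_scores: (N,) raw detector confidences.
    context_feats: (N, K) e.g. confidences of co-occurring classes, relative position.
    labels: (N,) 1 for true detections, 0 for false ones (from a validation set).
    """
    X = np.column_stack([det_scores, context_feats])
    model = LogisticRegression(max_iter=1000).fit(X, labels)
    return model.predict_proba(X)[:, 1]   # context-aware detection scores
```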
Gautam Pai and Ron Kimmel
Learning Invariant Representations of Planar Curves
We propose a metric learning framework for the construction of invariant geometric functions of planar curves under the Euclidean and similarity groups of transformations. We leverage the representational power of convolutional neural networks to compute these geometric quantities. In comparison with axiomatic constructions, we show that the invariants approximated by the learning architectures have better numerical qualities, such as robustness to noise, resiliency to sampling, and the ability to adapt to occlusion and partiality. Finally, we develop a novel multi-scale representation in a similarity metric learning paradigm.
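A minimal sketch of such a metric-learning setup (not the paper's architecture; the network size and loss are assumed for illustration): a small 1D CNN maps a sampled planar curve to a per-point signature, and a siamese contrastive loss pulls together signatures of the same curve under Euclidean or similarity transforms while pushing apart different curves.

```python
import torch
import torch.nn as nn

class CurveNet(nn.Module):
    def __init__(self, out_dim=16):
        super().__init__()
        self.net = nn.Sequential(              # input: (B, 2, N) sampled (x, y) points
            nn.Conv1d(2, 32, 5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 32, 5, padding=2), nn.ReLU(),
            nn.Conv1d(32, out_dim, 5, padding=2))

    def forward(self, curve):
        return self.net(curve)                 # per-point invariant signature

def contrastive_loss(f1, f2, same, margin=1.0):
    """same = 1 if the two curves are transformed copies of one another, else 0."""
    d = (f1 - f2).pow(2).mean(dim=(1, 2))
    return (same * d + (1 - same) * torch.clamp(margin - d, min=0)).mean()
```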
Oran Shayer, Dan Levi and Ethan Fetaya
Learning Discrete Weights Using the Local Reparameterization Trick
Recent breakthroughs in computer vision make use of large deep neural networks, utilizing the substantial speedup offered by GPUs. For applications running on limited hardware, however, high precision real-time processing can still be a challenge. One approach to solving this problem is training networks with binary or ternary weights, thus removing the need to calculate multiplications and significantly reducing memory size. In this work, we introduce LR-nets (Local reparameterization networks), a new method for training neural networks with discrete weights using stochastic parameters. We show how a simple modification to the local reparameterization trick, previously used to train Gaussian-distributed weights, enables the training of discrete weights. Using the proposed training we test both binary and ternary models on MNIST, CIFAR-10 and ImageNet benchmarks and reach state-of-the-art results on most experiments.
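A minimal sketch of the local reparameterization trick for ternary weights (not the authors' full training recipe): each weight holds logits over {-1, 0, +1}; instead of sampling weights, the layer samples its pre-activation from a Gaussian whose mean and variance follow from the weight distribution, keeping the gradient path differentiable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        # One logit per weight per value in {-1, 0, +1}.
        self.logits = nn.Parameter(torch.zeros(out_features, in_features, 3))
        self.register_buffer('values', torch.tensor([-1.0, 0.0, 1.0]))

    def forward(self, x):
        p = F.softmax(self.logits, dim=-1)                         # weight distributions
        w_mean = (p * self.values).sum(-1)                         # E[W]
        w_var = (p * self.values.pow(2)).sum(-1) - w_mean.pow(2)   # Var[W]
        mu = F.linear(x, w_mean)                                   # E[pre-activation]
        var = F.linear(x.pow(2), w_var)                            # Var[pre-activation]
        return mu + var.clamp(min=1e-8).sqrt() * torch.randn_like(mu)
```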
Yehuda Dar, Alfred M. Bruckstein, and Michael Elad
Adapting Standard Image Compression to Compensate Post-Decompression Degradation
Many imaging systems implement a fundamental design where an image is transmitted or stored and eventually presented to a human observer using an imperfect display device. While the eventual quality of the output image may be severely affected by the display, this degradation is usually ignored in the preceding compression stage. In this work we model the sub-optimality of the display device as a known degradation operator applied on the decompressed image. We assume the use of a standard compression path, and augment it with a suitable pre-processing procedure, providing a compressed image intended to compensate the degradation without any post-filtering. Our approach originates from an intricate rate-distortion optimization, optimizing the modifications to the input image so as to achieve the best end-to-end performance. We address this computationally intractable problem using the alternating direction method of multipliers (ADMM) approach, leading to a procedure in which a standard compression technique is applied iteratively. We demonstrate the proposed method for adjusting HEVC image compression to post-decompression blur. The experiments establish our method as a leading approach for preprocessing high bit-rate compression to counterbalance a post-decompression degradation.
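A minimal sketch of the alternating structure only (not the paper's exact splitting): one sub-problem compensates for the known degradation H by solving a regularized least-squares in closed form (here with a hypothetical frequency-domain H), and the other applies a standard codec to the result. The `codec` function stands in for any off-the-shelf compress-decompress routine.

```python
import numpy as np

def precompensating_compression(x, H_hat, codec, beta=0.5, n_iters=10):
    """
    x: target image (H, W); H_hat: DFT of the display degradation filter (H, W);
    codec: function img -> decompressed img using a standard compression path.
    """
    z = x.copy()
    u = np.zeros_like(x)
    for _ in range(n_iters):
        # v-step: argmin_v ||H v - x||^2 + beta ||v - (z - u)||^2, solved in the DFT domain.
        rhs = np.conj(H_hat) * np.fft.fft2(x) + beta * np.fft.fft2(z - u)
        v = np.real(np.fft.ifft2(rhs / (np.abs(H_hat) ** 2 + beta)))
        # z-step: run the standard codec on the shifted estimate.
        z = codec(v + u)
        u = u + v - z            # dual update
    return z                     # pre-compensated, compressed-then-decompressed image
```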
Elad Osherov, Michael Lindenbaum
Increasing CNN Robustness to Occlusions by Reducing Filter Support
Convolutional neural networks (CNNs) provide the current state of the art in visual object classification, but they are far less accurate when classifying partially occluded objects. A straightforward way to improve classification under occlusion conditions is to train the classifier using partially occluded object examples. However, training the network on many combinations of object instances and occlusions may be computationally expensive. This work proposes an alternative approach to increasing the robustness of CNNs to occlusion. We start by studying the effect of partial occlusions on the trained CNN and show, empirically, that training on partially occluded examples reduces the spatial support of the filters. Building upon this finding, we argue that smaller filter support is beneficial for occlusion robustness. We propose a training process that uses a special regularization term that acts to shrink the spatial support of the filters. We consider three possible regularization terms that are based on second central moments, group sparsity, and mutually reweighted L1, respectively. When trained on normal (unoccluded) examples, the resulting classifier is highly robust to occlusions. For large training sets and limited training time, the proposed classifier is even more accurate than standard classifiers trained on occluded object examples.
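A minimal sketch of a second-central-moment regularizer (illustrative, not the paper's exact formulation): penalize the spatial spread of each filter's absolute weights, which pushes the filter's energy toward its center of mass and thus shrinks its effective support.

```python
import torch

def support_moment_penalty(weight):
    """weight: conv filter tensor of shape (out_ch, in_ch, kH, kW)."""
    out_ch, in_ch, kh, kw = weight.shape
    ys = torch.arange(kh, dtype=weight.dtype, device=weight.device).view(1, 1, kh, 1)
    xs = torch.arange(kw, dtype=weight.dtype, device=weight.device).view(1, 1, 1, kw)
    mass = weight.abs()
    total = mass.sum(dim=(2, 3), keepdim=True) + 1e-12
    cy = (mass * ys).sum(dim=(2, 3), keepdim=True) / total      # spatial center of mass
    cx = (mass * xs).sum(dim=(2, 3), keepdim=True) / total
    var = (mass * ((ys - cy) ** 2 + (xs - cx) ** 2)).sum(dim=(2, 3), keepdim=True) / total
    return var.mean()            # add lambda * this term to the training loss
```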
Assaf Shocher, Nadav Cohen, Michal Irani
“Zero-Shot” Super-Resolution using Deep Internal Learning
Deep Learning has led to a dramatic leap in Super-Resolution (SR) performance in the past few years. However, being supervised, these SR methods are restricted to specific training data, where the acquisition of the low-resolution (LR) images from their high-resolution (HR) counterparts is predetermined (e.g., bicubic downscaling), without any distracting artifacts (e.g., sensor noise, image compression, non-ideal PSF, etc.). Real LR images, however, rarely obey these restrictions, resulting in poor SR results by SotA (State of the Art) methods. In this paper we introduce “Zero-Shot” SR, which exploits the power of Deep Learning, but does not rely on prior training. We exploit the internal recurrence of information inside a single image, and train a small image-specific CNN at test time, on examples extracted solely from the input image itself. As such, it can adapt itself to different settings per image. This allows us to perform SR on real old photos, noisy images, biological data, and other images where the acquisition process is unknown or non-ideal. On such images, our method outperforms SotA CNN-based SR methods, as well as previous unsupervised SR methods. To the best of our knowledge, this is the first unsupervised CNN-based SR method.
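A minimal sketch of the zero-shot idea (not the authors' full method, which also uses data augmentation and a gradual scale schedule): downscale the test image to create image-specific training pairs, fit a small CNN on those pairs only, then apply it to the test image itself to super-resolve it. The network size and training schedule below are assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def zero_shot_sr(lr, scale=2, steps=2000):
    """lr: tensor of shape (1, C, H, W), values in [0, 1]."""
    net = nn.Sequential(
        nn.Conv2d(lr.shape[1], 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, lr.shape[1], 3, padding=1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    # Image-specific training pair: the input plays the role of "HR", and its
    # downscaled-then-upscaled copy plays the role of "LR".
    son = F.interpolate(lr, scale_factor=1.0 / scale, mode='bicubic', align_corners=False)
    son_up = F.interpolate(son, size=lr.shape[-2:], mode='bicubic', align_corners=False)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(net(son_up), lr)
        loss.backward()
        opt.step()
    # Apply the image-specific network to the test image itself.
    lr_up = F.interpolate(lr, scale_factor=scale, mode='bicubic', align_corners=False)
    with torch.no_grad():
        return net(lr_up).clamp(0, 1)
```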
Zorah Lähner, Matthias Vestner, Amit Boyarski, Or Litany, Ron Slossberg, Tal Remez, Emanuele Rodolà, Alex Bronstein, Michael Bronstein, Ron Kimmel, Daniel Cremers
Efficient Deformable Shape Correspondence via Kernel Matching
We present a method to match three dimensional shapes under non-isometric deformations, topology changes and partiality. We formulate the problem as matching between a set of pair-wise and point-wise descriptors, imposing a continuity prior on the mapping, and propose a projected descent optimization procedure inspired by difference of convex functions (DC) programming. Surprisingly, in spite of the highly non-convex nature of the resulting quadratic assignment problem, our method converges to a semantically meaningful and continuous mapping in most of our experiments, and scales well. We provide preliminary theoretical analysis and several interpretations of the method.
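A minimal sketch of a linearize-and-project iteration in this spirit (not the paper's algorithm or its kernels; the kernel matrices and the permutation projection are assumptions for the sketch): maximize the quadratic objective trace(P^T K_X P K_Y), whose gradient is K_X P K_Y for symmetric kernels, by repeatedly linearizing at the current P and projecting onto permutations with a linear assignment solve.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def kernel_matching(K_X, K_Y, n_iters=20):
    """K_X: (n, n) kernel on shape X; K_Y: (n, n) kernel on shape Y (both symmetric, >= 0)."""
    n = K_X.shape[0]
    P = np.eye(n)                                  # initial correspondence
    for _ in range(n_iters):
        grad = K_X @ P @ K_Y                       # linearization of the objective
        rows, cols = linear_sum_assignment(-grad)  # maximize <grad, P> over permutations
        P_new = np.zeros_like(P)
        P_new[rows, cols] = 1.0
        if np.array_equal(P_new, P):
            break
        P = P_new
    return P                                       # P[i, j] = 1 if point i on X matches j on Y
```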