I am a research assistant at Columbia University’s DVMM lab, led by Prof. Shih-Fu Chang. I
work on multimodal learning, under the supervision of Dr. Mingyang Zhou.
I recently graduated from Columbia as an MS student in EE. At Columbia, I worked on robust computer
vision models under the supervision of Prof. Junfeng Yang and Prof. Carl Vondrick, and worked
closely with PhD student Chengzhi
Prior to Columbia, I earned my Bachelor's degree at Nanjing University, China, where I did my
undergraduate thesis on neural image compression, advised by Prof. Qiu Shen.
I am broadly interested in machine perception and understanding neural networks. In particular,
I am interested in improving the robustness and generalization of machine learning models,
understanding the behavior of deep networks, and explaining and advancing them through
information-principled lenses.
Adversarially Robust Video Perception by Seeing Motion
Lingyu Zhang*,
In submission, arXiv
We find that adversarial attacks generated for fooling video classifiers also collaterally corrupt
motion. We propose to defend against attacks at test time by restoring disrupted motion.
Robust Perception through Equivariance
Abhishek Vaibhav Joshi,
In submission, arXiv
We introduce a framework that uses the dense intrinsic constraints in natural images to robustify
inference, allowing the model to adjust dynamically to each
individual image's unique and potentially novel characteristics at inference time.
A Stereo Matching Method for Three-Dimensional Eye Localization of Autostereoscopic
International Conference on Image and Graphics, 2021
We improve and optimize the ZNCC stereo matching algorithm for three-dimensional eye localization.
We improve the operation logic of the matching and optimize the scanning strategy based on the
Entropy Constrained Information Bottleneck
Lingyu Zhang, E6876 Final Project, 2022
We propose to use deterministic encoding along with actual quantization
on latents, which turns the IB problem into a source compression problem. By doing so, finite,
non-trivial information can be estimated.
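As a rough illustration of why quantizing deterministic latents gives a finite quantity that can be estimated, the sketch below rounds latent values to integer symbols and computes their empirical entropy. The function names, the unit step size, and the toy latent values are assumptions made for illustration, not the project's actual code.

```python
import math
from collections import Counter

def quantize(latents, step=1.0):
    """Map each latent value to the index of its nearest quantization bin."""
    return [round(z / step) for z in latents]

def empirical_entropy_bits(symbols):
    """Empirical Shannon entropy, in bits per symbol, of a discrete sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy latents (hypothetical values) produced by a deterministic encoder.
latents = [0.1, 0.9, 1.2, -0.2, 2.6, 0.4, 1.1, 3.0]
symbols = quantize(latents)            # discrete symbols after quantization
rate = empirical_entropy_bits(symbols) # finite rate estimate in bits per symbol
print(symbols, round(rate, 4))
```

Because the symbols are discrete, the entropy is bounded by the logarithm of the number of occupied bins, which is what makes the estimate finite.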
Black-box Adversarial Attacks with Style Information
Lingyu Zhang, E6691 Final Project, 2022
We propose two types of black-box attacks based on style transfer and
investigate how robust classifiers behave against them.
Unsupervised Harmonic Sound Source Separation with Spectral Clustering
CS4774 Final Project, 2021
We modeled mixed audio sources using sinusoidal modeling with Short-Time Fourier
Transforms. Based on selected spectral peaks of the sinusoidal parameters, we constructed a
similarity function between time and frequency components, and applied spectral clustering
to globally partition the data.
Exploring Diverse Ways To Improve An Agent On
Active Object Localization With Deep Reinforcement Learning
E6885 Final Project, 2021
We proposed improvements to DQN-based object detection in four respects:
using advanced CNNs to generate state representations, defining more flexible
action spaces, changing the reward function to avoid undesired agent behavior,
and using masks instead of crosses for multiple objects.
Design and Optimization of a Multi-scale Representation based Image Compression
Lingyu Zhang, Undergraduate Thesis, 2021
Learned image compression has surpassed the rate-distortion performance of hand-crafted
traditional image codecs in recent years. However, learned codecs are not yet practical because
their decoding is significantly slower than that of classical algorithms. We investigated the
feasibility of directly performing vision tasks in the latent space and found that using a
multi-scale encoder helped preserve semantic meaning in latent codes while maintaining
state-of-the-art compression rates.
Dynamic Disparity Range Semi-Global Matching for Video Stereo Matching
Lingyu Zhang, Computer Vision Final Project, 2020
Implemented an accelerated stereo matching algorithm for video sequences, using a
disparity range search based on the temporal correlation between frames, which saves 21% of
the running time with minimal accuracy loss. Designed a Divided Section cost function, preserving more
information than Census cost, achieving 18% better matching accuracy while trading off