Previously, I earned an M.S. in Electrical Engineering from Columbia University. At Columbia, I worked on robust
computer vision models with Prof. Junfeng
Yang and Prof. Carl
Vondrick, and worked closely with PhD student Chengzhi
Mao. I was also a research assistant in Prof. Shih-Fu Chang's DVMM lab, where I worked on
multimodal learning, under the supervision of Dr. Mingyang Zhou.
Prior to Columbia, I earned my Bachelor's degree at Nanjing University, China, where I did my
undergraduate thesis on neural image compression, advised by Prof. Qiu Shen.
[11/2024] Our paper CREW is accepted to TMLR
[9/2024] Our paper GUIDE is accepted at NeurIPS 2024
[9/2024] Our paper HUMAC is released on arXiv
[8/2024] Our Human-AI teaming platform CREW is released
Research
I am broadly interested in machine learning for decision making and perception. In particular, I focus on
advancing real-world agents by improving the robustness and generalization of ML models.
GUIDE: Real-Time Human-Shaped Agents
Lingyu Zhang,
Zhengran Ji, Nicholas R Waytowich, Boyuan Chen
NeurIPS 2024 paper /
project page /
video
GUIDE enables real-time human-guided reinforcement learning with dense, continuous rewards and a learned
feedback model that reduces the required human input and allows continual training. We also provide insights
into what makes a good human trainer for agents.
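For illustration, here is a toy sketch of one way to combine environment reward with dense human feedback and
a learned feedback model that stands in for the human when no input is given; the names and interfaces are
illustrative assumptions, not GUIDE's actual API.

# Toy sketch of human-in-the-loop reward shaping with a learned feedback model.
# All names and interfaces are illustrative, not GUIDE's actual implementation.
import numpy as np
from sklearn.linear_model import SGDRegressor

class FeedbackShaper:
    def __init__(self, alpha=0.5):
        self.alpha = alpha              # weight of the human-shaped reward term
        self.model = SGDRegressor()     # learned model of the human's feedback
        self.fitted = False

    def shape(self, obs, action, env_reward, human_feedback=None):
        # obs and action are assumed to be flat numpy arrays.
        x = np.concatenate([obs, action]).reshape(1, -1)
        if human_feedback is not None:  # human is actively providing dense feedback
            self.model.partial_fit(x, [human_feedback])
            self.fitted = True
            shaped = human_feedback
        elif self.fitted:               # otherwise fall back to the learned feedback model
            shaped = float(self.model.predict(x)[0])
        else:
            shaped = 0.0
        return env_reward + self.alpha * shaped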
CREW: A Platform for Human-AI Teaming Research
TMLR 2024
We introduce CREW, a platform for Human-AI teaming research. CREW offers extensible environment design,
enables real-time human-AI communication, supports hybrid Human-AI teaming, parallel sessions, multimodal
feedback, and physiological data collection, and features ML-community-friendly algorithm design.
HUMAC: Enabling Multi-Robot Collaboration from Single-Human Guidance
Zhengran Ji,
Lingyu Zhang,
Paul Sajda, Boyuan Chen
Preprint 2024 arXiv /
project page /
video
HUMAC enables multi-robot collaboration from single-human guidance. Inspired by the human theory of mind,
HUMAC leverages a human-robot interface that allows a single human to guide multiple robots simultaneously,
from which collaborative behavior can be learned.
Robust Perception through Equivariance
Chengzhi Mao,
Lingyu Zhang,
Abhishek Vaibhav Joshi,
Junfeng Yang,
Hao Wang,
Carl Vondrick
ICML 2023 paper /
project page
We introduce a framework that uses the dense intrinsic constraints in natural images to robustify
inference, allowing the model to adjust dynamically to each
individual image's unique and potentially novel characteristics at inference time.
Adversarially Robust Video Perception by Seeing Motion
Lingyu Zhang*,
Chengzhi Mao*,
Junfeng Yang,
Carl Vondrick
Preprint 2023 arXiv /
project page
We find that adversarial attacks generated for fooling video classifiers also collaterally corrupt
motion. We propose to defend against attacks at test time by restoring disrupted motion.
A Stereo Matching Method for Three-Dimensional Eye Localization of Autostereoscopic
Display
Bangpeng Xiao,
Shenyuan Ye,
Xicai Li,
Min Li,
Lingyu Zhang,
Yuanqing Wang
International Conference on Image and Graphics, 2021
paper
We improve and optimize the ZNCC stereo matching algorithm for three-dimensional eye localization in
autostereoscopic displays, refining the matching logic and optimizing the scanning strategy for the target
application scenarios.
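For reference, the underlying ZNCC score correlates the mean-subtracted intensities of two patches; a minimal
sketch (variable names are illustrative, not taken from the paper):

import numpy as np

def zncc(patch_a, patch_b, eps=1e-8):
    # Zero-mean normalized cross-correlation between two equal-size patches.
    a = patch_a.astype(np.float64).ravel()
    b = patch_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))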
Selected Projects
Noise as Masks
Lingyu Zhang, Representation Learning Final Project, 2022
paper
We propose to use noise as masks for masked image modeling. While randomized patch masking has yielded decent
results in self-supervised learning, it is not obvious that it is the optimal design. We show that
theoretically motivated, semantically guided noise masks are a potentially well-performing alternative.
Entropy Constrained Information Bottleneck
Lingyu Zhang, Sparse and Low-Dimensional Models for High-Dimensional Data Final Project, 2022
paper
We propose to use deterministic encoding together with actual quantization of the latents, turning the
information bottleneck (IB) problem into a source-coding problem. This makes the mutual information finite
and non-trivially estimable.
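For context, the standard IB objective trades compression against prediction; with a deterministic encoder f
followed by a quantizer Q, the rate term reduces to the entropy of the discrete code. A sketch of this reading
(not the exact formulation from the report):

\min_{p(z \mid x)} \; I(X;Z) - \beta\, I(Z;Y)

With deterministic Z = Q(f(X)) we have H(Z \mid X) = 0, so I(X;Z) = H(Z) and the objective becomes

\min_{f,\,Q} \; H\big(Q(f(X))\big) - \beta\, I\big(Q(f(X)); Y\big).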
Unsupervised Harmonic Sound Source Separation with Spectral Clustering
Lingyu Zhang,
Yiming Lin,
Lucy Wang,
Zhaoyuan Deng
Unsupervised Machine Learning Final Project, 2021
paper
We modeled mixed audio sources with sinusoidal modeling on Short-Time Fourier Transforms. From selected
spectral peaks of the sinusoidal parameters, we constructed a similarity function between time and frequency
components and applied spectral clustering to globally partition the data into sources.
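A minimal sketch of this pipeline on a toy two-tone mixture; the Gaussian similarity over log-frequency and
time below is an illustrative stand-in for the similarity function used in the report:

import numpy as np
from scipy.signal import stft, find_peaks
from sklearn.cluster import SpectralClustering

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
mix = np.sin(2 * np.pi * 220 * t) + 0.8 * np.sin(2 * np.pi * 1760 * t)   # toy 2-source mixture

freqs, frames, Z = stft(mix, fs=fs, nperseg=1024)
peaks = []                                    # (time, frequency) of spectral peaks per frame
for j in range(Z.shape[1]):
    mag = np.abs(Z[:, j])
    idx, _ = find_peaks(mag, height=0.1 * mag.max())
    peaks += [(frames[j], freqs[i]) for i in idx]
P = np.array(peaks)

# Gaussian similarity between peaks based on log-frequency and time proximity.
df = np.log1p(P[:, 1])[:, None] - np.log1p(P[:, 1])[None, :]
dt = P[:, 0][:, None] - P[:, 0][None, :]
W = np.exp(-(df ** 2) / 0.1 - (dt ** 2) / 2.0)

labels = SpectralClustering(n_clusters=2, affinity="precomputed").fit_predict(W)
for k in range(2):
    print(f"cluster {k}: mean peak frequency = {P[labels == k, 1].mean():.1f} Hz")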
Design and Optimization of a Multi-scale Representation based Image Compression
Network
Lingyu Zhang, Undergraduate Thesis, 2021
thesis (Chinese)
Learned image compression has surpassed the rate-distortion performance of hand-crafted traditional image
codecs in recent years, but it is not yet practical because its decoding is significantly slower than that of
classical algorithms. We investigated directly performing vision tasks in the latent space and found that a
multi-scale encoder helps preserve semantic meaning in the latent codes while maintaining state-of-the-art
compression rates.
Dynamic Disparity Range Semi-Global Matching for Video Stereo Matching
Lingyu Zhang, Computer Vision Final Project, 2020
slides /
report (Chinese)
Implemented an accelerated stereo matching algorithm for video sequences that uses a dynamic disparity range
search based on temporal correlation between frames, saving 21% of computation time with minimal accuracy
loss. Designed a Divided Section cost function that preserves more information than the Census cost, achieving
18% better matching accuracy at the cost of additional computational complexity.
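A minimal sketch of the dynamic-range idea on a simple winner-takes-all matcher (not full semi-global
matching); the SAD cost and the margin parameter below are illustrative placeholders, not the Divided Section
cost from the report:

import numpy as np

def match_cost(left, right, y, x, d, win=3):
    # Placeholder matching cost: sum of absolute differences over a small window.
    h, w = left.shape
    y0, y1 = max(y - win, 0), min(y + win + 1, h)
    x0, x1 = max(x - win, 0), min(x + win + 1, w)
    if x0 - d < 0:
        return np.inf
    a = left[y0:y1, x0:x1].astype(np.float32)
    b = right[y0:y1, x0 - d:x1 - d].astype(np.float32)
    return float(np.abs(a - b).sum())

def disparity_map(left, right, d_max=64, prev_disp=None, margin=4):
    # Winner-takes-all disparity search; when the previous frame's disparity map is
    # available, only a small window around it is scanned instead of the full range.
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            if prev_disp is None:
                lo, hi = 0, d_max                       # first frame: full search
            else:
                lo = max(int(prev_disp[y, x]) - margin, 0)
                hi = min(int(prev_disp[y, x]) + margin, d_max)
            costs = [match_cost(left, right, y, x, d) for d in range(lo, hi + 1)]
            disp[y, x] = lo + int(np.argmin(costs))
    return disp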