Papers

Conservative Data Sharing for Multi-Task Offline Reinforcement Learning

NeurIPS

2021

Yu, Tianhe, Kumar, Aviral, Chebotar, Yevgen, Hausman, Karol, Levine, Sergey, Finn, Chelsea

Offline reinforcement learning (RL) algorithms have shown promising results in domains where abundant pre-collected data is available. However, prior methods focus on solving individual problems from scratch with an offline dataset without considering how an offline RL agent can acquire multiple skills. We argue that a natural use case of offline RL is in settings where we can pool large amounts of data collected in various scenarios for solving different tasks, and utilize all of this data to learn behaviors for all the tasks more effectively rather than training each one in isolation. However, sharing data across all tasks in multi-task offline RL performs surprisingly poorly in practice. Through empirical analysis, we find that sharing data can actually exacerbate the distributional shift between the learned policy and the dataset, which in turn can lead to divergence of the learned policy and poor performance. To address this challenge, we develop a simple technique for data sharing in multi-task offline RL that routes data based on the improvement over the task-specific data. We call this approach conservative data sharing (CDS), and it can be applied with multiple single-task offline RL methods. On a range of challenging multi-task locomotion, navigation, and vision-based robotic manipulation problems, CDS achieves the best or comparable performance compared to prior offline multi-task RL methods and previous data sharing approaches.
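The abstract says only that CDS "routes data based on the improvement over the task-specific data" and does not spell out the rule. The following is a minimal sketch of one possible reading, not the authors' implementation: a transition from another task is shared with the target task only if its estimated conservative value is at least as high as a chosen percentile of the values on the target task's own data. The function name `route_shared_data`, the `percentile` parameter, and the use of precomputed value estimates are all assumptions made for illustration.

```python
import numpy as np


def route_shared_data(q_task, q_candidates, percentile=90.0):
    """Return a boolean mask selecting candidate transitions to share.

    q_task:       value estimates on the target task's own data (assumed given).
    q_candidates: value estimates, under the target task, for transitions
                  drawn from other tasks' datasets (assumed given).
    percentile:   hypothetical knob; candidates must reach this percentile
                  of the task-specific values to be shared.
    """
    q_task = np.asarray(q_task, dtype=float)
    q_candidates = np.asarray(q_candidates, dtype=float)
    # Threshold derived from the task-specific data only.
    threshold = np.percentile(q_task, percentile)
    # Share a candidate only if it does not fall below the threshold.
    return q_candidates >= threshold


# Usage: keep only the candidates that look at least as good as the
# median of the task's own data.
mask = route_shared_data([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 5.0], percentile=50.0)
```

Under this reading, low-value transitions from other tasks are filtered out rather than pooled blindly, which matches the abstract's concern that naive sharing can worsen distributional shift.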

Knowledge-Empowered Dynamic Graph Network for Irregularly Sampled Medical Time Series

Adaptive Stabilization Based on Machine Learning for Column Generation

HHD-GP: Incorporating Helmholtz-Hodge Decomposition into Gaussian Processes for Learning Dynamical Systems

Sub-optimal Experts mitigate Ambiguity in Inverse Reinforcement Learning

PACE: Pacing Operator Learning to Accurate Optical Field Simulation for Complicated Photonic Devices

RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance

Evidential Stochastic Differential Equations for Time-Aware Sequential Recommendation

Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

NeuroPath: A Neural Pathway Transformer for Joining the Dots of Human Connectomes

SLIM: Style-Linguistics Mismatch Model for Generalized Audio Deepfake Detection

Optimization Algorithm Design via Electric Circuits

Abductive Reasoning in Logical Credal Networks

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Learning to Generate Visual Questions with Noisy Supervision

Instruction Tuning With Loss Over Instructions

Can Graph Neural Networks Expose Training Data Properties? An Efficient Risk Assessment Approach

Log-concave Sampling from a Convex Body with a Barrier: a Robust and Unified Dikin Walk

Integrating GNN and Neural ODEs for Estimating Non-Reciprocal Two-Body Interactions in Mixed-Species Collective Motion

Accelerating ERM for data-driven algorithm design using output-sensitive techniques

Hyperbolic Representation Learning: Revisiting and Advancing

On the Epistemic Limits of Personalized Prediction

Structural Inference of Dynamical Systems with Conjoined State Space Models

Aligning to Thousands of Preferences via System Message Generalization

Hybrid Mamba for Few-Shot Segmentation

On the Expressive Power of Tree-Structured Probabilistic Circuits

Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees

Time-Varying LoRA: Towards Effective Cross-Domain Fine-Tuning of Diffusion Models

Fast Rates for Bandit PAC Multiclass Classification

On the Error-Propagation of Inexact Hotelling's Deflation for Principal Component Analysis

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

Fixes That Fail: Self-Defeating Improvements in Machine-Learning Systems

BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO

DePLM: Denoising Protein Language Models for Property Optimization

MambaTree: Tree Topology is All You Need in State Space Model

Alignment at Pre-training! Towards Native Alignment for Arabic LLMs

Policy-shaped prediction: avoiding distractions in model-based reinforcement learning

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

On Divergence Measures for Training GFlowNets

What to Say and When to Say it: Live Fitness Coaching as a Testbed for Situated Interaction

Increase Information Transfer Rates in BCI by CSP Extension to Multi-class

Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC

Conditional Synthesis of 3D Molecules with Time Correction Sampler

Amortizing intractable inference in diffusion models for vision, language, and control

Video Diffusion Models are Training-free Motion Interpreter and Controller

Measuring Mutual Policy Divergence for Multi-Agent Sequential Exploration

UniAudio 1.5: Large Language Model-Driven Audio Codec is A Few-Shot Audio Task Learner

On Batch Teaching with Sample Complexity Bounded by VCD

NanoBaseLib: A Multi-Task Benchmark Dataset for Nanopore Sequencing
