Publications

You can also find my publications on my Google Scholar profile.

  • ProCyon: A multimodal foundation model for protein phenotypes, Preprint, 2024

    Owen Queen, Yepeng Huang, Robert Calef, Valentina Giunchiglia, Tianlong Chen, George Dasoulas, LeAnn Tai, Yasha Ektefaie, Ayush Noori, Joseph Brown, Tom Cobley, Karin Hrovatin, Tom Hartvigsen, Fabian J. Theis, Bradley Pentelute, Manolis Kellis, and Marinka Zitnik.

    ProCyon is an 11B parameter multimodal model designed to predict and generate protein phenotypes across multiple scales of biology, spanning molecular functions to disease and therapeutics. We demonstrate its applicability in a variety of tasks, including novel transfer to poorly characterized proteins, synthetic peptide binding, and indication-specific drug target retrieval.

  • Graph AI in Medicine, Annual Review of Biomedical Data Science, 2024

    Ruth Johnson, Michelle M. Li*, Ayush Noori*, Owen Queen*, Marinka Zitnik

    We present a review of previous work in graph machine learning to learn relational structures, provide greater interpretability, and integrate multiple modalities for biomedical data. We additionally identify future areas of focus for the field, in particular how foundation models on graphs can lead to clinically meaningful predictions and facilitate feedback loops with practitioners.

  • Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency, NeurIPS (Spotlight), 2023

    Owen Queen, Thomas Hartvigsen, Teddy Koker, Huan He, Theodoros Tsiligkaridis, Marinka Zitnik

    We present a new time-series explainability method, TimeX. TimeX trains an interpretable surrogate model to match the behavior of a given predictor through a novel loss known as model behavior consistency. TimeX introduces straight-through estimators (STEs) to learn discrete, succinct masks that faithfully explain predictions on time-series datasets. We benchmark TimeX on a wide variety of synthetic and real-world time-series datasets and demonstrate that it learns explanations that are highly faithful to model predictions.

  • Protein Language Models for Explainable Fine-Grained Evolutionary Pattern Discovery, IEEE BIBM, DLBIBM Workshop, 2023

    Ashley Babjac*, Owen Queen*, Shawn-Patrick Barhorst, Kambiz Kalhor, Andrew Steen, Scott Emrich

    We fine-tune pretrained protein language models on oceanic microbial genomics data to discriminate organisms found at the surface and subsurface of the ocean. We then use post-hoc explainability techniques to identify important residues that discriminate homologous proteins found at the surface and subsurface. We visualize the importance scores on the AlphaFold-generated structure, and we work with microbiologists to identify biologically relevant differences in these proteins.

  • Polymer graph neural networks for multitask property learning, npj Computational Materials, 2023

    Owen Queen, Gavin A. McCarver, Sai Thatigotla, Brendan P. Abolins, Cameron L. Brown, Vasileios Maroulas, Konstantinos D. Vogiatzis

    We develop a novel GNN-based architecture for learning properties of polymers. We evaluate our model on glass transition temperature and the more challenging intrinsic viscosity prediction task, demonstrating state-of-the-art performance on a novel dataset. We use our model to perform a high-throughput search of novel chemical compound space for materials that exhibit desirable properties for materials engineering.

  • Domain adaptation for time series under feature and label shifts, ICML, 2023

    Huan He, Owen Queen, Teddy Koker, Consuelo Cuevas, Theodoros Tsiligkaridis, Marinka Zitnik

    We present RAINCOAT, a novel domain adaptation method built specifically for time series data. RAINCOAT demonstrates state-of-the-art performance on a variety of time series datasets for the domain generalization task, including the challenging universal domain adaptation setup where shift might occur in either feature or label space.

  • Evaluating explainability for graph neural networks, Nature Scientific Data, 2023

    Chirag Agarwal*, Owen Queen*, Himabindu Lakkaraju, Marinka Zitnik

    We benchmark several post-hoc GNN explainers on established real-world and synthetic datasets. We introduce several new metrics and experimental setups to evaluate explainers in a diverse manner. In addition, we introduce a new synthetic graph dataset generator that supports robust evaluation of explainers for GNNs. Our metrics are featured in PyTorch Geometric.

  • Deep learning for reference-free geolocation for poplar trees, NeurIPS 2022, AI for Science Workshop, 2022

    Cai John*, Owen Queen*, Wellington Muchero, Scott Emrich

    Using deep learning approaches, we learn to geolocate poplar trees from pre-alignment read fragments. We use a popular bioinformatics tool for finding common read fragment motifs, which is much faster than aligning all fragments to a reference genome. Through our simple approach, we show results competitive with algorithms trained on WGS data.

  • LASSO-based feature selection for improved microbial and microbiome classification, IEEE BIBM, MABM Workshop, 2021

    Owen Queen, Scott Emrich

    We explore feature selection algorithms for high-dimensional genomic datasets with a particular focus on LASSO-based feature selection. On several tasks, we show that LASSO consistently selects high-quality features for machine learning and is robust to the choice of task and hyperparameters. We focus our analysis on sepsis prediction from microbial sequencing data and on predicting environmental conditions from bulk rhizosphere sequencing data (the microbiome in the soil around tree roots).

  • Note: * Denotes equal contribution