About me

Hi! My name is Carl Edwards, and I’m from Knoxville, Tennessee. I’m a PhD candidate in computer science at the University of Illinois at Urbana-Champaign where I’m advised by Professor Heng Ji. My research focuses on, with increasing specificity, AI4Science → NLP4Science → NLP4Chemistry → NLP for Controlling Molecular Design in cancer drugs, organic photovoltaics, etc.

The world faces an enormous number of problems in the coming decades at never-before-seen scales of complexity, in areas such as climate change, healthcare, and food security, each requiring innovative scientific solutions that are scalable, adaptable, and cost-effective. Further, we need to develop these solutions quickly. Under these conditions, vast quantities of information are being created at ever-increasing rates. This flood of information leads to inefficiencies via duplication of effort, while critical information is lost in the deluge of papers. To solve these problems, we need to be able to synthesize and understand information at massive scale. This motivates my high-level research direction: applying natural language processing (NLP) for scientific discovery. Specifically, I am currently interested in NLP for molecular and drug discovery, particularly by integrating natural language with molecules.

I am generally interested in models that leverage multiple modalities and domains of data centered around language. My research interests cover information extraction, information retrieval, natural language processing, representation learning, text mining, and transfer learning. In particular, my work seeks to apply these tools (and develop new ones!) to scientific texts to accelerate scientific discovery. I’m currently working on projects related to chemistry literature in association with the Molecule Maker Lab NSF AI Institute. In general, I’m passionate about language+multimodal approaches for compositional, function-level control of drug, protein, and material design.

News

  • ChemReasoner was accepted at ICML24! I’m really excited about this direction of integrating simulations with LLMs for molecular discovery!

  • I’ll be organizing the first “Language + Molecules” workshop at ACL24 in Bangkok and presenting an introductory tutorial earlier at EACL24 in Malta! I’m very excited to contribute to building a community in this impactful new research area.

  • I’ll be interning at Genentech this summer on the AI research team. Excited to learn more about drug discovery in industry and hopefully do something impactful!

Resume

See my (possibly outdated) CV attached here.

Research

In the past, I’ve had the opportunity to work on a variety of exciting research projects!

University of Illinois

I’ve been working with the Molecule Maker Lab Institute on integrating molecule and text information. We’re working on integrating new methodologies with real-world laboratory experimentation for drug and material discovery (stay tuned)! Past work includes proposing multiple novel tasks such as cross-modal retrieval: Text2Mol (EMNLP 2021)

Text2Mol Task

and cross-modal generation: Translation between Molecules and Natural Language (EMNLP 2022).

MolT5 Training

Here is the poster for Text2Mol.

Allen Institute for Artificial Intelligence (AI2) - Semantic Scholar Team

I had a great time working on drug synergy with LLMs during summer 2022 with the Semantic Scholar team. The project evolved from semi-parametric language models using the literature to in-context learning and drug design. See our preprint here.

SynerGPT Training

Carnegie Mellon University

During summer 2019, I worked at CMU as a Robotics Institute Summer Scholar. I conducted research in the Auton Lab with Prof. Artur Dubrawski on detecting organizations in online escort advertisements. I applied several similarity measures built on tools such as fastText word embeddings and face recognition, among others.

University of Zurich

During fall 2018, I studied abroad at the University of Zurich in Zurich, Switzerland. While I was there, I worked on creating joint embeddings between images and knowledge graphs. My project report is here.

Oak Ridge National Lab

I interned at ORNL during the summers of 2017 and 2018 through the HERE and DOE SULI programs, respectively. I worked on optimizing phase values in contiguous subarrayed radar arrays using metaheuristic techniques like genetic algorithms and simulated annealing.

Optimized radar array. Note the cyclic behavior of the phase values.

University of Tennessee

From fall 2016 to spring 2018, I worked as an undergraduate research assistant. I worked on molecular dynamics simulations and data analysis. I also performed Brownian dynamics simulations of flowing polymer solutions, and I extracted solution properties from the resulting data.

Publications, Posters, etc.

Preprint

MolCap-Arena: A Comprehensive Captioning Benchmark on Language-Enhanced Molecular Property Prediction

Carl Edwards, Ziqing Lu, Ehsan Hajiramezanali, Tommaso Biancalani, Heng Ji, Gabriele Scalia
arXiv preprint arXiv:2411.00737. 2024.
[pdf]

Geometry Informed Tokenization of Molecules for Language Model Generation

Xiner Li, Limei Wang, Youzhi Luo, Carl Edwards, Shurui Gui, Yuchao Lin, Heng Ji, Shuiwang Ji
arXiv preprint arXiv:2408.10120. 2024.
[pdf]

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, […], Carl Edwards, […], Alán Aspuru-Guzik, Erik Bekkers, Michael Bronstein, Marinka Zitnik, Anima Anandkumar, Stefano Ermon, Pietro Liò, Rose Yu, Stephan Günnemann, Jure Leskovec, Heng Ji, Jimeng Sun, Regina Barzilay, Tommi Jaakkola, Connor W. Coley, Xiaoning Qian, Xiaofeng Qian, Tess Smidt, and Shuiwang Ji
arXiv preprint arXiv:2307.08423. 2023.
[pdf]

Conference

Translation between Molecules and Natural Language

Carl Edwards*, Tuan Lai*, Kevin Ros, Garrett Honke, Kyunghyun Cho, and Heng Ji
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2022.
[pdf] [code]

Text2Mol: Cross-Modal Molecular Retrieval with Natural Language Queries

Carl Edwards, ChengXiang Zhai, and Heng Ji
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2021.
[pdf] [video on Underline] [code]

ChemReasoner: Heuristic Search over a Large Language Model’s Knowledge Space using Quantum-Chemical Feedback

Henry W. Sprueill, Carl Edwards, Khushbu Agarwal, Mariefel V. Olarte, Udishnu Sanyal, Conrad Johnston, Hongbin Liu, Heng Ji, and Sutanay Choudhury
Proceedings of the 2024 International Conference on Machine Learning (ICML). 2024.
[pdf]

SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design

Carl Edwards, Aakanksha Naik, Tushar Khot, Martin D Burke, Heng Ji, and Tom Hope
Proceedings of the First Conference on Language Modeling (COLM). 2024.
[pdf]

L+M-24: Building a Dataset for Language+Molecules @ ACL 2024

Carl Edwards, Qingyun Wang, Lawrence Zhao, Heng Ji
Proceedings of the 1st Workshop on Language + Molecules (L+M 2024) at ACL 2024. 2024.
[pdf] [code]

GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices

Thao Nguyen, Tiara Torres-Flores, Changhyun Hwang, Carl Edwards, Ying Diao, Heng Ji
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM). 2024.
[pdf]

Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation

Keqiang Yan, Xiner Li, Hongyi Ling, Kenna Ashen, Carl Edwards, Raymundo Arroyave, Marinka Zitnik, Heng Ji, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji
Proceedings of The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS). 2024.
[pdf]

Defining a New NLP Playground

Sha Li, Chi Han, Pengfei Yu, Carl Edwards, Manling Li, Xingyao Wang, Yi Fung, Charles Yu, Joel R. Tetreault, Eduard H Hovy, and Heng Ji
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) Findings. 2023.
[pdf]

Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design

Henry William Sprueill, Carl Edwards, Mariefel V Olarte, Udishnu Sanyal, Heng Ji, and Sutanay Choudhury
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) Findings. 2023.
[pdf] [code]

Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention

Carl Edwards and Heng Ji
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL). 2023.
[pdf]

Team Skeletor at Touché 2021: Argument Retrieval and Visualization for Controversial Questions

Kevin Ros*, Carl Edwards*, Heng Ji, and ChengXiang Zhai
CLEF (Working Notes) 2441-2454. 2021.
[pdf] [video] [code]

RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios

Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Manh Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Rotem Dror, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer and Heng Ji
Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), System Demonstration Track. 2022.

* - equal contribution

Journal Article

In-plane and out-of-plane rotational motion of individual chain molecules in steady shear flow of polymer melts and solutions. C.N. Edwards, M.H. Nafar Sefiddashti, B.J. Edwards, and B. Khomami, J. Mol. Graph. Model., 81, 184-196 (2018).
[link]

Using Similarity Measures to Detect Organizations in Online Escort Advertisements. C. Edwards, A. Wertz, and A. Dubrawski, Robotics Institute Summer Scholars’ Working Papers Journal, 7, 43-49 (2019).
[pdf]

Presentations

Integrating generative AI with computational chemistry for catalyst design in biofuel/bioproduct applications. Sprueill H.W., C. Edwards, M.V. Olarte, U. Sanyal, H. Ji, and S. Choudhury. American Chemical Society Spring 2024 National Meeting, New Orleans, Louisiana. March 18, 2024.

Out-of-plane rotational motion in shear flow of polymer melts and solutions. M.H. Nafar Sefiddashti, C.N. Edwards, B.J. Edwards, and B. Khomami, The Society of Rheology 89th Annual Meeting, Denver, CO, October 8-12, 2017.

Posters

MolT5

Text2Mol

Semi-supervised New Event Type Induction

Using Similarity Measures to Detect Organizations in Online Escort Advertisements

Subarrayed Radar Arrays Beam Optimization

Extreme-Scale Heterogeneous Inference with Large Language Models and Atomistic Graph Neural Networks for Catalyst Discovery. Sprueill H.W., C. Edwards, M.V. Olarte, U. Sanyal, K. Agarwal, H. Ji, and S. Choudhury. 03/18/2024. American Chemical Society Spring 2024 National Meeting, New Orleans, Louisiana.

Other

Daniel, Barry, Carl Edwards, and Adam Anderson. “Phase-Only Beam Broadening of Contiguous Uniform Subarrayed Arrays Utilizing Three Metaheuristic Global Optimization Techniques.” arXiv preprint arXiv:2009.06123 (2020).
[pdf]

Selected Awards and Honors

Saburo Muroga Endowed Fellowship

Goldwater Scholar

Min Kao Scholar (2018, 2019)

Outstanding Computer Science Junior 2018 (Sole recipient)

Recognition of Pi Mu Epsilon Tennessee Delta Chapter

See other awards on my resume above.

Service

NSF Molecule Maker Laboratory Institute:

o Student and Postdoc Council Educational & Outreach Activities Chair 2021-2023, Social Team Member 2023-present

o Certificate of Public Engagement

Organizing Committee: Language + Molecules Workshop at ACL 2024

Program Committee: ACL-IJCNLP 2021, ACL 2022-2024, NAACL 2022, 2024, EACL 2023-2024, NeurIPS 2023 AI4Science, GenBio, AAAI 2024-25, ACL Rolling Review, ICML 2024 AI4Science

Undergraduate Mentorship:

o Summer 2023: Mentored three undergraduates on projects related to language-enabled protein design, language-molecule association rule mining, and scientific language model factuality evaluation and updating.

Presentations to High School Students:

o Summer 2019: Presentation on “Detecting Human Trafficking Organizations” with AI4All@CMU

o April 2021, 2022: Illinois CS Sail course on “Learning Word Representations”

Presentations to Middle School Students:

o Spring 2023: Two presentations on “Intro to AI for Chemistry” for underrepresented middle schoolers with The Well Experience and DREAAM.

Outreach:

o November 2023: Cena y Ciencias, a dual-language outreach program, presenting a DIY Solar Cell activity

Some Random Projects

Multimodal Molecule Reaction Prediction from Knowledge Graph of Reactions [Proposal] [Report]

Optimization of hybrid electric vehicle energy management control parameters by metaheuristic methods

Avoiding Catastrophic Forgetting in AI Safety Gridworld

Fashion MNIST Investigation

High School Projects

Neural networks looking for food

Old Minecraft Bukkit mod (pre-1.7)

Some fun activities I’ve done

UTK Machine Learning Club

Art competition entry using neural style transfer.

HackUTK

Martial Arts

Classical Singing

Governor’s School for Computational Physics

FIRST Robotics

Birdwatching
