About me
Hi! My name is Carl Edwards, and I’m from Knoxville, Tennessee. I’m a PhD candidate in computer science at the University of Illinois at Urbana-Champaign, where I’m advised by Professor Heng Ji. My research focuses on, with increasing specificity, AI4Science → NLP4Science → NLP4Chemistry → NLP for Controlling Molecular Design in cancer drugs, organic photovoltaics, etc.
The world faces an enormous number of problems in the coming decades at scales of complexity never before seen, in areas such as climate change, healthcare, and food security, each requiring innovative scientific solutions that are scalable, adaptable, and cost-effective. Further, we need to develop these solutions quickly. Under these conditions, vast quantities of information are being created at ever-increasing rates. This flood of information leads to inefficiencies via duplication of effort, while critical information is lost in the deluge of papers. To solve these problems, we need to be able to synthesize and understand information at massive scale. This motivates my high-level research direction: applying natural language processing (NLP) for scientific discovery. Specifically, I am currently interested in NLP for molecular and drug discovery, particularly by integrating natural language with molecules.
I am generally interested in models that leverage multiple modalities and domains of data centered around language. My research interests cover information extraction, information retrieval, natural language processing, representation learning, text mining, and transfer learning. In particular, my work seeks to apply these tools (and develop new ones!) to scientific texts to accelerate scientific discovery. I’m currently working on projects related to chemistry literature in association with the Molecule Maker Lab NSF AI Institute. In general, I’m passionate about language+multimodal approaches for compositional, function-level control of drug, protein, and material design.
News
- Excited to have ChemReasoner accepted at ICML24! Really excited about this direction of integrating simulations with LLMs for molecular discovery!
- I’ll be organizing the first “Language + Molecules” workshop at ACL24 in Bangkok and presenting an introductory tutorial earlier at EACL24 in Malta! I’m very excited to contribute to building a community in this impactful new research area.
- I’ll be interning at Genentech this summer on the AI research team. Excited to learn more about drug discovery in industry and hopefully do something impactful!
- I gave a talk on “Language-Guided Scientific Discovery for Chemistry” at the exciting new Center for the Transformation of Chemistry! Here are the slides from my talk! In particular, I’m partial to this key overview slide.
- I’m excited to have contributed to the “Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems” survey! In particular, my contribution focused on the “Natural Language-Guided Scientific Discovery” section.
Resume
See my (possibly outdated) CV attached here.
Research
In the past, I’ve had the opportunity to work on a variety of exciting research projects!
University of Illinois
I’ve been working with the Molecule Maker Lab Institute on integrating molecule and text information. We’re working on integrating new methodologies with real-world laboratory experimentation for drug and material discovery (stay tuned)! Past work includes proposing multiple novel tasks such as cross-modal retrieval: Text2Mol (EMNLP 2021) and cross-modal generation: Translation between Molecules and Natural Language (EMNLP 2022).
Here is the poster for Text2Mol.
Allen Institute for Artificial Intelligence (AI2) - Semantic Scholar Team
I had a great time working on drug synergy with LLMs during summer 2022 on the Semantic Scholar team. The project evolved from semi-parametric language models using the literature to in-context learning and drug design. See our preprint here.
Carnegie Mellon University
During summer 2019, I worked at CMU as a Robotics Institute Summer Scholar. I conducted research in the Auton Lab with Prof. Artur Dubrawski on detecting organizations in online escort advertisements. I applied several similarity measures built on tools such as fastText word embeddings and face recognition, among others.
University of Zurich
During Fall 2018, I studied abroad at the University of Zurich in Zurich, Switzerland. While I was there, I worked on creating joint embeddings between images and knowledge graphs. My project report is here.
Oak Ridge National Lab
I interned at ORNL during the summers of 2017 and 2018 through the HERE and DOE SULI programs, respectively. I worked on optimizing phase values in continuous subarrayed radar arrays using metaheuristic techniques like genetic algorithms and simulated annealing.
University of Tennessee
From fall 2016 to spring 2018, I worked as an undergraduate research assistant. I worked on molecular dynamics simulations and data analysis. I also performed Brownian dynamics simulations of flowing polymer solutions, and I extracted solution properties from the resulting data.
Publications, Posters, etc.
Preprint
MolCap-Arena: A Comprehensive Captioning Benchmark on Language-Enhanced Molecular Property Prediction
Carl Edwards, Ziqing Lu, Ehsan Hajiramezanali, Tommaso Biancalani, Heng Ji, Gabriele Scalia
arXiv preprint arXiv:2411.00737. 2024.
[pdf]
Geometry Informed Tokenization of Molecules for Language Model Generation
Xiner Li, Limei Wang, Youzhi Luo, Carl Edwards, Shurui Gui, Yuchao Lin, Heng Ji, Shuiwang Ji
arXiv preprint arXiv:2408.10120. 2024.
[pdf]
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, […], Carl Edwards, […], Alán Aspuru-Guzik, Erik Bekkers, Michael Bronstein, Marinka Zitnik, Anima Anandkumar, Stefano Ermon, Pietro Liò, Rose Yu, Stephan Günnemann, Jure Leskovec, Heng Ji, Jimeng Sun, Regina Barzilay, Tommi Jaakkola, Connor W. Coley, Xiaoning Qian, Xiaofeng Qian, Tess Smidt, and Shuiwang Ji
arXiv preprint arXiv:2307.08423. 2023.
[pdf]
Conference
Translation between Molecules and Natural Language
Carl Edwards*, Tuan Lai*, Kevin Ros, Garrett Honke, Kyunghyun Cho, and Heng Ji
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2022.
[pdf] [code]
Text2Mol: Cross-Modal Molecular Retrieval with Natural Language Queries
Carl Edwards, ChengXiang Zhai, and Heng Ji
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2021.
[pdf] [video - on underline] [code]
ChemReasoner: Heuristic Search over a Large Language Model’s Knowledge Space using Quantum-Chemical Feedback
Henry W. Sprueill, Carl Edwards, Khushbu Agarwal, Mariefel V. Olarte, Udishnu Sanyal, Conrad Johnston, Hongbin Liu, Heng Ji, and Sutanay Choudhury
Proceedings of the 2024 International Conference on Machine Learning (ICML). 2024.
[pdf]
SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design
Carl Edwards, Aakanksha Naik, Tushar Khot, Martin D Burke, Heng Ji, and Tom Hope
Proceedings of the First Conference on Language Modeling (COLM). 2024.
[pdf]
L+M-24: Building a Dataset for Language+Molecules @ ACL 2024
Carl Edwards, Qingyun Wang, Lawrence Zhao, Heng Ji
Proceedings of the 1st Workshop on Language + Molecules (L+M 2024) at ACL 2024. 2024.
[pdf] [code]
GLaD: Synergizing Molecular Graphs and Language Descriptors for Enhanced Power Conversion Efficiency Prediction in Organic Photovoltaic Devices
Thao Nguyen, Tiara Torres-Flores, Changhyun Hwang, Carl Edwards, Ying Diao, Heng Ji
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM). 2024.
[pdf]
Invariant Tokenization of Crystalline Materials for Language Model Enabled Generation
Keqiang Yan, Xiner Li, Hongyi Ling, Kenna Ashen, Carl Edwards, Raymundo Arroyave, Marinka Zitnik, Heng Ji, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji
Proceedings of The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS). 2024.
[pdf]
Defining a New NLP Playground
Sha Li, Chi Han, Pengfei Yu, Carl Edwards, Manling Li, Xingyao Wang, Yi Fung, Charles Yu, Joel R. Tetreault, Eduard H Hovy, and Heng Ji
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) Findings. 2023.
[pdf]
Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design
Henry William Sprueill, Carl Edwards, Mariefel V Olarte, Udishnu Sanyal, Heng Ji, and Sutanay Choudhury
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) Findings. 2023.
[pdf] [code]
Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention
Carl Edwards and Heng Ji
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL). 2023.
[pdf]
Team Skeletor at Touché 2021: Argument Retrieval and Visualization for Controversial Questions
Kevin Ros*, Carl Edwards*, Heng Ji, and ChengXiang Zhai
CLEF (Working Notes) 2441-2454. 2021.
[pdf] [video] [code]
RESIN-11: Schema-guided Event Prediction for 11 Newsworthy Scenarios
Xinya Du, Zixuan Zhang, Sha Li, Pengfei Yu, Hongwei Wang, Tuan Manh Lai, Xudong Lin, Ziqi Wang, Iris Liu, Ben Zhou, Haoyang Wen, Manling Li, Darryl Hannan, Qi Zeng, Qing Lyu, Charles Yu, Carl Edwards, Xiaomeng Jin, Yizhu Jiao, Ghazaleh Kazeminejad, Rotem Dror, Zhenhailong Wang, Chris Callison-Burch, Mohit Bansal, Carl Vondrick, Jiawei Han, Dan Roth, Shih-Fu Chang, Martha Palmer and Heng Ji
Proceedings of the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), System Demonstration Track. 2022.
* - equal contribution
Journal Article
In-plane and out-of-plane rotational motion of individual chain molecules in steady shear flow of polymer melts and solutions
C.N. Edwards, M.H. Nafar Sefiddashti, B.J. Edwards, and B. Khomami
J. Mol. Graph. Model., 81, 184-196 (2018).
[link]
Using Similarity Measures to Detect Organizations in Online Escort Advertisements
C. Edwards, A. Wertz, and A. Dubrawski
Robotics Institute Summer Scholars’ Working Papers Journal, 7, 43-49 (2019).
[pdf]
Presentations
Integrating generative AI with computational chemistry for catalyst design in biofuel/bioproduct applications. Sprueill H.W., C. Edwards, M.V. Olarte, U. Sanyal, H. Ji, and S. Choudhury. American Chemical Society Spring 2024 National Meeting, New Orleans, Louisiana. March 18, 2024.
Out-of-plane rotational motion in shear flow of polymer melts and solutions. M.H. Nafar Sefiddashti, C.N. Edwards, B.J. Edwards, and B. Khomami, The Society of Rheology 89th Annual Meeting, Denver, CO, October 8-12, 2017.
Posters
Semi-supervised New Event Type Induction
Using Similarity Measures to Detect Organizations in Online Escort Advertisements
Subarrayed Radar Arrays Beam Optimization
Extreme-Scale Heterogeneous Inference with Large Language Models and Atomistic Graph Neural Networks for Catalyst Discovery. Sprueill H.W., C. Edwards, M.V. Olarte, U. Sanyal, K. Agarwal, H. Ji, and S. Choudhury. American Chemical Society Spring 2024 National Meeting, New Orleans, Louisiana. March 18, 2024.
Other
Daniel, Barry, Carl Edwards, and Adam Anderson. “Phase-Only Beam Broadening of Contiguous Uniform Subarrayed Arrays Utilizing Three Metaheuristic Global Optimization Techniques.” arXiv preprint arXiv:2009.06123 (2020).
[pdf]
Selected Awards and Honors
Saburo Muroga Endowed Fellowship
Min Kao Scholar (2018, 2019)
Outstanding Computer Science Junior 2018 (Sole recipient)
Recognition of Pi Mu Epsilon Tennessee Delta Chapter
See other awards on my resume above.
Service
NSF Molecule Maker Laboratory Institute:
o Student and Postdoc Council Educational & Outreach Activities Chair 2021-2023, Social Team Member 2023-present
o Certificate of Public Engagement
Organizing Committee: Language + Molecules Workshop at ACL 2024
Program Committee: ACL-IJCNLP 2021, ACL 2022-2024, NAACL 2022, 2024, EACL 2023-2024, NeurIPS 2023 AI4Science, GenBio, AAAI 2024-2025, ACL Rolling Review, ICML 2024 AI4Science
Undergraduate Mentorship:
o Summer 2023: Mentored three undergraduates on projects related to language-enabled protein design, language-molecule association rule mining, and scientific language model factuality evaluation and updating.
Presentations to High School Students:
o Summer 2019: Presentation on “Detecting Human Trafficking Organizations” with AI4All@CMU
o April 2021, 2022: Illinois CS Sail course on “Learning Word Representations”
Presentations to Middle School Students:
o Spring 2023: Two presentations on “Intro to AI for Chemistry” for underrepresented middle schoolers with The Well Experience and DREAAM.
Outreach:
o November 2023: Cena y Ciencias – Dual language outreach program presenting DIY Solar Cell activity
Some Random Projects
Multimodal Molecule Reaction Prediction from Knowledge Graph of Reactions [Proposal] [Report]
Avoiding Catastrophic Forgetting in AI Safety Gridworld
High School Projects
Neural networks looking for food
Old Minecraft Bukkit mod (pre 1.7)
Some fun activities I’ve done
UTK Machine Learning Club
Art competition entry using neural style transfer.
HackUTK
Martial Arts
Classical Singing
Governor’s School for Computational Physics
FIRST Robotics
Birdwatching