Zochi Technical Report

The First Artificial Scientist

Published: Mar 17, 2025

Introduction

Zochi is the first AI system capable of autonomously completing the entire scientific research process—from hypothesis generation to peer-reviewed publication—producing state-of-the-art results.

Unlike prior systems limited to narrow, predefined tasks, Zochi excels in addressing open-ended research challenges at the forefront of artificial intelligence. Its effectiveness is validated by multiple peer-reviewed publications accepted at ICLR 2025 workshops, underscoring Zochi's ability to generate novel and academically rigorous contributions.

Full Report: PDF
Papers & Code: GitHub

Zochi’s Work

Efficient Model Adaptation Through Orthogonal Knowledge Spaces

Zochi identified a critical bottleneck in AI development: cross-skill interference in parameter-efficient fine-tuning. When adapting models to multiple tasks simultaneously, improvements in one skill often degrade others.

Zochi developed CS-ReFT (Compositional Subspace Representation Fine-tuning), which focuses on representation editing rather than weight modifications. CS-ReFT learns orthonormal subspace transformations in the hidden state, each specializing in a distinct skill, composed through a lightweight router.
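The idea of per-skill subspace edits combined by a router can be illustrated with a small NumPy sketch. This is not the CS-ReFT implementation from the paper. It assumes a LoReFT-style edit of the form h + R^T(Wh + b - Rh) for each skill, with R having orthonormal rows, and it assumes a simple linear router with independent sigmoid gates. The names `SubspaceEdit`, `router_gates`, and `compose` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def orthonormal_rows(rng, r, d):
    """Return an (r, d) matrix whose rows are orthonormal (requires r <= d)."""
    # Reduced QR of a (d, r) Gaussian matrix yields orthonormal columns.
    q, _ = np.linalg.qr(rng.standard_normal((d, r)))
    return q.T


class SubspaceEdit:
    """One low-rank edit of a hidden state (assumed LoReFT-style form)."""

    def __init__(self, rng, d, r):
        self.R = orthonormal_rows(rng, r, d)         # subspace basis, (r, d)
        self.W = 0.01 * rng.standard_normal((r, d))  # learned projection (random here)
        self.b = np.zeros(r)                         # learned bias (zero here)

    def delta(self, h):
        # Move the component of h inside the subspace toward W h + b.
        return self.R.T @ (self.W @ h + self.b - self.R @ h)


def router_gates(router_w, router_b, h):
    """Hypothetical router: one independent sigmoid gate per skill."""
    logits = router_w @ h + router_b
    return 1.0 / (1.0 + np.exp(-logits))


def compose(h, edits, router_w, router_b):
    """Add each skill's edit to h, weighted by its router gate."""
    gates = router_gates(router_w, router_b, h)
    out = h.copy()
    for gate, edit in zip(gates, edits):
        out = out + gate * edit.delta(h)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, r, n_skills = 16, 4, 3
    edits = [SubspaceEdit(rng, d, r) for _ in range(n_skills)]
    router_w = 0.1 * rng.standard_normal((n_skills, d))
    router_b = np.zeros(n_skills)
    h = rng.standard_normal(d)
    print(compose(h, edits, router_w, router_b).shape)
```

In this sketch the base model weights are never touched. Only the small matrices R, W, b and the router would be trained, which is the sense in which representation editing differs from weight modification.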

When applied to Llama-2-7B on AlpacaEval, CS-ReFT achieved a 93.94% win rate, surpassing GPT-3.5-Turbo (86.30%) while requiring only 0.0098% of model parameters.
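To put the 0.0098% figure in absolute terms, a quick calculation helps. The total of roughly 6.74 billion parameters for Llama-2-7B is an assumption brought in for this illustration and is not stated in the post. The helper name `trainable_params` is likewise hypothetical.

```python
def trainable_params(total_params, percent):
    """Number of parameters corresponding to `percent` of `total_params`."""
    return total_params * percent / 100.0


# Assumed total for Llama-2-7B (approximate, not taken from the post).
LLAMA_2_7B_PARAMS = 6.74e9

# 0.0098% is the share reported for CS-ReFT in the text above.
count = trainable_params(LLAMA_2_7B_PARAMS, 0.0098)
print(f"about {count:,.0f} trainable parameters")
```

Under that assumption the trained component is on the order of several hundred thousand parameters.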

The paper received strong peer review scores (6, 7, 6), with reviewers commending its "clever idea" and effectiveness in addressing "a critical limitation of ReFT."

Discovering AI Vulnerabilities Through Autonomous Multi-turn Red Teaming

Zochi explored AI safety research and identified an emerging subarea: multi-turn attacks on LLMs. Starting with existing safety literature, Zochi proposed Siege, a novel framework that enhances multi-turn jailbreaking strategies with tree search algorithms.

Zochi discovered a previously underexploited pattern: models exhibit "partial compliance" behaviors where they reveal fragments of restricted information while appearing to maintain safety guardrails. The system developed mechanisms to systematically identify and exploit these patterns across conversation branches.

Siege achieves a 100% success rate against GPT-3.5-Turbo and 97% against GPT-4 while using fewer queries than previous methods.

Reviewers gave scores of (7, 7), highlighting the paper's "effective, intuitive method" that is "significantly more effective than prior methods, necessitating a reassessment of existing AI defense strategies."

Computational Biology Advancements (EGNN-Fusion)

Demonstrating its versatility beyond core AI research, Zochi also made significant contributions to computational biology with EGNN-Fusion, an efficient architecture for protein-nucleic acid binding site prediction.

EGNN-Fusion achieves competitive performance against state-of-the-art methods while reducing parameter count by 95%. This work demonstrates Zochi's ability to transfer knowledge across domains and address complex scientific challenges beyond AI itself, showing its capacity to tackle interdisciplinary research with practical applications for human health.

Because Zochi completed this project after the workshop deadlines, this paper has not yet been peer-reviewed and is currently under review at a journal.

Evaluation Results

Zochi consistently produces higher-quality research papers than all baseline systems. When evaluated using an automated reviewer based on NeurIPS conference guidelines, Zochi's papers received scores of 8, 8, and 7, all well above the acceptance threshold of 6 that represents the average accepted paper at top machine learning conferences.

In contrast, papers from other AI systems receive significantly lower scores, averaging around 4. This evaluation gap is particularly notable given the substantial difference in problem complexity tackled by each system. While baseline systems focus on relatively constrained problems—such as 2D diffusion models, toy-scale language models, or specific cognitive biases—Zochi addresses open-ended challenges, proposing novel and verifiable state-of-the-art methods.

As an exploratory exercise, Zochi was evaluated on a subset of Kaggle-based challenges from MLE-Bench to assess its performance on conventional machine learning engineering tasks. Without any task-specific optimization, Zochi achieves state-of-the-art results, surpassing median human performance on 80% of tasks and securing medals on 50% of them. These results exceed those reported for previous systems such as Agent Laboratory, AIDE, and OpenHands, further highlighting the robustness and adaptability of Zochi’s core capabilities.

Ethical Considerations

At Intology, we recognize the significant ethical implications of developing Artificial Scientists like Zochi. Our approach to responsible development includes rigorous human verification of all research outputs, transparent attribution practices that acknowledge AI contributions without claiming authorship, and careful consideration of impacts on publication venues and the scientific community. We discourage the submission of AI-produced work that human experts have not fully verified. We do not believe AI systems should be authors on papers, as they cannot take responsibility for their work.

We direct Zochi's capabilities toward beneficial applications, exemplified by its work identifying AI safety vulnerabilities. While enabling accelerated scientific discovery, we remain committed to appropriate human oversight at each stage of development.

We are committed to maintaining the integrity of science and we are in discussion with the workshop organizers of Zochi's accepted papers. If they approve, we would be honored to present this work and ensure Zochi's valuable contributions reach the research community.

Read our more detailed thoughts on safety and ethical considerations in our report.