On Building and Designing Foundation Models

From association to reasoning: foundation models must learn to ask “what if.” An essay on their design and building blocks.

Foundation models are not accidents of scale. They emerge from a chain of decisions: data shape objectives, objectives shape architectures, and architectures determine what can be scaled. To design them well is to recognize how these components interact and constrain one another. While transformers have proven remarkably general across domains, their success does not come from blindly repeating a formula. It comes from adapting learning objectives, architectures, and scaling strategies to the structure of the data at hand. At the same time, scale can sometimes substitute for carefully chosen inductive bias, raising a central question: when is tailoring essential, and when does brute force suffice?

A Design Flow

The process can be thought of as a cascade:

Data + Questions → Learning Framework → Architecture → Scale → Engineering

  1. Data and questions. The starting point is not a model but the patterns in the data and the questions to be answered. Sequential data may call for causal modeling; long-range dependencies may demand bidirectional context; spatial data has no inherent order but strong locality; sparse, high-dimensional measurements may resist sequence modeling altogether. Multimodal settings require learning alignments and shared structure across disparate inputs.

  2. Learning framework. The training objective should reflect the kind of knowledge sought. Autoregressive prediction captures causality and supports generation. Masking leverages redundancy to enable reconstruction. Contrastive learning aligns different views of the same signal. Denoising works when corruption mirrors real-world noise, while generative objectives are needed when modeling the full distribution is the goal. A minimal sketch contrasting the first two objectives follows this list.

  3. Architecture. Once objectives are clear, they guide architectural choices. Attention mechanisms capture long-range dependencies; convolutions exploit locality and invariance; graphs encode irregular connectivity; hierarchical structures reflect multi-scale organization; cross-attention allows modality fusion (sketched after this list). Aligning inductive bias with data structure is key, though scale complicates the picture, since large models with generic architectures can sometimes succeed even without bespoke tailoring.

  4. Scale. In some domains, breakthroughs have come from simply enlarging data, compute, and parameters. Others quickly reach limits and require augmentation, synthetic data, or stringent filtering. In scientific settings, scale is equally important, but the real bottleneck is different: data are fragmented, noisy, and drawn from complementary yet poorly aligned measurements. Progress will come less from raw accumulation and more from models that can integrate heterogeneity and make sense of diverse inputs. A toy sketch of diminishing returns to scale also follows this list.
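
To make the learning-framework step concrete, here is a minimal sketch, written in PyTorch with made-up tensor shapes and a 15% masking rate, of how the same backbone outputs can be trained under an autoregressive objective versus a masked-reconstruction objective. Everything here is an illustrative assumption rather than the recipe of any particular model.

    import torch
    import torch.nn.functional as F

    vocab_size, seq_len, batch = 100, 16, 4
    logits = torch.randn(batch, seq_len, vocab_size)          # stand-in for backbone predictions
    tokens = torch.randint(0, vocab_size, (batch, seq_len))   # observed sequence

    # Autoregressive objective: predict token t+1 from tokens up to t,
    # so the model internalizes ordering and supports generation.
    ar_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab_size),   # predictions at positions 0..T-2
        tokens[:, 1:].reshape(-1),                # targets shifted one step ahead
    )

    # Masked objective: hide a random subset of tokens and reconstruct them
    # from the surrounding context, exploiting redundancy in the data.
    mask = torch.rand(batch, seq_len) < 0.15
    mlm_loss = F.cross_entropy(logits[mask], tokens[mask])

    print(float(ar_loss), float(mlm_loss))

The point is not these particular tensors but that the loss, rather than the backbone, encodes whether the model learns ordering, reconstruction, alignment, or the full distribution.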

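The claim that cross-attention enables modality fusion (item 3) can be sketched in the same spirit. The example below uses PyTorch's nn.MultiheadAttention with illustrative dimensions, 20 text tokens attending over 49 image patches, and is a toy rather than the fusion scheme of any specific model.

    import torch
    import torch.nn as nn

    d_model = 64
    cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

    text_tokens   = torch.randn(2, 20, d_model)   # e.g., 20 text-token embeddings
    image_patches = torch.randn(2, 49, d_model)   # e.g., a 7x7 grid of patch embeddings

    # Queries come from one modality, keys and values from the other:
    # each text token gathers information from the image patches it attends to.
    fused, weights = cross_attn(query=text_tokens, key=image_patches, value=image_patches)
    print(fused.shape)     # (2, 20, 64): text representation informed by the image
    print(weights.shape)   # (2, 20, 49): attention from each token over the patches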
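
The limits of raw scale noted in item 4 can be illustrated with a toy power-law loss curve. The functional form echoes published neural scaling laws, but the constants below are placeholders chosen only to show the shape of the curve, not fitted values.

    # Toy illustration: loss falls as a power law in parameters N and data D,
    # so each order of magnitude buys less than the previous one.
    # All constants are placeholders, not fitted values.
    def toy_loss(n_params, n_tokens, e=1.7, a=400.0, b=400.0, alpha=0.34, beta=0.28):
        return e + a / n_params**alpha + b / n_tokens**beta

    for scale in (1e9, 1e10, 1e11, 1e12):
        print(f"{scale:.0e} params / tokens -> loss {toy_loss(scale, scale):.3f}")

Even in this caricature, the curve flattens toward an irreducible floor, which is one way of reading the point that raw accumulation alone will not close the remaining gap.
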
Beyond Scale

Design choices in data, objectives, architectures, and scale explain much of the progress we have seen. But there is a ceiling to what this design flow can deliver: pattern recognition, no matter how refined, is not the same as reasoning. Foundation models are often presented as general-purpose engines of intelligence, so expectations naturally extend beyond prediction to explanation: to uncovering mechanisms, tracing causal links, and generating hypotheses that can be tested. Especially in science, where discovery depends on this kind of reasoning, the gap between what foundation models promise and what they can actually do is stark.

No matter how large or well-engineered, today’s models are limited in their ability to reason. What they do is build statistical links, a form of thinking in its own right, and one that can spark creativity by connecting patterns in unexpected ways. This is true in language, where a <think> tag can encourage long chains of association, and also in vision, where foundation models learn powerful visual embeddings without ever using words. Both show that associative thinking is real and valuable. But science requires more. To move beyond association, scientific foundation models must not only capture correlations across heterogeneous measurements, but also be designed to ask “what if,” to explore counterfactuals, propose mechanisms, and test hypotheses in silico. Only then can they progress from associative thinking to the reasoning that drives discovery.

The trajectory of foundation models will be shaped along two paths. One is toward universality: models that integrate across domains, handle scale, and embrace multimodality. The other is toward reasoning: systems capable of genuine hypothesis-driven discovery. The real future lies not in choosing one over the other, but in combining them: building models that are both broad and mechanistic, both scalable and capable of thinking in ways that bring us closer to understanding the living world.
