{"id":6791,"date":"2026-07-01T23:47:50","date_gmt":"2026-07-01T18:47:50","guid":{"rendered":"https:\/\/cifrum.kz\/google-tabfm-zero-shot-tabular-data-predictions\/"},"modified":"2026-07-01T23:47:50","modified_gmt":"2026-07-01T18:47:50","slug":"google-tabfm-zero-shot-tabular-data-predictions","status":"publish","type":"post","link":"https:\/\/cifrum.kz\/en\/google-tabfm-zero-shot-tabular-data-predictions\/","title":{"rendered":"Google unveils TabFM for zero-shot predictions on tabular data"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><strong>Mountain View, United States.<\/strong> Google Research unveiled TabFM on 30 June, a foundation model for classification and regression on tabular data. It can work with a new table without separately training its weights, manually engineering features or running a hyperparameter search, according to the <a href=\"https:\/\/research.google\/blog\/introducing-tabfm-a-zero-shot-foundation-model-for-tabular-data\/\" target=\"_blank\" rel=\"noopener noreferrer\">Google Research announcement<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">That does not mean TabFM was never trained. Google pretrained the model on hundreds of millions of synthetic datasets. Here, <em>zero-shot<\/em> means that its parameters are not updated for a particular table: labelled examples are supplied at inference time and become the context for the prediction.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why tables remain a difficult AI problem<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Sales records, customer applications, transactions, inventories and medical results are usually stored as rows and columns rather than text or images. 
Gradient boosting, random forests and other specialised algorithms have dominated this kind of data for years.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A conventional project involves selecting features, handling missing values and categories, training several models, tuning them and validating the result on held-out data. TabFM is designed to shorten that cycle: one pretrained model infers the pattern of a new task from examples shown alongside the rows that need answers.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Stage<\/th><th>Conventional tabular ML<\/th><th>TabFM<\/th><\/tr><\/thead><tbody><tr><td>Task setup<\/td><td>A separate pipeline for each dataset<\/td><td>Training rows are passed as context<\/td><\/tr><tr><td>Model parameters<\/td><td>Updated during training<\/td><td>Unchanged on the new table<\/td><\/tr><tr><td>Tuning<\/td><td>Often requires a hyperparameter search<\/td><td>Base mode uses one forward pass<\/td><\/tr><tr><td>Quality checks<\/td><td>Required<\/td><td>Still required<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">The workflow difference. 
TabFM reduces dataset-specific training but does not remove the need to validate data and predictions.<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"576\" src=\"https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/how-tabfm-works-en-1024x576.png\" class=\"wp-image-6787\" alt=\"Four-step diagram of TabFM: context, row and column attention, row compression and prediction\" loading=\"lazy\" decoding=\"async\" srcset=\"https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/how-tabfm-works-en-1024x576.png 1024w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/how-tabfm-works-en-300x169.png 300w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/how-tabfm-works-en-768x432.png 768w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/how-tabfm-works-en-1536x864.png 1536w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/how-tabfm-works-en-1280x720.png 1280w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/how-tabfm-works-en.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">A simplified view of TabFM. Cifrum.kz visualisation based on the architecture description from Google Research.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">How the model reads rows and columns<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">TabFM receives the labelled portion of a table and the rows requiring answers as one input. Its first stage alternates attention across columns and rows, looking for relationships among features and patterns across examples.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It then compresses the information in each row into a dense vector. A 24-block causal in-context learning (ICL) transformer operates on the resulting sequence of row vectors and returns a class label or a numeric value. 
The <a href=\"https:\/\/huggingface.co\/google\/tabfm-1.0.0-pytorch\" target=\"_blank\" rel=\"noopener noreferrer\">TabFM 1.0.0 model card<\/a> says the architecture supports both numerical and categorical columns.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The scikit-learn-compatible interface still uses a <code>fit()<\/code> method, which may look contradictory. The <a href=\"https:\/\/github.com\/google-research\/tabfm\" target=\"_blank\" rel=\"noopener noreferrer\">official TabFM repository<\/a> shows that this call prepares category encoders and numerical scaling; it does not retrain the foundation model\u2019s parameters.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why Google trained on synthetic tables<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Large open collections exist for text and vision models, while industrial tables frequently contain proprietary schemas and personal information. Google addressed that shortage by dynamically generating hundreds of millions of synthetic datasets using structural causal models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The approach exposes TabFM to many types of feature relationships without using real customer databases. It also creates uncertainty: a synthetic world cannot guarantee complete coverage of rare events, behaviour shifts or domain-specific biases.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What the TabArena ranking showed<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Google evaluated TabFM on the <a href=\"https:\/\/tabarena.ai\/\" target=\"_blank\" rel=\"noopener noreferrer\">open TabArena benchmark<\/a>, covering 38 classification and 13 regression datasets with between 700 and 150,000 rows. 
TabArena calculates Elo ratings from head-to-head comparisons among methods.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"576\" src=\"https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-tabarena-ranking-en-1024x576.png\" class=\"wp-image-6786\" alt=\"Chart showing the Elo ratings of six leading TabArena models for classification and regression\" loading=\"lazy\" decoding=\"async\" srcset=\"https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-tabarena-ranking-en-1024x576.png 1024w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-tabarena-ranking-en-300x169.png 300w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-tabarena-ranking-en-768x432.png 768w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-tabarena-ranking-en-1536x864.png 1536w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-tabarena-ranking-en-1280x720.png 1280w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-tabarena-ranking-en.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">TabArena Elo ratings for classification and regression. Cifrum.kz visualisation based on the Google Research chart; a higher score indicates stronger performance within the respective task.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">In the chart published by Google, the base TabFM scored 1,727 Elo for classification and 1,940 for regression, placing second in both groups. TabFM-Ensemble ranked first with 1,815 and 2,125 points respectively.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The ensemble is not equivalent to the simplest run. It combines 32 configurations with a non-negative least squares solver, adds cross and SVD features and uses Platt scaling for classification. 
The base TabFM makes its prediction in a single forward pass without tuning or cross-validation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Elo is a relative measure, and TabArena is a living benchmark. Leadership in one snapshot does not prove that a model will be best for every business dataset. As seen in <a href=\"https:\/\/cifrum.kz\/en\/glm-5-2-claude-cybersecurity-tests\/\">other specialised evaluations of AI systems<\/a>, results depend on the data, metric and testing conditions.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" width=\"1024\" height=\"576\" src=\"https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-benchmark-limits-en-1024x576.png\" class=\"wp-image-6788\" alt=\"Infographic showing 51 datasets, 700 to 150,000 rows, up to 10 classes and optimisation for up to 500 features\" loading=\"lazy\" decoding=\"async\" srcset=\"https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-benchmark-limits-en-1024x576.png 1024w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-benchmark-limits-en-300x169.png 300w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-benchmark-limits-en-768x432.png 768w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-benchmark-limits-en-1536x864.png 1536w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-benchmark-limits-en-1280x720.png 1280w, https:\/\/cifrum.kz\/wp-content\/uploads\/2026\/07\/tabfm-benchmark-limits-en.png 1600w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">TabArena evaluation scope and stated TabFM 1.0.0 boundaries. 
Sources: Google Research and the Hugging Face model card.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Where the \u201cno training\u201d promise ends<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Labelled examples are still needed.<\/strong> TabFM does not infer a task from nothing; historical rows with known answers form its context.<\/li><li><strong>Memory use grows with context.<\/strong> All training rows are supplied during inference.<\/li><li><strong>Classification has a hard limit.<\/strong> The current version supports no more than 10 classes.<\/li><li><strong>Very wide tables are a risk area.<\/strong> TabFM is optimised for up to 500 features, and behaviour may degrade beyond that range.<\/li><li><strong>High-stakes decisions require separate validation.<\/strong> Google advises testing on representative held-out data before deployment.<\/li><\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">A prediction should be treated as a probability estimate, not a guarantee. The recent case in which <a href=\"https:\/\/cifrum.kz\/en\/12-chinese-ai-models-germany-paraguay-prediction\/\">12 AI models unanimously missed a football result<\/a> illustrates the gap between a plausible calculation and what happens in the real world.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Open code, but not entirely open terms<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Google released the TabFM code on GitHub under Apache 2.0 and published JAX and PyTorch weights. The weights themselves carry a separate non-commercial licence. The model card also states that TabFM is not an officially supported Google product.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The company plans to integrate the technology into BigQuery. According to the announcement, users should be able to run classification and regression with an <code>AI.PREDICT<\/code> SQL command in the coming weeks. 
Until the function becomes available, this remains an announced plan rather than a current capability.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What TabFM could change in practice<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The biggest potential gain is speed to a first prototype. An analyst could quickly test whether a table contains enough signal to predict churn, fraud risk, prices or demand before building a full ML pipeline. That could lower the entry barrier to predictive analytics for smaller teams.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A final decision must still account for source-data quality, target leakage, sampling bias, the cost of errors and changes over time. TabFM removes part of the engineering routine; it does not remove responsibility for formulating the problem correctly.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Sources:<\/strong> <a href=\"https:\/\/research.google\/blog\/introducing-tabfm-a-zero-shot-foundation-model-for-tabular-data\/\" target=\"_blank\" rel=\"noopener noreferrer\">Google Research announcement<\/a>, <a href=\"https:\/\/github.com\/google-research\/tabfm\" target=\"_blank\" rel=\"noopener noreferrer\">TabFM repository<\/a>, <a href=\"https:\/\/huggingface.co\/google\/tabfm-1.0.0-pytorch\" target=\"_blank\" rel=\"noopener noreferrer\">TabFM 1.0.0 model card<\/a> and the <a href=\"https:\/\/tabarena.ai\/\" target=\"_blank\" rel=\"noopener noreferrer\">TabArena benchmark<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>The lead image was created with artificial intelligence for Cifrum.kz as a conceptual editorial illustration. The charts and diagrams were prepared by Cifrum.kz from the cited source data.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>TabFM treats labelled rows as context and predicts without changing model weights. 
We examine its architecture, TabArena results, licence and limits.<\/p>\n","protected":false},"author":1,"featured_media":6785,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"rank_math_focus_keyword":"Google TabFM,TabFM,tabular data,zero-shot,TabArena,BigQuery,artificial intelligence","rank_math_title":"Google TabFM predicts from tables without task-specific training","rank_math_description":"Google has unveiled TabFM for zero-shot classification and regression. 
How it works, what TabArena showed and where its limits lie.","rank_math_canonical_url":"","rank_math_seo_score":"","rank_math_pillar_content":"","rank_math_facebook_title":"","rank_math_facebook_description":"","rank_math_facebook_image":"","rank_math_facebook_image_id":"","rank_math_twitter_title":"","rank_math_twitter_description":"","rank_math_twitter_image":"","rank_math_twitter_image_id":"","rank_math_news_sitemap_genre":"","rank_math_news_sitemap_keywords":"","rank_math_news_sitemap_stock_tickers":"","rank_math_robots":null,"rank_math_advanced_robots":"","rank_math_schema_News":"","footnotes":""},"categories":[2104,11],"tags":[],"cifrum_os_content_type":[],"class_list":["post-6791","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence-en","category-digitalization-news-on-digital-rum"],"acf":[],"_links":{"self":[{"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/posts\/6791","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/comments?post=6791"}],"version-history":[{"count":0,"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/posts\/6791\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/media\/6785"}],"wp:attachment":[{"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/media?parent=6791"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/categories?post=6791"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/tags?post=6791"},{"taxonomy":"cifrum_os_content_type","embeddable":true,"href":"https:\/\/cifrum.kz\/en\/wp-json\/wp\/v2\/cifrum_os_content_type?post=6
791"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}