
China’s latest AI breakthrough may be turning heads, but Taiwan is focused on a different challenge: ensuring the next generation of language models reflects local identities and democratic norms.
The launch of the Chinese chatbot DeepSeek-R1 in late January took global tech watchers by surprise. While earlier Chinese models – including Baidu’s Ernie and Doubao, the latter developed by TikTok parent company ByteDance – had shown promise in areas such as Chinese-language tasks, mathematics, and coding, they were hampered by major limitations, including weaker English-language capabilities. Their limited accessibility also meant they received little testing outside of China.
However, the release of DeepSeek-R1 marked the first time a Chinese-developed large language model (LLM) made waves internationally. Among the most striking claims about R1 was its remarkably low development cost. OpenAI’s GPT-4o reportedly cost at least US$100 million to train, and some estimates put the company’s total R&D spending much higher, reflecting a funding structure that included a 2019 investment from Microsoft valued at US$1 billion. DeepSeek researchers, by contrast, suggested their chatbot had been developed for only US$5.6 million.
Perhaps most impressively, DeepSeek’s engineers had trained the R1 model using mid-range GPUs such as the Nvidia H800, rather than the top-tier Nvidia chips used to train chatbots like GPT-4o or Anthropic’s Claude. Although the United States in 2022 prohibited the export of Nvidia’s highest-performing chips to China, DeepSeek-R1 has nonetheless managed to outpace other leading bots using just 2,048 GPUs spread across 256 servers – a fraction of the tens of thousands of GPUs and more than 3,000 server nodes typically required for training.
The dramatic gains in efficiency and lower development costs of this disruptive LLM are largely attributed to low-level programming techniques, including PTX – an assembly-like language that allows developers to fine-tune performance and maximize hardware usage by squeezing more out of limited resources.
Within days of its release, the DeepSeek-R1 app shot to the top of the U.S. Apple App Store’s free download rankings, displacing ChatGPT, TikTok, and Meta’s own Instagram, Facebook, and WhatsApp. The Nasdaq fell more than 3% amid a broad sell-off of technology stocks, notably affecting chipmakers and data center operators worldwide. Nvidia’s shares plunged 17% following the debut of DeepSeek-R1. Although the company later recovered, new restrictions imposed in April on exports of its H20 AI chip triggered another downturn, with Nvidia’s Taiwanese American chief executive, Jensen Huang, projecting a US$5.5 billion loss from canceled orders.
Yet for all the fanfare, many observers have questioned the legitimacy of DeepSeek’s bolder claims about its LLM. Analysts have challenged the stated figures, concluding that they likely account only for computational costs, while infrastructure, hardware, and human resource expenses are either excluded or significantly understated.
“Our understanding is that the actual cost is way higher than what they actually claim,” says Wesley Kuo, founder and CEO of Ubitus, a Taipei-based generative AI and cloud gaming service provider.
Backed by substantial investment from Nvidia, Ubitus supported Project TAME – a localized LLM using traditional Chinese characters – by providing H100 GPUs and gaming data last year. Ubitus also partnered with cable and battery manufacturer Foxlink and its subsidiary, Shinfox Energy, to establish Ubilink.AI. In collaboration with Asus, Ubilink built Taiwan’s largest green energy-powered AI supercomputing service center in just three months.
“On top of that, we did a lot of LLM applications and model training for our government, and also for the Japanese government across different industries, like gaming, tourism, and retail,” says Kuo. “They believe AI can help solve labor shortages and issues with aging populations.”

Integrity issues
Kuo agrees with OpenAI and Microsoft – whose investment entitles it to a 49% share of the profits of OpenAI’s for-profit arm – that DeepSeek stole data through a method known as model distillation. In this process, smaller, less advanced language models are trained to imitate the outputs of larger, more sophisticated ones. OpenAI and Microsoft argue that DeepSeek used OpenAI’s application programming interface – the tool that allows different software systems to connect – to aid its own development.
“They definitely take data from OpenAI,” says Kuo. “So while they do have some good techniques for reducing costs, I think there are still misunderstandings from media and the public.”
These misunderstandings also extend to DeepSeek’s claims about efficiency. Kuo notes that DeepSeek-R1, with 671 billion parameters, is more than 50% larger than Meta AI’s Llama 3.1 405B, which has 405 billion parameters. Parameters – the internal numerical values a model learns during training to make predictions – function as tiny connections within a vast neural network. Kuo adds that DeepSeek’s smaller models undeniably derive from Llama 3.1: knowledge from DeepSeek-R1 appears to have been distilled into smaller models built on the Llama 3.1 framework, a less direct form of replication.
“The foundation models are still from Llama or Alibaba or [French AI startup] Mistral,” says Kuo. “They just use the full version of DeepSeek as a coach to distill or upgrade the foundation model.”
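The distillation process Kuo describes can be sketched in a few lines. In the toy example below (plain NumPy, with invented logits; nothing here is drawn from DeepSeek’s or OpenAI’s actual training setups), a small “student” model is penalized in proportion to how far its output distribution strays from a larger “teacher” model’s soft targets:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; a higher temperature yields a
    # softer distribution that exposes more of the teacher's "knowledge"
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between teacher and student soft labels.

    Training the student to minimize this makes its output
    distribution imitate the (larger) teacher's.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# Toy check: the loss shrinks as the student's outputs approach the teacher's
teacher = np.array([2.0, 0.5, -1.0])
far_student = np.array([-1.0, 2.0, 0.5])
near_student = np.array([1.8, 0.6, -0.9])
assert distillation_loss(teacher, near_student) < distillation_loss(teacher, far_student)
```

A student trained this way never needs access to the teacher’s weights, only to its outputs – which is why OpenAI argues API access alone was enough for the alleged copying.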
Beyond rebuttals of some of the more far-fetched claims, doubts have also emerged about the extent of DeepSeek-R1’s capabilities. Experts suggest that like its Chinese predecessors, R1 performs well on specialized, task-specific functions but lags behind versions of GPT-4o in general-purpose performance.
However, for many users, the most significant limitation – and the greatest concern – with DeepSeek’s models is the severe restriction on free access to information. Users quickly discovered that attempts to ask about sensitive political topics, such as the 1989 Tiananmen Square protests, were met with evasive responses, with the program stating the subject was “beyond my scope.”
On topics such as the status of Xinjiang’s Uyghur minority and Taiwan, DeepSeek’s responses closely align with official Chinese Communist Party positions on geopolitical issues. Research indicates that more than 85% of DeepSeek’s outputs are censored to suppress information related to democracy, human rights, and China’s contested sovereignty claims.
In contrast, TAME and other Taiwan-developed large language models have emerged as alternatives to DeepSeek and other China-based bots within the Sinosphere. Among the first was the Trustworthy AI Dialogue Engine, or TAIDE, officially launched in June 2023 by the National Institute of Applied Research under the guidance of Taiwan’s National Science and Technology Council. One of TAIDE’s core objectives is to develop a model aligned with Taiwan’s social, cultural, and linguistic norms.

Training Taiwan’s voice
Work on TAIDE appears to have stalled or been “suspended,” according to Ubitus’ Kuo and other sources. Because much of the data was drawn from government sources, the model faced obvious limitations. Inquiries submitted by Taiwan Business TOPICS to TAIDE’s developers went unanswered.
Despite the drawbacks, Ubitus’ Kuo praises the TAIDE model as an important benchmark for Project TAME. The project takes its name from “Taiwan” and “mixture of experts,” a machine learning architecture in which a routing network selects which “experts” (sub-models) process each input before combining their results. Because only a subset of the network is active for any given input, the approach improves efficiency and scalability.
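The mixture-of-experts idea can be illustrated with a minimal sketch. In the toy layer below (plain NumPy; all sizes and weights are invented for illustration and bear no relation to TAME’s actual architecture), a router scores the experts for each input, only the top-k experts run, and their outputs are blended by the routing weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions only
n_experts, d_in, d_out, top_k = 4, 8, 8, 2

router_w = rng.normal(size=(d_in, n_experts))          # routing network
expert_w = rng.normal(size=(n_experts, d_in, d_out))   # one weight matrix per expert

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_forward(x):
    scores = softmax(x @ router_w)              # router's confidence in each expert
    chosen = np.argsort(scores)[-top_k:]        # keep only the top-k experts
    weights = scores[chosen] / scores[chosen].sum()  # renormalize over the chosen
    # Only the selected experts compute anything; this sparsity is
    # where the efficiency gain comes from.
    return sum(w * (x @ expert_w[i]) for w, i in zip(weights, chosen))

x = rng.normal(size=d_in)
y = moe_forward(x)
assert y.shape == (d_out,)
```

The payoff is that a model can hold many experts’ worth of parameters while paying the compute cost of only a few per input.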
TAME was developed by the Machine Intelligence and Understanding Laboratory (MiuLab) at National Taiwan University’s Department of Computer Science and Information Engineering, with funding from Pegatron, an electronics manufacturer spun off from Asus; the petrochemical conglomerate Chang Chun Group; and Unimicron, a printed circuit board subsidiary of chipmaker United Microelectronics Corp. (UMC). Chang Gung Memorial Hospital and the Taipei-based tech blog TechOrange also contributed to the project.
Trained on 500 billion tokens – basic units of text such as words, characters, or punctuation marks – TAME outperformed competitors, including GPT-4o, across 39 evaluations comprising 3,000 questions last year. It scored nearly 7% higher than second-place Claude and passed university entrance, bar, and traditional Chinese medicine examinations.
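To give a rough sense of what measuring text in tokens means, the snippet below approximates a token count by splitting text into word runs and punctuation marks. Real LLM tokenizers (byte-pair encoding and its variants) segment text quite differently, but the unit of measurement is the same idea:

```python
import re

def rough_token_count(text):
    # Crude approximation: each run of word characters and each
    # punctuation mark counts as one token. Actual LLM tokenizers
    # learn their segmentation from data instead.
    return len(re.findall(r"\w+|[^\w\s]", text))

# "Taiwan" + "'" + "s" + "AI" + "models" + "." = 6 rough tokens
assert rough_token_count("Taiwan's AI models.") == 6
```

At this level of granularity, 500 billion tokens corresponds to a corpus on the order of hundreds of billions of words and marks.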
One of TAME’s stated goals is to reinforce and promote local culture “across diverse domains,” according to the developers. Unlocking local language capabilities marks a significant step in enhanced accessibility.
“We’ve trained a Taiwanese voice LLM based on Whisper (OpenAI’s voice recognition technology),” says Kuo. “This has achieved the best results in understanding oral Taiwanese, and we’re now working on Hakka.”
Such efforts have received positive feedback from institutions in parts of Taiwan where these two languages are more commonly spoken. “Hospitals and banks in central and southern Taiwan have so many people speaking these languages, so they’re eager to adopt these models,” Kuo says.
Efforts are also underway to train the model in indigenous-language recognition, though Kuo acknowledges that limited data remains a major obstacle. He notes that the collaboration with National Taiwan University showed that training AI to learn a new language requires at least 10,000 to 20,000 hours of voice recordings paired with corresponding text. While such figures may be achievable for active languages like Amis, Atayal, and Paiwan, he says, many of Taiwan’s other indigenous tongues may struggle to provide the necessary data.
There is also the issue of accessing historical data in government archives, which has transformative potential. “We’ve tried to get the authority to license some of this from the government,” says Kuo. “But currently, some data is protected because of copyright.”
Still, the emergence of artificial general intelligence holds the tantalizing possibility of aiding in the revival of dying – and even extinct – languages.
The intersection of language and culture highlights a central principle behind Taiwan-based and Taiwan-focused LLMs: the pursuit of AI sovereignty as a means of reinforcing Taiwanese identity, communicating Taiwan’s story to the world, and protecting the nation’s information environment.
“If we deep dive into LLM models and training data, it’s a black box,” says Julian Chu, an industry consultant and director at the semi-governmental Market Intelligence & Consulting Institute (MIC). “We don’t know how much data is poisoned and how much is biased.”
Chu notes that concerns over how information is presented by Taiwan’s large language models go beyond the use of traditional characters. If you ask GPT questions in Mandarin, its output tends to reflect the style of the People’s Republic of China and fails to capture Taiwan’s culture, he says. “It’s about having Taiwanese companies using Taiwanese language or data to train LLMs to build AI sovereignty.”
He mentions the Formosa Foundation Model (FFM-Llama2) as another Taiwan LLM that showed promise in this regard. Released in September 2023 by Asus subsidiary Taiwan Web Service, FFM-Llama2 was aimed at “democratizing AI to empower businesses.” In a similar vein, the research institute at Hon Hai Precision Industry Co., better known as Foxconn, launched its own LLM, FoxBrain, in March.
Yet privately, some commentators are skeptical of big corporations’ ventures into LLMs, viewing them as based on limited training and outdated architecture. Acknowledging this criticism, a Foxconn spokesperson emphasized that the company’s LLM is currently just for its own use.
For Lin Yen-ting, a key member of the MiuLab team that trained and developed TAME, the motivation to create a Taiwan-centric LLM was clear-cut: to plug a Taiwan-shaped gap in the information environment. Noting that DeepSeek-R1 and other Chinese LLMs present a warped picture of Taiwan, Lin stresses that U.S.-developed models can also sometimes present Taiwan in a manner that its citizens would not recognize.
“I feel that open-source models don’t really care about this,” says Lin, who also interned at Meta, where he worked on Llama models. “They’re training on a snapshot of the internet that is dominated [in Chinese-language data] by China, so they learn that culture and ideology.”
For this reason, Lin says a lot of information is missed or “buried down in the data pool.” Inevitably, a lot of this information relates to Taiwan, he says. “So we want to kind of cherry-pick that part – the Taiwanese stuff – and retrain it into the model.”