Generative Pre-trained Transformer 3, or GPT-3, is one of the most powerful language models built by OpenAI. The model was trained on a 45-terabyte dataset drawn almost exclusively from English-language sources, which is one of its main limitations.
We may soon see a Chinese-language equivalent of GPT-3: Huawei recently described a language model called PanGu-Alpha (often written PanGu-α). The 750-gigabyte model was trained on 1.1 terabytes of Chinese-language text, including:
- E-books
- Social media posts
The model contains more than 200 billion parameters, roughly 25 billion more than GPT-3's 175 billion. It handles language tasks spanning text summarization, question answering, and dialogue generation.
The company trained the model on its Ascend 910 AI chips, each delivering 256 teraflops of computing power. The PanGu-α team collected about 80 terabytes of raw data, covering public datasets, the Common Crawl dataset, and other open web sources.
The team then refined this data by eliminating documents containing fewer than 60% Chinese characters, fewer than 150 characters, or only titles, advertisements, or navigation bars.
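The filtering rules above (minimum length, minimum share of Chinese characters) can be sketched as a simple heuristic. This is an illustrative assumption of how such a filter might look, not Huawei's actual pipeline; the thresholds come from the article, while the character-range check and function names are hypothetical.

```python
import re

def is_chinese(ch: str) -> bool:
    # Heuristic: treat the CJK Unified Ideographs block as "Chinese".
    return "\u4e00" <= ch <= "\u9fff"

def keep_document(text: str, min_chars: int = 150,
                  min_chinese_ratio: float = 0.6) -> bool:
    """Return True if a document passes the filters described in the
    article: at least 150 characters and at least 60% Chinese characters.
    (A sketch under stated assumptions, not the real PanGu-α code.)"""
    stripped = re.sub(r"\s", "", text)  # ignore whitespace when counting
    if len(stripped) < min_chars:
        return False
    chinese = sum(1 for ch in stripped if is_chinese(ch))
    return chinese / len(stripped) >= min_chinese_ratio
```

A document of 200 Chinese characters would pass, while a short English-only snippet such as an ad slogan or navigation bar would be dropped.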
Critics warn that the model could be used to discriminate against marginalized peoples, including Uyghurs living in China.