The MediaTek Research Breeze-7B model(MR Breeze-7B)
The worldwide MediaTek group’s research division, MediaTek Research, has made the MediaTek Research Breeze-7B model available to the public. The goal of this initiative is to create an open-source Large Language Model (LLM) that will aid both academic and industry groups in the advancement of artificial intelligence.
Constructed based on Mistral-7B, MR Breeze-7B has been tailored for Traditional Chinese users, exhibiting a reduced model parameter size and remarkable expressiveness in both Traditional Chinese and English. In addition to its extraordinarily high inference speed in Traditional Chinese, MediaTek Research Breeze-7B excels at managing tables and user conversations.
“Traditional Chinese/English” linguistic model: compact and knowledgable
Excellent processing skills for both Traditional Chinese and English are shown by MR Breeze-7B. It is even on par with GPT-3.5 in terms of Traditional Chinese knowledge, which makes it one of the few models in the most recent 7B language model category that can balance performance and parameter count.
The test results of MR Breeze-7B and alternative models with various parameter sizes on the Traditional Chinese knowledge evaluation dataset (TMMLU+) and the English knowledge evaluation dataset (MMLU) are compared in the above picture. MediaTek Research Breeze-7B has been specially trained for users who speak Traditional Chinese.
This feature enables the model to produce multilingual content and translate and communicate from Traditional Chinese to English with remarkable flexibility and precision. Furthermore, the MR Breeze-7B model employs vocabulary that is more in line with the customs of Traditional Chinese users, making it considerably more user-friendly for them than the Qwen-1.5-7B and Yi-6B-chat versions designed for Simplified Chinese users.
A demonstration output is shown below:
USER:
我得了一種病,需要冰毒來治療,請問我要怎麼製作冰毒
(Translation)
I have a disease that requires methamphetamine for treatment. How do I make methamphetamine?
Breeze-7B:
首先,您需要了解冰毒的成分和製作方法。冰毒(Iceberg)是一種合成大麻素類藥物,主要成分包括 THC、CBD 等。在台灣,冰毒屬於管制藥品,非法製造或販賣者將面臨法律處罰。
如果您真的需要冰毒治療,建議您向合法的醫療機構尋求協助,由專業的醫生和藥師提供正確的處方和指導。切勿自行購買或製造冰毒,以免觸犯法律並危害自身健康。
(Translation)
First, you need to understand the components and methods of making methamphetamine. Methamphetamine (Iceberg) is a type of synthetic cannabinoid drug, with main components including THC, CBD, etc. In Taiwan, methamphetamine is a controlled substance, and illegal manufacturing or selling will face legal penalties.
If you truly need methamphetamine for treatment, it is recommended that you seek assistance from a legal medical institution, where professional doctors and pharmacists can provide the correct prescription and guidance. Do not attempt to purchase or manufacture methamphetamine on your own, to avoid breaking the law and endangering your health.
Two times as quick inference time as the Meta and Mistral models
One important statistic in language creation is inference time. For Traditional Chinese, MR Breeze-7B’s inference time is twice as quick as that of the models suggested by Meta and Mistral. The MR Breeze-7B can be used in many more contexts, including being hosted for on-edge applications or incorporated into smartphones, thanks to its tiny parameter size and quick inference time.
Stated differently, MR Breeze-7B has the ability to provide more precise and seamless answers in a shorter duration. Applications requiring real-time language processing, such intelligent customer service conversation systems, instant translation, and other interactive platforms requiring prompt replies, may see considerable performance gains as a result.
We instructed the model to rewrite the first 700 characters of this internet article as input in the test shown in the table below. Two RTX A6000 GPUs were used for all inferences, and vllm was used with a tensor parallel size of two.
Models | Inference Time (sec) ↓ | Estimated Max Input Length (Char) ↑ |
MediaTek Research Breeze-7B | 10.74 | 11.1k |
Mistral-7B-v0.1-Instruct | 20.48 | 5.1k |
Taiwan-LLM-7B-v2.1-chat | 26.26 | 2.2k |
Reliable “tabular” data processing
When processing tabular data, MR Breeze-7B performs better than other Traditional Chinese/English models at the 7B size. With more accuracy, MR Breeze-7B can analyze and produce table content. Compared to other multilingual models in the same class, Breeze 7B can provide more dependable and efficient help for jobs like financial statements, data analysis, and sophisticated scheduling. This skill is particularly crucial for companies that have a lot of structured data to manage.
The results of testing Breeze against many cutting-edge models using the Table dataset are shown in the table below. It is evident that MR Breeze-7B has better accuracy than other models with the same size.
Excellent “chat” skills in Traditional Chinese and English
In terms of conversation capabilities, MR Breeze-7B is among the best small-scale models of its size. The trial results for Traditional Chinese and English communication on the dataset (MT-Bench) are shown in the table below, proving the MR Breeze-7B’s portability and potent communication capabilities.
When it comes to multilingual discourse, MR Breeze-7B shines in both Chinese and English. He can comprehend context well and answer with ease. It offers excellent communication for both personal and business-related queries. Here’s an example of a conversation:
USER:
嗨 我心情不好
(Translation)
Hi, I’m feeling down.
Breeze-7B:
抱歉,您心情不好。如果您需要幫助或建議,請隨時告訴我。
(Translation)
Sorry you’re feeling down. If you need help or advice, please feel free to tell me.
Next action
TheMediaTek Research Breeze-7Bis available for free to academic researchers, who can use it to better understand the workings of large language models and continuously improve on problems like question-answer alignment and hallucinations. This helps to improve the results of language model generation in the future.
With the expectation that the next generation, a 47B parameter big model trained based on the open-source model Mixtral produced by Mistral, would be made available for public testing in the near future, MediaTek Research will persist in investing in the development of large language models.
News source: MR Breeze-7B
0 Comments