
About Us


IFLYTEK Research Center, the core technology R&D division of IFLYTEK, is dedicated to speech technology research, including speech synthesis, speech recognition and related areas. Our technologies are based on research originally conducted at the National Intelligent Computer R&D Center and the Human-Machine Speech Communication Laboratory at the University of Science and Technology of China (USTC). Today, these award-winning technologies are world-leading.
We are also responsible for carrying out all national projects for the company.


Research History


1. Corpus-based Continuous Chinese-English Text-to-Speech Engine



The principle of this system is to select appropriate units from a large database of recorded natural speech and to synthesize continuous speech from them according to acoustic and phonetic algorithms. Development of this software involved advanced technologies in fields such as linguistics, phonetics, statistical analysis, artificial intelligence and digital signal processing. We have brought together cutting-edge research in these areas to develop a speech synthesizer that sounds close to a human voice.
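The selection step described above can be illustrated with a minimal sketch of the general unit-selection idea: for each position in the utterance, pick one recorded unit so that the sum of a target cost (how well the unit matches the desired prosody) and a join cost (how smoothly adjacent units connect) is minimal. This is not iFLYTEK's actual algorithm. The `Unit` class, the pitch-only cost functions and the `join_weight` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Unit:
    """A recorded speech unit (hypothetical, pitch-only description)."""
    name: str
    start_pitch: float  # pitch in Hz at the unit's left edge
    end_pitch: float    # pitch in Hz at the unit's right edge


def target_cost(unit: Unit, target_pitch: float) -> float:
    # Assumed cost: distance between the unit's mean pitch and the target.
    return abs((unit.start_pitch + unit.end_pitch) / 2.0 - target_pitch)


def join_cost(left: Unit, right: Unit) -> float:
    # Assumed cost: pitch discontinuity at the boundary between two units.
    return abs(left.end_pitch - right.start_pitch)


def select_units(target_pitches: List[float],
                 candidates: List[List[Unit]],
                 join_weight: float = 1.0) -> List[Unit]:
    """Dynamic-programming (Viterbi) search for the cheapest unit sequence."""
    if len(target_pitches) != len(candidates) or any(not c for c in candidates):
        raise ValueError("need one non-empty candidate list per target")
    if not candidates:
        return []

    # best[i][j]: cheapest total cost of any path ending at candidates[i][j]
    best = [[target_cost(u, target_pitches[0]) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(candidates)):
        row_cost, row_back = [], []
        for unit in candidates[i]:
            costs = [best[i - 1][k] + join_weight * join_cost(prev, unit)
                     for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row_cost.append(costs[k_best] + target_cost(unit, target_pitches[i]))
            row_back.append(k_best)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final unit to recover the whole path.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path
```

Raising `join_weight` makes the search prefer smooth joins over close prosodic matches. A production system would use richer features (duration, spectrum, phonetic context) than the single pitch value assumed here.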

This system is based on our advanced text analysis and phonetic hierarchy architecture. Using our statistical data and intelligent prosody model, it can produce high-quality synthesized speech with greater clarity, naturalness and fluency than any other current method. In addition, it offers a prosody tuning function based on an improved PSOLA algorithm, which works well for adjusting speech tempo.


Special attention has been given to fluent code switching between Chinese and English. Because a single voice is used for both languages, the system reads mixed Chinese-English text more naturally than other systems.


2. Miniature Text-to-Speech Engines for Embedded Applications


This is the smallest continuous Chinese TTS system currently in existence. It can synthesize high-quality speech with minimal hardware requirements, especially for storage, making low-cost implementation of speech technology a reality.

We have adopted a simple but highly efficient synthesis algorithm in this system, which allows fast operation with low CPU and storage consumption. It can easily be used in handheld devices (PDAs, mobile phones, etc.), intelligent toys, intelligent home appliances, car navigation terminals, communication devices and so on.



3. English TTS


We have completed a comprehensive English TTS system. This system keeps pace with world leaders in English speech synthesis. It handles phonetic marking of English text, complementing our existing mixed Chinese-English text service.



4. Distributed Speech Computing – Distributed Speech Synthesis


This work targets the fast-growing wireless data and application market, where speech recognition and synthesis will be vital ingredients in enhancing wireless data applications and delivering a better user experience. Our distributed computing model is well suited to next-generation mobile communication networks (GPRS, 3G), taking advantage of both the growing computing power of client devices and increasing network bandwidth.

Our cooperation with Intel on this technology has resulted in the development of a patented DSS prototype. The DSS system is composed of a Text Processing Server, CSSML Generator, CSSML Parser, and a Speech Synthesizer. The relationship and connection between components are as follows:



5. Embedded Automatic Speech Recognition


We are focusing our efforts on building an embedded automatic speech recognition (ASR) system. Three specific areas are under research. The first is acoustic modeling for embedded ASR, which involves several topics, including discriminative training, fast likelihood computation, and selection of small model units. The second is robust front-end features. We run our experiments on the AURORA database, and the performance of our methods is comparable to that of the ETSI Advanced Front End (AFE) standard, while our methods are much less complex than the ETSI AFE. The third is confidence measures, which are needed to build a real ASR application: a confidence measure is used to judge whether a recognized word is valid. Our confidence measure performs very well on our embedded ASR system. Together, these three research areas enable us to build ASR on resource-limited embedded platforms.
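To make the role of a confidence measure concrete, here is a minimal sketch of one common family of such measures: a posterior estimated from the scores of the N-best hypotheses. The source does not say which measure iFLYTEK uses, so this is an assumption. The function names, the dictionary input format and the 0.7 threshold are all hypothetical.

```python
import math
from typing import Dict, Tuple


def nbest_confidence(log_scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the top hypothesis and its normalized posterior over the N-best list.

    log_scores maps each candidate word to its log-likelihood score
    (an assumed input format, for illustration only).
    """
    if not log_scores:
        raise ValueError("N-best list must not be empty")
    best_word = max(log_scores, key=log_scores.get)
    max_score = log_scores[best_word]
    # Subtract the maximum before exponentiating for numerical stability.
    total = sum(math.exp(score - max_score) for score in log_scores.values())
    return best_word, 1.0 / total


def accept(log_scores: Dict[str, float], threshold: float = 0.7) -> bool:
    """Accept the recognized word only if its confidence reaches the threshold."""
    _, confidence = nbest_confidence(log_scores)
    return confidence >= threshold
```

When two hypotheses score equally, the confidence is 0.5 and the word is rejected at the assumed threshold. When the top hypothesis clearly dominates, the confidence approaches 1 and the word is accepted. Only one exponential per hypothesis is needed, which is in keeping with the low-complexity constraints of embedded platforms.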



6. RoboCup


RoboCup is an exciting contest that has arisen in recent years. It represents the highest level of artificial intelligence in the computer field. Our RoboCup virtual commentator system is based on the action-to-speech concept and realizes our aim that “machines play, machines commentate”. To advance our research in artificial intelligence, we collaborated with the Artificial Intelligence Center, USTC, to produce this system. It embodies our research results in status analysis, natural language generation and high-end speech synthesis for specific domains.



7. Virtual Reporter


This system is based on a collaboration with the National High Performance Computing Center, USTC. To implement it, we have completed extensive research in multimodal user interfaces, continuous natural speech synthesis from arbitrary text, 3D human head modeling, lip and expression synchronization technologies and so on. It is the first content-oriented Chinese virtual reporter system. As a virtual reporter, it can broadcast content from various information sources in real time.

The virtual reporter will be useful in news and weather reports, stock commentary and a wide range of other fields.







Copyright 2003 ANHUI USTC iFLYTEK Co., Ltd.