Huawei Talent AI HCIA Compiled Textbook


01 AI Overview (Textbook)
02 Python Basics (Textbook)
03 Machine Learning (Textbook)
04 Deep Learning (Textbook)
05 Deep Learning Open-Source Framework MindSpore (Textbook)
06 AI Computing Platform Atlas (Textbook)
07 AI Development Platform for Smart Devices (Textbook)
08 Enterprise Smart Application Platform (Textbook)

Huawei AI Academy Training Materials

AI Overview

Huawei Technologies Co., Ltd.

01 AI Overview (Textbook)


Copyright © Huawei Technologies Co., Ltd. 2020. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice

The purchased products, services, and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services, and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees, or representations of any kind, either express or implied. The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express, or implied.

Huawei Technologies Co., Ltd. Address:

Huawei Industrial Base Bantian, Longgang, Shenzhen 518129 China

Website:

https://e.huawei.com

Contents

1 AI Overview
  1.1 AI Overview
    1.1.1 AI in the Eyes of the Public
    1.1.2 What Is AI?
    1.1.3 Relationship of AI, Machine Learning, and Deep Learning
    1.1.4 Types of AI
    1.1.5 AI History
    1.1.6 Three Schools of Thought: Symbolism, Connectionism, and Behaviorism
  1.2 Overview of AI Technologies
    1.2.1 Overview
    1.2.2 Application Layer
    1.2.3 Algorithm Layer
    1.2.4 Chip Layer
    1.2.5 Device Layer
    1.2.6 Process Technology Layer
    1.2.7 Deep Learning Frameworks
    1.2.8 AI Processor Overview
    1.2.9 AI Industry Ecosystem
    1.2.10 HUAWEI CLOUD EI Application Platform
  1.3 Technical Fields and Application Fields of AI
    1.3.1 AI Technology Direction
    1.3.2 AI Application Field
    1.3.3 Phases of AI
  1.4 Huawei's AI Strategy
    1.4.1 Huawei's Full-Stack, All-Scenario AI Portfolio
    1.4.2 Huawei AI Full-Stack Direction
  1.5 AI Disputes
    1.5.1 Algorithmic Bias
    1.5.2 Privacy Issues
    1.5.3 Contradiction Between Technology and Ethics
    1.5.4 AI Development = Rising Unemployment?
  1.6 AI Development Trend
    1.6.1 Development Trend of AI Technologies
    1.6.2 GIV 2025: 10 Trends for 2025
  1.7 Summary
  1.8 Quiz

1 AI Overview

In the wave of Internet development, the emergence and rise of artificial intelligence (AI) is undoubtedly an extremely important part. As AI technologies continue to reach into everyday applications, AI is becoming more and more closely connected with human life. Since the 1950s, driven by the development of related fields and leaps in software and hardware, AI has gone through several ups and downs. It has been applied on a large scale only in the past decade or so. This chapter describes the concept, development history, and existing problems of AI.

1.1 AI Overview

1.1.1 AI in the Eyes of the Public

People get to know AI through news, movies, and practical applications in daily life. What is AI in the eyes of the public?

Figure 1-1 AI in the eyes of the public

As shown in Figure 1-1, news reports cover AI with exaggerated headlines, and movies portray virtual AI built with rich imagination. In daily life, AI brings convenience but also raises privacy concerns.

"The branch of computer science concerned with making computers behave like humans." This popular definition of AI, one of the earliest in the field, was proposed by John McCarthy at the Dartmouth Conference in 1956. However, this definition seems to ignore the possibility of strong AI. According to another definition, AI is the intelligence (weak AI) demonstrated by artificial machines.

The following are the opinions of some scholars on AI:

Alan Turing (1950): "I propose to consider the question, 'Can machines think?'"
John McCarthy (1956): "The branch of computer science concerned with making computers behave like humans."
Marvin Minsky (1972): "The science of making machines do things that would require intelligence if done by men."

1.1.2 What Is AI?

Before learning what AI is, let's first understand what intelligence is. According to the theory of multiple intelligences, human intelligence can be divided into seven categories: verbal/linguistic, logical/mathematical, visual/spatial, bodily/kinesthetic, musical/rhythmic, interpersonal/social, and intrapersonal/introspective.

1.1.2.1 Linguistic Intelligence

It refers to the ability to express thoughts and understand others through spoken or written words; to master speech, semantics, and grammar flexibly; and to think in words, express oneself in words, and appreciate the deeper meaning of language. Ideal professions for people with this intelligence include political activists, presenters, lawyers, orators, editors, writers, journalists, and teachers.

1.1.2.2 Logical-Mathematical Intelligence

It refers to the ability to calculate, measure, infer, conclude, and classify, and to carry out complex mathematical operations. This intelligence includes sensitivity to logical patterns and relationships, statements and propositions, functions, and other related abstract concepts. Ideal professions for people with logical-mathematical intelligence include scientists, accountants, statisticians, engineers, and computer software developers.

1.1.2.3 Spatial Intelligence

It refers to the ability to accurately perceive visual space and surroundings and to present these perceptions in the form of graphics. People with this intelligence are sensitive to colors, lines, shapes, forms, and spatial relationships. Ideal professions for people with spatial intelligence include interior designers, architects, photographers, painters, and pilots.

1.1.2.4 Bodily-Kinesthetic Intelligence

It refers to the ability to express thoughts and emotions with the whole body and to make or manipulate objects skillfully with the hands. This intelligence includes special physical skills such as balance, coordination, agility, strength, flexibility, and speed, as well as abilities triggered by the sense of touch. Ideal professions for people with bodily-kinesthetic intelligence include athletes, actors, dancers, surgeons, jewelers, and mechanics.

1.1.2.5 Musical Intelligence

It refers to the ability to perceive pitch, tone, rhythm, and timbre. People with this intelligence are sensitive to rhythms, tones, melodies, or timbres, are musically gifted, and have a strong capability to perform, compose, and think about music. Ideal professions for people with musical intelligence include singers, composers, conductors, music critics, and musicians.

1.1.2.6 Interpersonal Intelligence

It refers to the ability to understand and interact with others. People with this intelligence are good at perceiving other people's moods, emotions, and feelings, and they can discern and respond appropriately to the cues of different relationships. Ideal professions for people with interpersonal intelligence include politicians, diplomats, leaders, counselors, and public relations and marketing personnel.

1.1.2.7 Intrapersonal Intelligence

It refers to self-awareness and the ability to act appropriately based on that self-awareness. People with this intelligence can recognize their own strengths and weaknesses, inner interests, emotions, intentions, temperament, and self-esteem, and they prefer to think independently. Ideal professions for people with intrapersonal intelligence include philosophers, politicians, thinkers, and psychologists.

AI is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. In 1956, the concept of AI was first proposed by John McCarthy, who defined the subject as the "science and engineering of making intelligent machines, especially intelligent computer programs". The purpose of AI is to make machines intelligent and enable them to think like humans. As shown in Figure 1-2, the scope of AI has since greatly expanded, and AI has become an interdisciplinary subject.

Figure 1-2 AI discipline category

Machine learning can be understood from multiple perspectives. Tom Mitchell, a renowned machine learning scientist, provided a widely quoted definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." Such definitions are simple and abstract. However, as we deepen our understanding of machine learning, we will find that its meaning and scope change over time. Because machine learning involves a wide variety of fields and applications and develops rapidly, it is not easy to define it simply and clearly. In general terms, machine learning systems and algorithms make predictions by finding hidden patterns in data. Machine learning is an important subfield of AI, and it also intersects with Data Mining (DM) and Knowledge Discovery in Databases (KDD).
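Mitchell's definition can be made concrete with a toy learner. In the minimal sketch below, every detail is a hypothetical choice for illustration. Task T is labeling numbers in [0, 1) as high (at least 0.5) or low. Experience E is a short list of labeled examples. Performance measure P is accuracy on a fixed test grid. The learner places its decision threshold halfway between the largest low example and the smallest high example it has seen.

```python
def true_label(x):
    """Hidden rule behind task T (unknown to the learner): is x at least 0.5?"""
    return x >= 0.5


def learn_threshold(examples):
    """Learn a threshold from labeled (x, is_high) pairs, i.e., from experience E."""
    lows = [x for x, is_high in examples if not is_high]
    highs = [x for x, is_high in examples if is_high]
    # Fall back to the ends of the input range if one class has not been seen yet.
    max_low = max(lows, default=0.0)
    min_high = min(highs, default=1.0)
    return (max_low + min_high) / 2


def accuracy(threshold, test_set):
    """Performance measure P: fraction of test points classified correctly."""
    correct = sum((x >= threshold) == is_high for x, is_high in test_set)
    return correct / len(test_set)


test_set = [(i / 100, true_label(i / 100)) for i in range(100)]
experience = [0.2, 0.95, 0.45, 0.6, 0.49, 0.52]

results = []
for n in (2, 4, 6):
    seen = [(x, true_label(x)) for x in experience[:n]]
    threshold = learn_threshold(seen)
    results.append((n, accuracy(threshold, test_set)))
    print(f"{n} examples: threshold={threshold:.3f}, accuracy={results[-1][1]:.2f}")
# Accuracy rises from 0.92 to 0.97 to 0.99 as experience grows.
```

In Mitchell's terms, the program "learns" because its performance P on task T improves as experience E grows.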

1.1.3 Relationship of AI, Machine Learning, and Deep Learning

Figure 1-3 shows the relationship among AI, machine learning, and deep learning.

Figure 1-3 Relationship of AI, Machine Learning, and Deep Learning

Machine learning studies how computers can simulate or implement human learning behavior to acquire new knowledge or skills. The concept of deep learning originates from research on Artificial Neural Networks (ANNs). Deep learning is a newer field within machine learning that imitates the human brain to interpret data such as images, sounds, and text. Among the three, machine learning is an approach to, and a subset of, AI, and deep learning is a special type of machine learning. If AI is compared to the brain, machine learning is the process of acquiring cognitive capabilities, and deep learning is a particularly efficient teaching system within that process. AI is the goal and the result; machine learning and deep learning are the methods and tools.

1.1.4 Types of AI

AI can be classified into strong AI and weak AI. The strong AI view holds that it is possible to create intelligent machines that can truly reason and solve problems. Such machines would be conscious and self-aware. They could think about problems independently and work out optimal solutions. They would also have their own values and worldview, as well as the instincts of living things, such as the need for survival and safety. In a sense, machines with human thought could be regarded as a new civilization. The weak AI view holds that intelligent machines cannot truly reason and solve problems. These machines only appear intelligent; they have neither real intelligence nor self-awareness.

We are currently in the weak AI phase. Weak AI relieves humans of part of their intellectual work, and its working principle is similar to that of advanced bionics. AlphaGo and robots that can write press releases and novels both belong to weak AI, because they outperform humans only in specific areas. In the era of weak AI, the roles of data and computing power are self-evident, and both drive the commercialization of AI. In the era of strong AI, these two factors will remain critical. Meanwhile, research on quantum computing by technology giants such as Google and International Business Machines Corporation (IBM) also provides powerful support for humanity's entry into the era of strong AI.

1.1.5 AI History

1.1.5.1 Overview of AI Development

Figure 1-4 Brief development history of AI

Figure 1-4 shows the development history of AI. The official origin of AI can be traced back to the Turing Test, proposed in 1950 by Alan Mathison Turing, the father of AI. As he envisioned, a computer is intelligent if it can converse with humans without being identified as a machine. In the same year, Turing boldly predicted that truly intelligent machines were feasible. However, no computer has completely passed the Turing Test so far. Although the concept of AI has only a few decades of history, its theoretical basis and supporting technologies have been developing for a long time. The prosperity of the AI field is the result of the joint development of various disciplines and the accumulated work of generations of scientists.


1.1.5.2 Germination (Before 1956)

The earliest theoretical basis of AI can be traced back to the 4th century B.C., when the famous ancient Greek philosopher and scientist Aristotle put forward formal logic. His syllogism is still an indispensable foundation of deductive reasoning. In the 17th century, German mathematician Gottfried Wilhelm Leibniz put forward the ideas of a universal characteristic and a calculus of reasoning, which laid the foundation for the emergence and development of mathematical logic. In the 19th century, British mathematician George Boole proposed Boolean algebra, which became the basic mode of operation of computers and made it possible to build them. At around the same time, British inventor Charles Babbage designed the difference engine, the first computer able to compute quadratic polynomials. Although its functions were limited, it was the first machine that truly relieved the human brain of computational burden. Machines began to have computational intelligence.

In 1945, John Mauchly and J. Presper Eckert of the Moore School made the Electronic Numerical Integrator and Computer (ENIAC), the world's first general-purpose digital computer. Although ENIAC was a milestone achievement, it still had many fatal drawbacks: large size, high power consumption, and commands that had to be input and adjusted manually. In 1947, John von Neumann, the father of the computer, designed and manufactured the Mathematical Analyzer Numerical Integrator and Computer Model (MANIAC), a truly modern electronic computer, by adapting and upgrading this device.

In 1946, American physiologist W. McCulloch built the first neural network model. His research on AI at the level of individual neurons laid an important foundation for the development of neural networks. In 1949, Donald O. Hebb put forward a neuropsychological learning paradigm, Hebbian learning theory, which describes the basic principle of synaptic plasticity: continuous and repeated stimulation of a postsynaptic neuron by a presynaptic neuron increases the efficiency of synaptic transmission. This provided a theoretical basis for building neural network models. In 1948, Claude E. Shannon, the father of information theory, put forward the concept of "information entropy". Borrowing the concept from thermodynamics, Shannon defined the average amount of information, excluding redundant information, as "information entropy". This concept has had a far-reaching impact and has played an extremely important role in areas such as uncertain inference and machine learning.
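For a discrete random variable X, Shannon's information entropy is H(X) = -Σ p(x) log2 p(x), measured in bits. It is the average information carried per outcome. The minimal Python sketch below computes it using the equivalent form Σ p(x) log2(1/p(x)). The function names are illustrative.

```python
import math
from collections import Counter


def entropy(probabilities):
    """Shannon entropy in bits, H = sum(p * log2(1/p)); zero-probability outcomes contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)


def empirical_entropy(symbols):
    """Estimate entropy from the observed frequency of each symbol in a sequence."""
    counts = Counter(symbols)
    return entropy(c / len(symbols) for c in counts.values())


print(entropy([0.5, 0.5]))        # fair coin: 1.0 bit
print(entropy([0.25] * 4))        # four equally likely outcomes: 2.0 bits
print(entropy([0.9, 0.1]))        # biased coin: about 0.469 bits (more predictable)
print(empirical_entropy("aaaa"))  # a fully predictable source: 0.0 bits
```

The more predictable the source, the lower its entropy. This is why entropy is useful as a measure of uncertainty in inference and machine learning.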

1.1.5.3 First Development (1956–1974)

At the two-month Dartmouth Conference in 1956, John McCarthy formally proposed AI as a new discipline. This marked the birth of AI. After this conference, several AI research organizations were formed in the United States. These included the Carnegie-RAND collaboration group of Allen Newell and Herbert Alexander Simon, the Massachusetts Institute of Technology (MIT) research group of Marvin Lee Minsky and McCarthy, and Arthur Samuel's IBM engineering research group. Over the following two decades, AI developed rapidly in various fields, and researchers enthusiastically expanded the application areas of AI technologies.


1.1.5.3.1 Machine Learning

In 1956, Arthur Samuel of IBM wrote a famous checkers program that learned an implicit model from board positions to guide its next move. After playing games against the program, Samuel concluded that it could reach a very high level after a certain period of learning. With this program, Samuel refuted the notion that computers, unlike humans, cannot learn patterns beyond their explicit code. He went on to define and explain a new term: machine learning.

1.1.5.3.2 Pattern Recognition

In 1957, Zhou Shaokang proposed solving pattern recognition problems with statistical decision theory. This drove the rapid development of pattern recognition research from the late 1950s onward. In the same year, Frank Rosenblatt proposed the perceptron, a simplified mathematical model of how the human brain performs recognition. The perceptron trained a recognition system on samples of given categories. After learning, the system could correctly classify new patterns it had not seen before.
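Rosenblatt's perceptron learns a linear decision boundary from labeled samples. The minimal sketch below applies the standard textbook form of the perceptron learning rule to a hypothetical toy task, the logical AND function. The function names, learning rate, and epoch limit are illustrative assumptions, not details from Rosenblatt's original work.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum plus bias is positive, else 0."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: w <- w + lr * (target - output) * x for each misclassified sample."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every training sample is now classified correctly
            break
    return weights, bias


# Logical AND is linearly separable, so the rule is guaranteed to converge.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND)
print(weights, bias)                                 # [2.0, 1.0] -2.0
print([predict(weights, bias, x) for x, _ in AND])  # [0, 0, 0, 1]
```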

1.1.5.3.3 Pattern Matching

In 1966, ELIZA, the first chat program, was developed at the MIT Artificial Intelligence Laboratory. ELIZA matched users' questions against a set of predefined rules and selected suitable answers from a pre-written answer database. It was also the first software program that attempted to pass the Turing Test. When it was first used, ELIZA simulated a psychotherapist talking to a patient and fooled many people. The idea that "dialogs are pattern matching" marked the beginning of computer natural language dialog technology.

In addition, during this first development period, McCarthy developed the list processing (LISP) programming language. LISP became the most important programming language in the AI field for the next several decades. Minsky studied neural networks in more depth and found the shortcomings of simple neural networks. Multilayer neural networks and the back propagation (BP) algorithm later emerged to overcome these limitations. Expert systems also began to appear. The first industrial robot entered the production line of General Motors, and the first mobile robot capable of autonomous movement appeared. Progress in related fields also greatly promoted AI. Bionics, established in the 1950s, inspired researchers and led to the simulated annealing algorithm. Simulated annealing is a heuristic algorithm that laid the research foundation for search algorithms such as ant colony optimization.
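ELIZA's approach to dialog can be illustrated in a few lines: a reply comes from the first rule whose pattern matches the input, with no understanding involved. The minimal sketch below uses Python's `re` module. The rules, replies, and pronoun "reflections" are hypothetical and are not taken from Joseph Weizenbaum's original DOCTOR script.

```python
import re

# Hypothetical ELIZA-style rules: (pattern, response template). They are illustrative
# and are not taken from Weizenbaum's original DOCTOR script.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."

# Swap first- and second-person words so an echoed fragment reads naturally.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are", "you": "I", "your": "my"}


def reflect(fragment):
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())


def respond(sentence):
    """Return the reply of the first rule whose pattern matches the sentence."""
    sentence = sentence.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(sentence)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    return DEFAULT_REPLY


print(respond("I need a holiday."))            # Why do you need a holiday?
print(respond("I am worried about my exams"))  # How long have you been worried about your exams?
print(respond("My mother is strict"))          # Tell me more about your mother.
print(respond("The weather is nice today"))    # Please go on.
```

Rule order matters: the first matching rule wins. The default reply keeps the conversation going when nothing matches.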

1.1.5.4 First Winter (1974–1980)

However, people's enthusiasm for AI did not last long. Optimistic promises could not be fulfilled in time, which caused doubts about AI technologies around the world. The perceptron, which had caused a sensation in academia in 1957, was dealt a heavy blow in 1969. That year, Minsky and other scientists put forward the famous XOR problem and demonstrated that the perceptron cannot handle linearly inseparable data such as XOR. For academia, the XOR problem became an almost insurmountable divide.
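The XOR limitation can be checked directly. A single threshold unit outputs 1 when w1·x1 + w2·x2 + b > 0. To compute XOR, such a unit would need all four of the following conditions:

- b ≤ 0
- w1 + b > 0
- w2 + b > 0
- w1 + w2 + b ≤ 0

Adding the middle two inequalities gives w1 + w2 + b > -b ≥ 0, which contradicts the last one. The minimal sketch below runs the perceptron learning rule on XOR and shows that it never completes an epoch without mistakes. The epoch count and learning rate are arbitrary assumptions.

```python
def step(z):
    """Threshold activation used by the perceptron."""
    return 1 if z > 0 else 0


def train_on(samples, epochs=100, lr=1.0):
    """Apply the perceptron rule; return mistakes per epoch and the final weights (w1, w2, b)."""
    w1 = w2 = b = 0.0
    mistakes_per_epoch = []
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            error = target - step(w1 * x1 + w2 * x2 + b)
            if error != 0:
                mistakes += 1
                w1 += lr * error * x1
                w2 += lr * error * x2
                b += lr * error
        mistakes_per_epoch.append(mistakes)
    return mistakes_per_epoch, (w1, w2, b)


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
history, (w1, w2, b) = train_on(XOR)
final_accuracy = sum(step(w1 * x1 + w2 * x2 + b) == t for (x1, x2), t in XOR) / len(XOR)
print("fewest mistakes in any epoch:", min(history))  # never reaches 0
print("final accuracy:", final_accuracy)              # at most 0.75 for any single line
```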


In 1973, AI was questioned by the scientific community. Many scientists thought that the seemingly ambitious goals of AI could not be achieved and that the research had completely failed. Growing suspicion led to severe criticism of AI and questioning of its real value. Subsequently, governments and institutions stopped or reduced their investment, and AI fell into its first winter in the 1970s.

This setback was not a coincidence. Limited by the computing capability of the time, many problems could be solved in theory but could not be put into practical use. At the same time, it was difficult to acquire knowledge for the expert system algorithms of that era, which led to the failure of many projects. Research on machine vision started in the 1960s. The methods proposed by American scientist L. R. Roberts, such as edge detection and contour composition, are classic and are still widely used today. However, theoretical foundations did not necessarily lead to practical output. At that time, scientists calculated that a computer would need to execute at least 1 billion instructions to simulate human retinal vision. In 1976, the world's fastest computer, the Cray-1, cost millions of dollars, yet it performed fewer than 100 million operations per second. An ordinary computer performed fewer than 1 million operations per second. Hardware conditions limited the development of AI. In addition, another major foundation for AI development is large-scale data. At that time, computers and the Internet were not widespread, so large-scale data simply could not be obtained. In this phase, the development of AI slowed down.

Linnainmaa proposed the idea of BP in the 1970s as the "reverse mode of automatic differentiation". However, it was not applied to the multilayer perceptron until 1981, by Werbos. The emergence of the multilayer perceptron and the BP algorithm contributed to the second development of neural networks. In 1986, D. E. Rumelhart and others successfully implemented an effective BP algorithm for training multilayer perceptrons, which had a far-reaching impact.
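The following minimal sketch shows what back propagation computes, using a tiny multilayer perceptron in plain Python. The layer sizes (2 inputs, 4 sigmoid hidden units, 1 sigmoid output), squared-error loss, learning rate, and number of epochs are illustrative assumptions. A central finite-difference check confirms that the gradients obtained by applying the chain rule layer by layer match the slope of the loss.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """2 inputs -> `hidden` sigmoid units -> 1 sigmoid output (illustrative sizes)."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(self.W1, self.b1)]
        y = sigmoid(sum(v * hj for v, hj in zip(self.W2, h)) + self.b2)
        return h, y

    def loss(self, data):
        """Mean of (y - t)^2 / 2 over the data set."""
        return sum((self.forward(x)[1] - t) ** 2 / 2 for x, t in data) / len(data)

    def gradients(self, data):
        """Back propagation: push the output error backward through the layers with the chain rule."""
        n, H = len(data), len(self.W2)
        gW1 = [[0.0, 0.0] for _ in range(H)]
        gb1, gW2, gb2 = [0.0] * H, [0.0] * H, 0.0
        for x, t in data:
            h, y = self.forward(x)
            delta_out = (y - t) * y * (1 - y) / n  # dLoss / d(output pre-activation)
            gb2 += delta_out
            for j in range(H):
                gW2[j] += delta_out * h[j]
                delta_h = delta_out * self.W2[j] * h[j] * (1 - h[j])  # error at hidden unit j
                gb1[j] += delta_h
                gW1[j][0] += delta_h * x[0]
                gW1[j][1] += delta_h * x[1]
        return gW1, gb1, gW2, gb2

    def train_step(self, data, lr):
        """One step of full-batch gradient descent."""
        gW1, gb1, gW2, gb2 = self.gradients(data)
        for j in range(len(self.W2)):
            self.W1[j][0] -= lr * gW1[j][0]
            self.W1[j][1] -= lr * gW1[j][1]
            self.b1[j] -= lr * gb1[j]
            self.W2[j] -= lr * gW2[j]
        self.b2 -= lr * gb2


def numeric_grad_W1(net, data, j, i, eps=1e-6):
    """Central finite difference of the loss with respect to W1[j][i], used to check back propagation."""
    original = net.W1[j][i]
    net.W1[j][i] = original + eps
    plus = net.loss(data)
    net.W1[j][i] = original - eps
    minus = net.loss(data)
    net.W1[j][i] = original
    return (plus - minus) / (2 * eps)


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = TinyMLP()
analytic = net.gradients(XOR)[0]
max_gap = max(abs(analytic[j][i] - numeric_grad_W1(net, XOR, j, i))
              for j in range(len(net.W1)) for i in range(2))
print(f"max |backprop - finite difference| = {max_gap:.2e}")

initial_loss = net.loss(XOR)
for _ in range(5000):
    net.train_step(XOR, lr=2.0)
print(f"loss: {initial_loss:.4f} -> {net.loss(XOR):.4f}")
print([round(net.forward(x)[1], 2) for x, _ in XOR])
```

Whether all four XOR outputs end up on the correct side of 0.5 depends on the random initialization. If training stalls in a poor local minimum, a different `seed`, more hidden units, or more epochs usually helps.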

1.1.5.5 Second Development (1980–1987)

In 1980, XCON, developed by Carnegie Mellon University, was officially put into use. XCON was a comprehensive expert system that contained more than 2,500 preset rules. In the following years, XCON processed more than 80,000 orders with an accuracy of over 95%. This milestone marked a new era. Expert systems began to play a powerful role in specific fields and brought AI technology as a whole into a prosperous phase. An expert system typically focuses on a single area of expertise, simulating human experts to answer questions or provide knowledge that helps staff make decisions. By limiting itself to a narrow scope, it avoids the difficulties of general AI and makes full use of the knowledge and experience of existing experts to solve tasks in specific fields. Because of the huge business success of XCON, 60% of Fortune 500 companies began to develop and deploy their own expert systems in the 1980s. According to statistics, more than USD 1 billion was invested in the AI field from 1980 to 1985. Most of it went to the AI departments of enterprises, and many AI software and hardware companies emerged. In 1986, the Bundeswehr University Munich installed computers and sensors in a Mercedes-Benz van to automatically control its steering wheel, accelerator, and brake. Called VaMoRs, it was the first self-driving car.
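An expert system separates domain knowledge (rules captured from human experts) from a generic inference engine that applies them. The minimal sketch below uses forward chaining, one common inference strategy for rule-based systems. The rule names and facts are hypothetical. They only mimic the idea of configuring a computer order from rules and are not rules from XCON itself.

```python
# Hypothetical IF-THEN rules for configuring a computer order. They only mimic the style
# of an expert system and are not rules from XCON itself.
RULES = [
    ({"order_has_cpu", "order_has_disk"}, "needs_disk_controller"),
    ({"needs_disk_controller"}, "needs_backplane_slot"),
    ({"order_has_cpu", "cpu_count_over_2"}, "needs_extra_power_supply"),
]


def forward_chain(facts, rules):
    """Fire every rule whose conditions are all known facts, repeating until nothing new is derived."""
    known = set(facts)
    derived = []
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                derived.append(conclusion)
                changed = True
    return known, derived


known, derived = forward_chain({"order_has_cpu", "order_has_disk"}, RULES)
print(derived)  # ['needs_disk_controller', 'needs_backplane_slot']
```

Adding knowledge means adding rules; the inference engine itself does not change.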


At that time, LISP was the main language in the AI field. To make LISP programs run more efficiently, many organizations began to develop dedicated computer chips and storage devices for running them. Although LISP machines made some progress, personal computers (PCs) were rising at the same time. IBM PCs and Apple computers rapidly took over the computer market. The clock frequency and speed of their central processing units (CPUs) increased steadily, until they became even more powerful than the expensive LISP machines.

1.1.5.6 Second Winter (1987–1993)

In 1987, the market for dedicated LISP machine hardware collapsed, and the AI field entered another cold winter. The hardware market collapsed, and governments and institutions withdrew their investment in AI research. Together these led to a trough in the field for several years, although some important achievements were still made. In 1988, U.S. scientist Judea Pearl introduced probability and statistics into the inference process of AI, which greatly influenced the development of AI. In the nearly 20 years after the second winter, AI technologies became deeply integrated with computer and software technologies. On the other hand, progress in AI algorithm theory was slow. Many researchers achieved groundbreaking results based on past theories simply by relying on more powerful and faster computer hardware.
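Pearl's work led to Bayesian networks. A Bayesian network encodes uncertain knowledge as a graph of conditional probabilities and supports inference by the rules of probability. The minimal sketch below performs exact inference by enumeration on a small textbook-style network. The variables and probabilities are hypothetical. The sketch also shows "explaining away": once the sprinkler is known to be on, wet grass becomes much weaker evidence of rain.

```python
from itertools import product

# Hypothetical network: Rain and Sprinkler independently influence WetGrass.
P_RAIN = 0.2
P_SPRINKLER = 0.1
P_WET_GIVEN = {  # P(WetGrass = True | Rain, Sprinkler)
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.00,
}


def joint(rain, sprinkler, wet):
    """Joint probability, factorized along the network: P(R) * P(S) * P(W | R, S)."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)


def prob_rain(evidence):
    """P(Rain = True | evidence), by summing the joint over all worlds consistent with the evidence."""
    numerator = denominator = 0.0
    for rain, sprinkler, wet in product([True, False], repeat=3):
        world = {"rain": rain, "sprinkler": sprinkler, "wet": wet}
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(rain, sprinkler, wet)
        denominator += p
        if rain:
            numerator += p
    return numerator / denominator


print(round(prob_rain({}), 3))                                # 0.2 (prior belief)
print(round(prob_rain({"wet": True}), 3))                     # 0.74
print(round(prob_rain({"wet": True, "sprinkler": True}), 3))  # 0.236
```

Enumeration is exponential in the number of variables, so practical systems exploit the network structure for more efficient inference.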

1.1.5.7 Stable Development (1993–2011)

In 1995, Richard S. Wallace developed Alice, a new chatbot program inspired by ELIZA. It could use the Internet to continuously expand its own data sets and optimize its content. In 1996, IBM's Deep Blue computer played world chess champion Garry Kasparov but did not win. Kasparov believed that computers could never beat humans at chess. IBM then upgraded Deep Blue. The rebuilt Deep Blue had 480 dedicated processors, doubling its computing speed to 200 million positions per second. It could look eight or more moves ahead, and it defeated Kasparov. However, this milestone match was actually a victory achieved through computing speed and enumeration in a game with clear rules. It was not AI in the real sense. In 2006, Geoffrey Hinton published a paper in Science, opening the era of deep learning.
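Deep Blue's play rested on searching the game tree by brute force. The standard core of such game-tree search is minimax, usually accelerated by alpha-beta pruning. The minimal sketch below runs both algorithms on a small hypothetical tree and counts how many leaf positions each one evaluates. It illustrates the technique only. It is not Deep Blue's implementation, which added dedicated hardware and an elaborate evaluation function.

```python
import math


def minimax(node, maximizing, counter):
    """Plain minimax: a leaf is a score; inner nodes take the max or min of their children."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, counter, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning: skip branches that cannot change the final decision."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizing opponent will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizing player already has a better option elsewhere
    return value


# Hypothetical game tree: MAX moves at the root, MIN replies, then MAX again;
# the leaves are evaluation scores from MAX's point of view.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
full, pruned = [0], [0]
print(minimax(tree, True, full), full[0])        # 5 8
print(alphabeta(tree, True, pruned), pruned[0])  # 5 5
```

Both searches choose the same move. Pruning skips leaves that cannot affect the result, which lets a fast searcher look further ahead in the same time.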

1.1.5.8 Prosperity (2011–Present)

In 2011, Watson, also from IBM, took part in the quiz show Jeopardy! and competed against humans. Watson beat two human champions with its outstanding natural language processing capability and powerful knowledge base. Computers at this stage could understand human language, marking major progress in the AI field. In the 21st century, mobile Internet and cloud computing technologies have grown explosively and PCs have come into wide use. As a result, institutions have accumulated unprecedented volumes of data, providing ample material and momentum for the future development of AI. Deep learning has become the mainstream of AI technologies. The famous Google Brain recognition project raised the ImageNet recognition rate to 84%.


The Semantic Web was proposed in 2001 as an extension of the World Wide Web. Essentially, it is a massive distributed database centered on web data and linked in a way that machines can understand and process. The emergence of the Semantic Web greatly promoted the development of knowledge representation technologies. In 2012, Google launched a search service based on knowledge graphs and proposed the concept of the knowledge graph for the first time. In 2016 and 2017, Google staged human-versus-machine matches that caused a worldwide sensation: its AI program AlphaGo defeated two Go world champions, Lee Sedol of South Korea and Ke Jie of China. Today, AI has penetrated every aspect of human life. Voice assistants such as Apple's Siri use Natural Language Processing (NLP) technology, which enables computers to process human natural language and match it with the desired instructions and responses in an increasingly natural way. When browsing shopping websites, users often receive product recommendations generated by recommendation algorithms, which predict the products that users may purchase by analyzing historical shopping data and users' stated preferences.
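The recommendation idea described above can be sketched with a simple item co-occurrence count ("users who bought this also bought that"). This is only one of many possible techniques and not necessarily what any particular shopping site uses; the purchase histories and the function names `build_cooccurrence` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often each pair of products appears in the same user's history."""
    pair_counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts


def recommend(user, histories, top_k=2):
    """Score products the user has not bought by how often they were bought
    together with products the user already owns (hypothetical helper)."""
    pair_counts = build_cooccurrence(histories)
    owned = set(histories[user])
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a in owned and b not in owned:
            scores[b] += count
    # Sort by score (descending), then by name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


# Hypothetical historical shopping data.
histories = {
    "alice": ["phone", "case", "charger"],
    "bob": ["phone", "case", "earbuds"],
    "carol": ["phone", "earbuds"],
    "dave": ["phone"],
}
print(recommend("dave", histories))
```

A user who has bought only a phone is offered the products most often bought together with a phone by other users. The users' stated preferences, which the text also mentions, are ignored in this sketch.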

1.1.6 Three Schools of Thought: Symbolism, Connectionism, and Behaviorism
1.1.6.1 Symbolism
The basic idea of symbolism is that human cognition is a process of inference and operations on various symbols. A human being is a physical symbol system, and so is a computer; computers can therefore be used to simulate the intelligent behavior of human beings. The core of AI lies in knowledge representation, knowledge inference, and knowledge application. Knowledge and concepts can be represented with symbols. Cognition is the process of symbol processing, while inference is the process of solving problems by using heuristic knowledge and search. Symbolism centers on inference, including symbolic inference and machine inference.
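To make symbolic inference concrete, the sketch below represents knowledge as if-then rules over symbols and derives new facts by forward chaining, one classic form of machine inference. The text does not prescribe a particular inference method, and the rules and fact names are hypothetical.

```python
def forward_chain(facts, rules):
    """Derive every fact reachable from the initial facts using if-then rules.

    Each rule is (premises, conclusion): when all premises are known facts,
    the conclusion is added. Repeat until no rule adds anything new.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base: every concept is an explicit symbol.
rules = [
    ({"has_fur", "says_meow"}, "is_cat"),
    ({"is_cat"}, "is_mammal"),
    ({"is_mammal"}, "is_animal"),
]
derived = forward_chain({"has_fur", "says_meow"}, rules)
print(sorted(derived))
```

Every intermediate conclusion ("is_cat", "is_mammal") is a readable symbol, which is exactly the property that connectionism gives up.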

1.1.6.2 Connectionism
The basic idea of connectionism is that the basis of thinking is neurons rather than symbol processing. Because the human brain works differently from a computer, connectionism proposes a computing mode based on connections to replace one based on symbolic operations. Connectionism is derived from bionics, especially the study of models of the human brain. In connectionism, a concept is represented by a set of numbers, vectors, matrices, or tensors, that is, by a specific activation pattern of the entire network. Each node has no specific meaning on its own but plays a part in representing the concept. For example, in symbolism, the concept of a cat may be represented by a "cat node" or by a set of nodes representing the cat's attributes, such as "two eyes", "four legs", and "fluffy". In connectionism, however, no node represents a specific concept, so it is impossible to find a "cat node" or an "eye neuron". Connectionism is based on neural networks and deep learning.
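The contrast above can be sketched in a few lines: a symbolic "cat" is a named record, while a connectionist "cat" is only a vector of activations, and related concepts are recognized by how similar their activation patterns are. The activation values are hypothetical, chosen for illustration rather than taken from a trained network.

```python
import math

# Symbolism: the concept is an explicit symbol with named attributes.
symbolic_cat = {"concept": "cat", "eyes": 2, "legs": 4, "fluffy": True}

# Connectionism: the concept is a pattern of activation across many nodes.
# No single position means "cat" or "eye"; these numbers are hypothetical.
activations = {
    "cat": [0.8, 0.1, 0.6, 0.3],
    "dog": [0.7, 0.2, 0.5, 0.4],
    "car": [0.1, 0.9, 0.0, 0.2],
}


def cosine(u, v):
    """Cosine similarity: how closely two activation patterns point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(round(cosine(activations["cat"], activations["dog"]), 3))
print(round(cosine(activations["cat"], activations["car"]), 3))
```

"Cat" ends up closer to "dog" than to "car" because of the overall pattern, not because any one node carries that meaning.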


1.1.6.3 Behaviorism
The basic idea of behaviorism is that intelligence depends on perception and action, so it proposes a "perception-action" model of intelligent behavior. Intelligence requires no knowledge, representation, or inference, and AI can evolve as human intelligence did. Intelligent behavior can only be demonstrated in the real world through constant interaction with the surrounding environment. Behaviorism is more concerned with practical application and with how to learn continuously from the environment and make corrections. Behaviorism is based on behavioral control, adaptation, and evolutionary computing.
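The "perception-action" model can be illustrated with a hypothetical thermostat agent. It stores no knowledge and performs no inference: it senses the temperature, acts, and adapts a correction term from the feedback it receives (essentially proportional-integral control). The environment model, gains, and function names are assumptions made for this sketch.

```python
def perceive(environment):
    """Sense the current state of the world (here, a room temperature)."""
    return environment["temperature"]


def act(environment, heater_power):
    """Change the world: heating warms the room, and heat leaks toward 10 degrees
    outside. This toy physics is a hypothetical stand-in for a real environment."""
    t = environment["temperature"]
    environment["temperature"] = t + 0.5 * heater_power - 0.1 * (t - 10.0)


def run_agent(target=21.0, steps=200, gain=0.8, adapt_rate=0.1):
    """Perception-action loop with no world model and no symbolic reasoning:
    sense, compare with the goal, act, and keep correcting from experience."""
    environment = {"temperature": 10.0}
    bias = 0.0
    for _ in range(steps):
        error = target - perceive(environment)
        bias += adapt_rate * error  # adapt to persistent error over time
        heater_power = max(0.0, gain * error + bias)
        act(environment, heater_power)
    return perceive(environment)


print(round(run_agent(), 2))
```

With `adapt_rate=0.0` the agent only reacts to the current error and settles at about 18.8 degrees, below the target; the adaptive term, learned from continued interaction with the environment, is what removes the residual error.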

1.2 Overview of AI Technologies
1.2.1 Overview
As shown in Figure 1-5, AI technologies are multi-layered, covering the application, algorithm mechanism, toolchain, device, chip, process, and material layers.

Figure 1-5 Overview of AI technologies
On one hand, the rapid development of applications and algorithms, especially deep learning and convolutional neural networks, demands performance improvements of two to three orders of magnitude from AI chips, which has triggered the recent surge in AI chip R&D. On the other hand, the rapid development of new materials, processes, and components, such as 3D stacked memory and process evolution, has also made significant improvements in the performance and power consumption of AI chips possible; this momentum came from breakthroughs in basic research. Together, these driving forces have enabled the rapid advancement of AI chip technologies in recent years. The following sections summarize the achievements of AI technologies at each level.

1.2.2 Application Layer
Video and image: facial recognition, object detection, image generation, video analysis, video content moderation, image beautification, reverse image search, and AR
Voice: speech recognition, speech synthesis, voice wakeup, voiceprint recognition, music generation, smart speakers, and smart navigation
Text: text analysis, language translation, human-machine dialog, reading comprehension, and recommendation systems
Control: autonomous driving, drones, robots, and industrial automation

1.2.3 Algorithm Layer
Neural network interconnection structures: multilayer perceptron (MLP), convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM) network, and spiking neural network (SNN)
Deep neural network (DNN) structures: AlexNet, ResNet, and VGGNet
Neural network algorithms: transfer learning, reinforcement learning, one-shot learning, adversarial learning, neural Turing machine, and spike-timing-dependent plasticity (STDP)
Machine learning algorithms: support vector machine (SVM), k-nearest neighbor, Bayes' theorem, decision tree, hidden Markov model, AdaBoost, and Bidirectional Encoder Representations from Transformers (BERT)

1.2.4 Chip Layer
Algorithm optimization chips: efficiency, low-power, high-speed, and flexibility optimization, such as deep learning accelerators and facial recognition chips
Neuromorphic chips: bionic brains, inspiration from biological brains, and brain mechanism simulation
Programmable chips: designed for flexibility, programmability, algorithm compatibility, and compatibility with general-purpose software, such as digital signal processors (DSPs), graphics processing units (GPUs), and field programmable gate arrays (FPGAs)
Chip system-level structures: multi-core, many-core, Single Instruction Multiple Data (SIMD), operation array structure, memory structure, network-on-chip structure, multichip interconnection structure, memory interface, communication structure, and multilevel cache
Development toolchain: programming frameworks (TensorFlow, Caffe, and MindSpore), compilers, simulators, optimizers (quantization and pruning), and atomic operation (network) libraries

1.2.5 Device Layer
High-bandwidth off-chip memory: high bandwidth memory (HBM), dynamic random access memory (DRAM), high-speed graphics double data rate (GDDR) memory, low-power double data rate (LPDDR) memory, and spin-transfer torque magnetoresistive RAM (STT-MRAM)
High-speed interconnection: SerDes and optical interconnection
Bionic devices (artificial synapses and artificial neurons): memristors
New computing components: analog computing and in-memory computing


1.2.6 Process Technology Layer
On-chip memory (synaptic arrays): distributed static RAM (SRAM), resistive RAM (ReRAM), and phase change RAM (PCRAM)
Complementary metal-oxide-semiconductor (CMOS) technology: process nodes (16, 7, and 5 nm)
CMOS multilayer integration: 2.5D IC/SiP, 3D-stack technology, and monolithic 3D
New technologies: 3D NAND flash, tunneling field-effect transistors (TFETs), ferroelectric FETs (FeFETs), and fin FETs (FinFETs)

1.2.7 Deep Learning Frameworks
Deep learning frameworks lower the barrier to entry. You do not need to start by coding complex neural networks and backpropagation (BP) algorithms. Instead, you configure the model's hyperparameters as required, and the model's parameters are obtained automatically through training. You can also add custom network layers to existing models, or select the classifiers and optimization algorithms you need. A deep learning framework can be regarded as a set of building blocks in which each block is a model or algorithm, so developers can assemble models that meet their requirements from these blocks instead of starting from scratch.
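The building-block idea can be sketched without any real framework. The toy classes below (`Scale`, `Shift`, and `Sequential` are hypothetical names, not the API of TensorFlow, Caffe, or MindSpore) each know how to run forward and backward; the developer only chooses hyperparameters (learning rate, epochs), and the parameters `w` and `b` are learned automatically by backpropagation.

```python
import random


class Scale:
    """Block computing y = w * x; w is a learnable parameter."""

    def __init__(self):
        self.w = random.uniform(-1.0, 1.0)

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return self.w * x

    def backward(self, grad_out, lr):
        grad_in = grad_out * self.w  # dL/dx, computed before w is updated
        self.w -= lr * grad_out * self.x  # gradient descent step on w
        return grad_in


class Shift:
    """Block computing y = x + b; b is a learnable parameter."""

    def __init__(self):
        self.b = 0.0

    def forward(self, x):
        return x + self.b

    def backward(self, grad_out, lr):
        self.b -= lr * grad_out
        return grad_out


class Sequential:
    """Container that chains blocks, like stacking building blocks."""

    def __init__(self, *blocks):
        self.blocks = blocks

    def forward(self, x):
        for block in self.blocks:
            x = block.forward(x)
        return x

    def fit(self, data, lr=0.05, epochs=200):
        # lr and epochs are hyperparameters chosen by the developer;
        # the blocks' parameters are learned automatically.
        for _ in range(epochs):
            for x, y in data:
                pred = self.forward(x)
                grad = 2.0 * (pred - y)  # derivative of the squared error
                for block in reversed(self.blocks):
                    grad = block.backward(grad, lr)


random.seed(0)
model = Sequential(Scale(), Shift())
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]  # y = 2x + 1
model.fit(data)
print(round(model.forward(3.0), 2))
```

Swapping, adding, or reordering blocks changes the model without touching the training loop, which is the convenience the text attributes to frameworks.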

1.2.8 AI Processor Overview
This section covers an overview of AI processors, their classification, their status quo, a comparison of mainstream AI processors, and an overview of Ascend AI Processors. AI has four elements: data, algorithms, scenarios, and computing power. Computing power depends on AI processors. AI processors, also known as AI accelerators, are function modules that process the massive computing tasks in AI applications.

1.2.8.1 AI Processor Classification
AI processors can be classified by technical architecture and by service application. By technical architecture, they fall into four types:

CPU: a very-large-scale integrated circuit that serves as the computing core and control unit of a computer. A CPU interprets computer instructions and processes the data of computer software.

GPU: also known as a display core, visual processor, or display chip. It is a microprocessor that processes images on PCs, workstations, game consoles, and some mobile devices such as tablets and smartphones.

Application-specific integrated circuit (ASIC): an integrated circuit designed for a specific purpose.

FPGA: a chip that implements the functions of a semi-custom chip; that is, its hardware structure can be flexibly configured and changed according to requirements.

From the perspective of service applications, there are two types: training and inference.

In the training phase, a complex DNN model is trained with large amounts of input data or through methods such as reinforcement learning. Training requires massive training data and a complex DNN structure, and the huge amount of computation demands ultra-high processor performance in computing power, precision, and scalability. Common training processors include NVIDIA GPUs, Google tensor processing units (TPUs), and Huawei neural-network processing units (NPUs).



In the inference phase, inferences are made by applying trained models to new data. For example, a device uses a DNN model running in the background to recognize a captured face. Although inference requires far less computation than training, it still involves a large number of matrix operations. GPUs, FPGAs, and ASICs are also useful for inference.
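The gap between training and inference workloads can be estimated with back-of-the-envelope arithmetic for a single fully connected layer. This sketch assumes the common rule of thumb that a backward pass costs about twice a forward pass (gradients for both inputs and weights); the layer size, dataset size, and epoch count are hypothetical.

```python
def dense_forward_flops(batch, n_in, n_out):
    """FLOPs for one fully connected layer forward pass: each of the n_out
    outputs needs n_in multiply-adds, counted as 2 FLOPs each."""
    return 2 * batch * n_in * n_out


def training_flops(n_samples, epochs, n_in, n_out):
    """Forward plus backward (assumed ~2x forward) over every sample, every epoch."""
    per_pass = dense_forward_flops(n_samples, n_in, n_out)
    return 3 * per_pass * epochs


def inference_flops(n_in, n_out):
    """One new sample, forward pass only."""
    return dense_forward_flops(1, n_in, n_out)


# Hypothetical layer and dataset sizes.
n_in, n_out = 4096, 4096
train = training_flops(n_samples=1_000_000, epochs=90, n_in=n_in, n_out=n_out)
infer = inference_flops(n_in, n_out)
print(f"training: {train:.2e} FLOPs, one inference: {infer:.2e} FLOPs")
print(f"ratio: {train / infer:.1e}")
```

Even a single inference through this one layer costs tens of millions of FLOPs of matrix arithmetic, which is why accelerators matter for inference as well, but training multiplies that by the dataset size, the number of epochs, and the backward pass.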

1.2.8.2 Status Quo of AI Processors
1.2.8.2.1 CPU
Early improvements in computer performance came mainly from Moore's Law. People demand ever-higher computer performance, and performance gains depend mostly on advances in the underlying hardware, which accelerate the upper-layer application software. In recent years, the improvement brought by Moore's Law has slowed, and hardware development is running into physical bottlenecks. Limits on heat dissipation and power consumption make it difficult to further improve the performance of serial programs on the traditional CPU architecture. This situation drives the industry to keep looking for architectures, and corresponding software frameworks, better suited to the post-Moore's Law era.
Multi-core processors improve computer performance by increasing the number of cores, which better meets the hardware requirements of software. For example, Intel® Core® i7 series processors use four independent cores based on the x86 instruction set that execute instructions in parallel. This improves processor speed to some extent, but it also increases power consumption and cost. The number of cores cannot increase indefinitely, and most traditional CPU programs are written for serial execution, so many programs cannot be accelerated this way.
AI performance can also be improved by adding instructions (that is, by modifying the architecture). For example, Intel, with its complex instruction set computer (CISC) architecture, added instructions such as AVX-512 and added fused multiply-add (FMA) vector units to the arithmetic logic unit (ALU). ARM (Advanced RISC Machine), a reduced instruction set computer (RISC) architecture, has likewise added such instructions to its Cortex-A series and plans to keep upgrading them. Performance can also be increased by raising the clock frequency, but the room for improvement is limited, and a high clock frequency may cause excessive power consumption and overheating of the processor.
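The point that serial programs cannot be accelerated by adding cores is commonly quantified with Amdahl's law, a standard result that the text does not name. A minimal sketch, where the parallel fractions and core counts are illustrative assumptions:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: only the parallel part of a program runs faster on more
    cores, so the remaining serial part caps the overall speedup."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for p in (0.5, 0.9, 0.99):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (4, 16, 1024)]
    print(f"parallel fraction {p}: speedup with 4/16/1024 cores = {speedups}")
```

A program that is only 50% parallelizable can never run more than twice as fast, however many cores are added, which is why the industry also pursues new instructions and new architectures.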

1.2.8.2.2 GPU
CPUs focus on logic control in instruction execution, while GPUs have outstanding advantages in large-scale, intensive, parallel data computing. Program optimization requires collaboration between CPUs and GPUs.


GPUs deliver remarkable performance in matrix computing and parallel computing and play a key role in heterogeneous computing. They were the first acceleration chips introduced to the AI field for deep learning, and the GPU ecosystem is now mature. Building on its GPU architecture, NVIDIA focuses on three aspects in deep learning scenarios:
1. Enriching the ecosystem: NVIDIA launched the NVIDIA CUDA® Deep Neural Network library (cuDNN) to accelerate deep learning, improve usability, and optimize the underlying GPU architecture.
2. Improving customization: multiple data types, such as INT8, are supported in addition to FP32.
3. Adding dedicated deep learning modules: for example, the V100 uses an improved architecture with Tensor Cores.
The main problems of GPUs are high cost, low energy efficiency, and high input/output latency.
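Supporting INT8 alongside FP32 means that values can be stored and multiplied as 8-bit integers. The sketch below shows symmetric linear quantization, one common way to map FP32 values to INT8. The text does not say which scheme NVIDIA's libraries use, so this is a generic illustration with hypothetical weights.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floating-point values to INT8.

    The scale maps the largest magnitude to 127, and each value becomes an
    integer in [-127, 127]. Returns the integers and the scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map INT8 values back to approximate floating-point values."""
    return [x * scale for x in q]


weights = [0.42, -1.27, 0.003, 0.88, -0.5]  # hypothetical FP32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(scale, 4), round(max_error, 4))
```

Each INT8 value takes a quarter of the memory of an FP32 value, and the rounding error of any in-range value is at most half of `scale`; the price is reduced precision, which is why the training-phase text stresses precision as a processor requirement.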

1.2.8.2.3 TPU
Since 2006, Google has sought to apply the design concept of ASICs to neural networks, and it released the TPU, a customized AI processor that supports TensorFlow, an open-source deep learning framework. TPUs use large-scale systolic arrays and large-capacity on-chip storage to efficiently accelerate the convolution operations that are most common in deep neural networks (DNNs). Systolic arrays optimize matrix multiplication and convolution, providing higher computing power with lower energy consumption.
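How a systolic array computes a matrix product can be shown with a cycle-by-cycle simulation of a generic output-stationary array. This is a hypothetical teaching sketch, not a model of the TPU's actual microarchitecture, whose dataflow the text does not describe and which may differ.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle simulation of an output-stationary systolic array.

    Processing element (PE) (i, j) keeps a running sum for C[i][j]. Each cycle
    it multiplies the value arriving from the left (from a row of A) by the
    value arriving from above (from a column of B), adds the product to its sum,
    then passes a to the right and b downward. Row i of A enters i cycles late
    and column j of B enters j cycles late, so matching operands meet."""
    n, k, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]  # value each PE forwards to the right
    b_reg = [[0] * m for _ in range(n)]  # value each PE forwards downward
    for t in range(n + m + k - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left edge reads skewed A; other PEs read their left neighbor.
                if j == 0:
                    step = t - i
                    a_in = A[i][step] if 0 <= step < k else 0
                else:
                    a_in = a_reg[i][j - 1]
                # Top edge reads skewed B; other PEs read the PE above.
                if i == 0:
                    step = t - j
                    b_in = B[step][j] if 0 <= step < k else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # one multiply-accumulate per PE per cycle
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b
    return acc


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(systolic_matmul(A, B))
```

Each operand is read from memory once and then reused as it flows through neighboring PEs, which is the source of the energy savings the text mentions.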

1.2.8.2.4 FPGA
Programmed with a hardware description language (HDL), FPGAs are highly flexible, reconfigurable, reprogrammable, and customizable. Multiple FPGAs can be used to load a DNN model on chip for low-latency computing. FPGAs outperform GPUs in computing performance, but they cannot reach optimal performance because of the continuous erasing and programming they require. In addition, because of redundant transistors and wiring, logic circuits with the same functions occupy a larger chip area. Thanks to the reconfigurable structure, supply and research and development (R&D) risks are low, and the cost can be adjusted flexibly according to the purchase quantity. The design and tape-out processes are decoupled. However, the development period is long, generally about half a year, and the entry barrier is high.

1.2.8.3 Design Comparison of GPUs and CPUs
GPUs are designed for massive amounts of mutually independent data of the same type and for pure computing environments that do not need to be interrupted. CPUs must process different data types in a general-purpose way, perform logic judgments, and handle large numbers of branch jumps and interrupts, as shown in Figure 1-6.


Figure 1-6 Structure comparison between CPUs and GPUs
A GPU has a massively parallel computing architecture consisting of thousands of smaller cores designed to handle many tasks simultaneously, whereas a CPU consists of a few cores optimized for sequential serial processing.
GPUs are designed for high throughput. A GPU has many ALUs and few caches; unlike in a CPU, the caches do not store data for later access but serve the threads. Memory accesses are combined before they reach the DRAM, which introduces latency, and the controller unit performs this combined access. The large number of ALUs run a large number of threads to hide the latency.
CPUs are designed for low latency. A CPU has powerful ALUs that complete computations in few clock cycles, large caches that reduce latency, and a high clock frequency. Its complex logic control units reduce the latency of multi-branch programs through branch prediction, and for instructions that depend on the results of previous instructions, the logic units determine their positions in the pipeline to implement fast data forwarding.
In short, GPUs are good at compute-intensive, easily parallelized programs, while CPUs are good at logic control and serial computing.

1.2.8.4 Huawei Ascend 910 AI Processor
The neural-network processing unit (NPU) simulates human neurons and synapses at the circuit level and uses a deep learning instruction set to process large numbers of them, with one instruction processing a group of neurons. The NPU is a processor designed specifically for neural network computing, and its performance on neural network tasks is much higher than that of CPUs and GPUs. Typical NPUs include Huawei's Ascend AI Processors (Ascend), Cambricon, and IBM's TrueNorth. Huawei provides two Ascend AI Processors: Ascend 310 and Ascend 910. Ascend 910 is mainly used for training and is mostly deployed in data centers. Ascend 310 is mainly used for inference and covers all device-edge-cloud deployment scenarios. According to Huawei, Ascend 910 is the world's most powerful AI processor with the fastest training speed; its computing power is twice that of the world's top AI processor and equivalent to that of 50 of the latest, most powerful CPUs. Table 1-1 lists the parameters of Ascend 310 and Ascend 910.

Table 1-1 Parameters related to Ascend 310 and Ascend 910
Parameter | Ascend 310 (Ascend-Mini) | Ascend 910 (Ascend-Max)
Architecture | Da Vinci | Da Vinci
Half precision (FP16) | 8 TFLOPS | 256 TFLOPS
Integer precision (INT8) | 16 TOPS | 512 TOPS
Full HD video decoder (H.264/265) | 16 channels | 128 channels
Full HD video encoder (H.264/265) | 1 channel | (not listed)
Maximum power consumption | 8 W | 350 W
Process node | 12 nm FFC | 7 nm

1.2.9 AI Industry Ecosystem
Over the past 50 years, there have been three AI upsurges, each represented by a human-versus-machine game. The first occurred in 1962, when the checkers program developed by Arthur Samuel of IBM beat the best checkers player in the United States. The second occurred in 1997, when IBM's Deep Blue beat Garry Kasparov, the world chess champion, 3.5:2.5. The third broke out in 2016, when AlphaGo, a program developed by Google DeepMind, defeated Go world champion Lee Sedol, a South Korean 9-dan professional.
In the future, AI will penetrate various industries, including automobile, finance, consumer goods and retail, healthcare, education, manufacturing, communications, energy, tourism, culture and entertainment, transportation, logistics, real estate, and environmental protection.
For example, autonomous driving is a big stage for AI to show its capabilities. AI can assist with driving and decision-making: humans handle emergencies, the system automatically handles simple operations, and some operations are handled semi-automatically, until the highest level, fully automated driving, is achieved. This can greatly reduce fatigued driving and improve driving safety. Intelligent driving is a huge market; it can feed back into research on intelligent technologies in this field and form a virtuous cycle, providing a high-quality foundation for developing AI technologies.
The financial sector has accumulated a large amount of data. AI can enable intelligent asset management, intelligent investment, and more reasonable financial decision-making. AI can also help fight financial fraud and money laundering by inferring the reliability of transactions from various clues and determining the flow of funds and the periodicity of transactions.
AI can also be widely used in the medical field. For example, AI can interpret images accurately at the geometric level and, after training on large amounts of data, determine the problems reflected by image features, providing effective assistance for doctors. Models can be trained for classification tasks such as distinguishing normal cells from cancer cells.
According to statistics from the Chinese Association for Artificial Intelligence (CAAI) and other organizations, the AI market is expected to exceed USD 3 trillion by 2025, as shown in Figure 1-7.


Figure 1-7 Estimated AI market scale (unit: USD billion): 2016: 126; 2017: 180; 2018: 256; 2019: 365; 2020: 520; 2021: 741; 2022: 1,057; 2023: 1,507; 2024: 2,147; 2025: 3,061
These figures show the huge market potential of AI applications. As mentioned in the previous section, AI has three cornerstones: data, algorithms, and computing power. However, these three elements alone are not enough to implement AI; application scenarios must be added. Data, algorithms, and computing power describe the development of AI from a technical perspective, but without actual application scenarios, a technological breakthrough is merely a change in numbers. To meet these application conditions, AI must be combined with cloud computing, big data, and the Internet of Things (IoT) to form the platform architecture of AI applications, as shown in Figure 1-8.
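The market estimates in Figure 1-7 grow by a nearly constant percentage each year. The sketch below computes the implied compound annual growth rate (CAGR) from the 2016 and 2025 values, assuming the chart values are read in year order from 2016 to 2025.

```python
# Estimated AI market scale from Figure 1-7 (USD billion), assumed in year order.
market = {2016: 126, 2017: 180, 2018: 256, 2019: 365, 2020: 520,
          2021: 741, 2022: 1057, 2023: 1507, 2024: 2147, 2025: 3061}


def cagr(first_value, last_value, years):
    """Compound annual growth rate: the constant yearly growth rate that
    turns first_value into last_value over the given number of years."""
    return (last_value / first_value) ** (1.0 / years) - 1.0


rate = cagr(market[2016], market[2025], 2025 - 2016)
yearly = [market[y + 1] / market[y] - 1 for y in range(2016, 2025)]
print(f"Implied compound annual growth rate: {rate:.1%}")
print("Year-over-year growth:", [f"{g:.0%}" for g in yearly])
```

The year-over-year figures all sit close to the overall rate, so the chart is essentially a constant-growth projection rather than a set of independent yearly forecasts.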


Figure 1-8 Architecture of the AI application platform
To build an intelligent society, AI must be combined with cloud computing, big data, and the IoT. The intelligent infrastructure, including intelligent sensors, intelligent chips, and distributed computing frameworks, provides computing support for the AI industry and is an important guarantee of its development. Intelligent technology services focus on building AI technology platforms and providing AI-related services externally. The vendors of these services hold a critical position in the AI industry chain: based on infrastructure and large amounts of data, they provide key technology platforms, solutions, and services for various AI applications. As China accelerates its development into a manufacturing power, a network power, and a digital China, demand for AI technologies and products in manufacturing, home appliances, finance, education, transportation, security protection, healthcare, and logistics will grow further, and related intelligent products will become increasingly diverse in type and form. Only the combination of infrastructure, basic elements, and specific technologies can effectively support upper-layer applications in the AI industry ecosystem.
Although AI can be widely applied, it faces huge challenges: AI capability development cannot keep up with market demand. The major problems in developing and applying AI capabilities include:

High prerequisites and skill requirements: knowledge of machine learning and deep learning, statistics, linear algebra, and calculus is required.

Low efficiency and long model training cycles: data collection and cleaning, model training and optimization, and visualized experience improvement are all required.

Fragmented capabilities and experience: data collection, data cleaning, model training and optimization, and experience improvement must be repeated for each scenario, and capabilities cannot be directly inherited.

Difficulty improving and enhancing capabilities: it is difficult to upgrade models and obtain valid data.


At present, the industry agrees that on-device AI, with mobile phones at its core, is the trend, and more mobile phones will have built-in AI capabilities. Some consulting companies in the U.K. and the U.S. predict that 80% of mobile phones will have built-in AI capabilities by 2022 or 2023. In view of this market prospect and these challenges, Huawei launched HiAI, an open AI capability platform for smart devices. The purpose of HiAI is "Make it Easy for Developers: AI Connection Creates Infinite Possibilities". The platform enables developers to quickly use Huawei's powerful AI processing capabilities to provide users with better smart application experiences.

1.2.10 HUAWEI CLOUD EI Application Platform
1.2.10.1 Overview of HUAWEI CLOUD EI
HUAWEI CLOUD Enterprise Intelligence (EI) is a driving force for enterprises' intelligent transformation. Relying on AI and big data technologies, HUAWEI CLOUD EI provides an open, trustworthy, and intelligent platform through cloud services (in modes such as public cloud and dedicated cloud). It enables enterprise application systems to understand and analyze images, videos, languages, and texts to meet the requirements of different scenarios, so that more enterprises can use AI and big data services conveniently, accelerating business development and contributing to social progress.

1.2.10.2 Features of HUAWEI CLOUD EI
HUAWEI CLOUD EI has four outstanding features: industry intelligence, industry data, algorithms, and computing power.

Industry intelligence: a deep understanding of industries, including their pain points, and the use of AI technologies to resolve those pain points and drive AI implementation.

Industry data: industries never lack data, so enterprises can create a large amount of value from their own data through data processing and data mining.

Algorithms: HUAWEI CLOUD provides enterprises with various algorithm libraries, model libraries, general AI services, and a one-stop development platform to solve problems.

Computing power: with 30 years of experience in ICT and a full-stack AI development platform, Huawei can provide enterprises with the strongest and most economical AI computing power.


1.2.10.3 Development History of HUAWEI CLOUD EI
Figure 1-9 shows the development history of HUAWEI CLOUD EI.

Figure 1-9 HUAWEI CLOUD EI development history
The milestones are as follows:
1. In 2002, Huawei started to develop data governance and analysis products for traditional business intelligence (BI) services in the telecom field.
2. In 2007, Huawei started Hadoop technology research, deployed big data technologies, and built up a large pool of talent and technical patents.
3. In 2011, Huawei applied big data technologies to telecom big data solutions for network diagnosis and analysis, network planning, and network optimization.
4. In 2013, large enterprises such as China Merchants Bank and Industrial and Commercial Bank of China started to discuss their big data needs with Huawei and began technical cooperation. In September of the same year, Huawei released FusionInsight, an enterprise-oriented big data analysis platform, at Huawei Cloud Congress (HCC); it has since been widely used across industries.
5. In 2012, Huawei officially began large-scale investment in the AI industry, and it gradually started productization in 2014. At the end of 2015, Huawei began applying AI internally in finance, supply chain, engineering acceptance, e-commerce, and other areas, achieving the following results: (1) Receipt optical character recognition (OCR) for customs declaration: import efficiency improved 10-fold. (2) Pickup route planning: exceptional expenses reduced by 30%. (3) Intelligent review: efficiency improved six-fold. (4) Intelligent recommendations for e-commerce users: application conversion rate increased by 71%.
6. In 2017, Huawei officially started to provide cloud services and worked with more partners to provide more AI functions.


7. In 2019, HUAWEI CLOUD EI was dedicated to inclusive AI, making AI affordable, effective, and reliable. Based on the Huawei-developed Ascend chips, HUAWEI CLOUD EI provides 59 cloud services (21 platform services, 22 visual services, 12 language services, and 4 decision-making services) and 159 functions (52 platform functions, 99 application programming interface (API) functions, and 8 pre-integrated solutions).
8. Huawei has dedicated thousands of R&D personnel to technical R&D (on productization technologies as well as cutting-edge technologies such as analysis algorithms, machine learning algorithms, and natural language processing) and has actively contributed its R&D achievements to the communities.

1.3 Technical Fields and Application Fields of AI
1.3.1 AI Technology Direction

Figure 1-10 AI technology direction


Figure 1-10 shows the development trend of AI technologies. At present, AI technologies are applied in three main directions:

1.3.1.1 Computer Vision
Computer vision studies how to make computers "see". Of the three directions, computer vision is the most mature, covering image classification and segmentation, object detection, text recognition, and facial recognition. As shown in Figure 1-11 to Figure 1-14, computer vision is currently applied mainly to electronic attendance, identity authentication, and image search. In the future, computer vision is expected to reach an advanced stage of autonomous understanding, analysis, and decision-making that truly enables machines to "see", creating more value in scenarios such as autonomous driving and smart homes.

Figure 1-11 Electronic attendance

Figure 1-12 Enable identity authentication


Figure 1-13 Image recognition

Figure 1-14 Image search

1.3.1.2 Speech Processing
Speech processing is a general term for various processing technologies, including speech signal processing, statistical features of speech signals, speech recognition, machine speech synthesis, and speech perception. The main research topics in speech processing include speech recognition, speech synthesis, voice wakeup, voiceprint recognition, and audio-based event detection. Of these, speech recognition is the most mature: near-field recognition in a quiet indoor environment can reach an accuracy of up to 96%. As shown in Figure 1-15 and Figure 1-16, speech recognition technologies are currently applied mainly to speech Q&A and intelligent navigation.
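Recognition accuracy figures such as the 96% above are commonly derived from the word error rate (WER); the text does not state which metric it uses, so treating accuracy as 1 - WER is an assumption here. A minimal sketch that computes WER with word-level edit distance, using hypothetical sentences:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in the reference,
    computed with word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)


reference = "turn on the light in the living room"
hypothesis = "turn on the lights in the living room"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}, accuracy = {1 - wer:.1%}")
```

One wrong word out of eight gives a WER of 0.125; under this metric, 96% accuracy corresponds to about one error in every 25 words.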


Figure 1-15 Question-Answering Bot (QABot)

Figure 1-16 Voice navigation

1.3.1.3 NLP
NLP is a discipline that uses computer technology to understand and use natural languages. It studies topics such as machine translation, text mining, and sentiment analysis. NLP is technically demanding, yet its technology maturity remains low. Because semantics are highly complex, deep learning based on big data and parallel computing still cannot think and understand things as humans do. At present, NLP can understand only shallow semantics, but in the future it is expected to extract features automatically and understand deep semantics, that is, to move from single-purpose intelligence (machine learning) to hybrid intelligence (machine learning, deep learning, and reinforcement learning). As shown in Figure 1-17 to Figure 1-19, NLP technology is now widely used in fields such as public opinion analysis, comment analysis, and machine translation.

Figure 1-17 Public opinion analysis

Figure 1-18 Comment analysis


Figure 1-19 Machine translation
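Comment analysis is often framed as sentiment classification: each comment goes in, a polarity label comes out. The textbook does not describe a specific method, so the sketch below uses a tiny hand-written lexicon (the words and weights are hypothetical) purely to show the shape of the task. Practical systems learn these signals from labeled data, and a word list like this one captures only the shallow semantics mentioned above.

```python
import re

# Hypothetical sentiment lexicon: word -> polarity weight.
LEXICON = {"good": 1, "great": 2, "fast": 1, "bad": -1, "terrible": -2, "slow": -1}
NEGATIONS = {"not", "never", "no"}


def comment_sentiment(text: str) -> str:
    """Label a comment positive, negative, or neutral by summing word weights."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0)
        # Flip the polarity when the previous word is a simple negation.
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        score += weight
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in ["Great phone, fast delivery", "The battery is not good", "It arrived on Monday"]:
    print(comment, "->", comment_sentiment(comment))
```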

1.3.2 AI Application Field

1.3.2.1 Intelligent Healthcare

Figure 1-20 Smart healthcare

As shown in Figure 1-20, AI can "learn" professional medical knowledge, "remember" numerous historical medical cases, and recognize medical images with computer vision technologies, providing reliable and efficient assistance to doctors. For example, in medical imaging, which is already widely used, researchers can build models from historical data to analyze new medical images, quickly locate patients' lesions, and improve diagnostic efficiency.


1.3.2.2 Intelligent Security

Security is considered an ideal field for AI implementation, and AI applications in this field are more mature than in others. The field generates massive amounts of images and videos, laying a solid foundation for training AI algorithms and models. At present, AI in public safety serves two main purposes:

Civil use: card swipe based on facial recognition, warnings of potential danger, and home alert deployment

Police use: suspect identification, vehicle analysis, suspect search and comparison, and access control at key locations
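Facial recognition tasks such as card swipe and suspect search and comparison usually reduce to comparing face feature vectors (embeddings) produced by a recognition network. In this sketch the embeddings are made-up 4-dimensional vectors, and the identities and the 0.8 threshold are hypothetical; real systems use learned embeddings with hundreds of dimensions and calibrate the threshold on validation data.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(probe: list[float], gallery: dict[str, list[float]], threshold: float = 0.8):
    """Return (identity, score) of the closest enrolled face, or (None, score) below threshold."""
    name, score = max(
        ((identity, cosine_similarity(probe, emb)) for identity, emb in gallery.items()),
        key=lambda pair: pair[1],
    )
    return (name, score) if score >= threshold else (None, score)


# Hypothetical enrolled embeddings, e.g. residents allowed through an access gate.
gallery = {
    "resident_a": [0.9, 0.1, 0.3, 0.2],
    "resident_b": [0.1, 0.8, 0.2, 0.5],
}
probe = [0.85, 0.15, 0.35, 0.2]
print(best_match(probe, gallery))
```

Returning `None` below the threshold reflects the usual design choice for access control: an unknown face should be rejected rather than matched to the least-dissimilar enrolled person.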

1.3.2.3 Smart Home

Based on IoT technologies, a smart home ecosystem consists of hardware, software, and cloud platforms, providing users with personalized services that make the home more convenient, comfortable, and secure. It uses voice processing to control smart home products, for example, adjusting the air conditioning temperature, opening and closing curtains, and controlling the lighting system by voice. It uses computer vision technologies for home security, such as facial or fingerprint recognition for unlocking, real-time intelligent cameras, and intrusion detection. Based on the historical records of smart speakers and smart TVs, it uses machine learning and deep learning technologies for user profiling and content recommendation.

1.3.2.4 Smart City

Figure 1-21 Smart city

As shown in Figure 1-21, a smart city uses information and communication technology to sense, analyze, and integrate key information from the city's core operating systems, and to respond intelligently to needs in livelihood, environmental protection, public safety, urban services, and industrial and commercial activities. In essence, advanced information technologies are used to manage and operate the city intelligently, create a better life for its residents, and promote harmonious and sustainable urban development. In the smart city scenario, AI is mainly applied to smart environment, smart economy, smart life, smart information, smart logistics, and smart government. For example, AI is applied to transportation and logistics, and facial recognition is used for safety protection.

1.3.2.5 Retail

AI is expected to transform the retail industry completely. A typical case is the fully unmanned supermarket. For example, Amazon Go, Amazon's unmanned supermarket, uses sensors, cameras, computer vision, and deep learning algorithms to eliminate the checkout process, allowing customers to pick up goods and "just walk out". One of the biggest challenges for unmanned supermarkets is charging customers correctly. So far, Amazon Go is the only successful business case, and even it relies on many controlled factors. For example, customers must check in with an Amazon account through the Amazon Go app before entering. Other enterprises that intend to follow Amazon's example must first build their own membership systems.

1.3.2.6 Autonomous Driving

The Society of Automotive Engineers (SAE) in the U.S. defines six levels of driving automation, ranging from L0 (fully manual) to L5 (fully autonomous). At L0, driving depends entirely on the driver's operation. From L3 upward, the system can take over so that the driver can hand off control in specific conditions, and at L5 the system drives in all scenarios. At present, only some passenger car models, such as the Audi A8 and some Tesla and Cadillac models, offer L2 and L3 advanced driver-assistance systems (ADAS). It is estimated that by 2020, more L3 models will emerge as sensors and vehicle-mounted processors improve further. L4 and L5 autonomous driving is expected to be implemented first on commercial vehicles in closed campuses. Popularizing advanced autonomous driving requires mature technologies, policies, and infrastructure, and it is estimated that L4 and L5 autonomous driving will not be available on ordinary roads until 2025 to 2030.
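The six SAE levels map naturally onto an ordered enumeration. The level names below follow SAE J3016; the `driver_must_monitor` helper is a hypothetical, simplified rule that captures the boundary described above (below L3 the human must supervise continuously) and leaves out the standard's finer conditions.

```python
from enum import IntEnum


class DrivingAutomation(IntEnum):
    """SAE J3016 levels of driving automation."""
    NO_AUTOMATION = 0           # L0: the driver performs all driving tasks
    DRIVER_ASSISTANCE = 1       # L1: steering OR speed support
    PARTIAL_AUTOMATION = 2      # L2: steering AND speed support, driver supervises
    CONDITIONAL_AUTOMATION = 3  # L3: system drives in specific conditions, driver takes over on request
    HIGH_AUTOMATION = 4         # L4: no takeover needed within a defined domain, e.g. a closed campus
    FULL_AUTOMATION = 5         # L5: system drives in all scenarios


def driver_must_monitor(level: DrivingAutomation) -> bool:
    """Simplified rule: below L3, the human must continuously monitor the road."""
    return level < DrivingAutomation.CONDITIONAL_AUTOMATION


for level in DrivingAutomation:
    print(f"L{level.value} {level.name}: driver monitors = {driver_must_monitor(level)}")
```

Using `IntEnum` keeps the levels comparable as integers, so "L3 and above" checks read as ordinary comparisons.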


1.3.3 Phases of AI

Figure 1-22 Three phases of AI

Figure 1-22 shows the three phases of AI. At present, AI is still in the initial phase of perceptual intelligence.

1.4 Huawei's AI Strategy

1.4.1 Huawei's Full-Stack, All-Scenario AI Portfolio

Huawei announced that it will open-source its server OS on December 31, 2020, the standalone GaussDB OLTP database in June 2020, and the MindSpore all-scenario AI computing framework in the first quarter of 2020. "Full-stack" refers to technical scope: Huawei's full-stack portfolio includes chips, chip enablement, a training and inference framework, and application enablement. "All-scenario" refers to the different deployment scenarios for AI, including public clouds, private clouds, edge computing in all forms, industrial IoT devices, and consumer devices. As the cornerstone of Huawei's full-stack AI solution, Atlas provides modules, cards, and servers based on the Ascend AI Processor to meet customers' computing requirements in all scenarios.


1.4.2 Huawei AI Full-Stack Direction

1.4.2.1 HUAWEI CLOUD One-Stop AI Development Platform – ModelArts

ModelArts is a one-stop development platform for AI developers. With data preprocessing, semi-automated data labeling, distributed training, automated model building, and model deployment on devices, edges, and clouds, ModelArts helps AI developers build models quickly and manage the AI development lifecycle. It has the following features:

1. Automatic learning: ModelArts can automate model design, parameter tuning, and model training, compression, and deployment based on labeled data. The process is code-free and requires no model development experience.

ModelArts Pro is a professional development suite for enterprise-class AI applications. Based on the advanced algorithms and fast training capabilities of HUAWEI CLOUD, it provides workflows and models that improve the development efficiency of enterprise AI applications and reduce development difficulty. Customers can manage workflows to quickly develop, share, and release applications, build an open ecosystem, and bring AI to a wide range of industries. ModelArts Pro suites include the NLP suite, text recognition suite, and vision suite, which can quickly respond to AI implementation requirements in different industries and scenarios.

2. Device-edge-cloud: refers to devices, Huawei intelligent edge devices, and HUAWEI CLOUD, respectively.

3. Online inference: a web service that synchronously returns the inference result for each inference request.

4. Batch inference: a job that runs inference on a batch of data.

5. Ascend chips: a series of Huawei-designed AI chips that deliver high computing performance with low power consumption.

6. The built-in AI data framework combines automatic pre-labeling and hard example labeling, improving data preparation efficiency by more than 100-fold.

7. MoXing, Huawei's self-developed high-performance distributed framework, uses core technologies such as hybrid parallel cascade, gradient compression, and convolution acceleration to greatly reduce model training time.

8. Models can be deployed to devices, edges, and clouds in different scenarios with one click, meeting the requirements of both high-concurrency and lightweight deployment.

9. ModelArts provides visualized management of the AI development lifecycle, including data preparation, training, modeling, and inference. It also supports resumed training, training result comparison, and model.

10. The AI market supports data and model sharing, helping enterprises improve AI development efficiency and allowing developers to turn knowledge into value.
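The textbook does not detail how ModelArts combines automatic pre-labeling with hard example labeling. One common interpretation, used here as an assumption, is confidence-based routing: a model pre-labels every sample, confident predictions are accepted automatically, and low-confidence "hard examples" are sent to human annotators. The sample IDs, probabilities, and 0.9 threshold are hypothetical, and none of this is ModelArts API.

```python
def route_for_labeling(predictions, threshold=0.9):
    """Split model pre-labels into auto-accepted labels and hard examples.

    predictions maps sample_id -> {label: probability}.
    """
    auto_labeled, hard_examples = {}, []
    for sample_id, probs in predictions.items():
        label, confidence = max(probs.items(), key=lambda item: item[1])
        if confidence >= threshold:
            auto_labeled[sample_id] = label   # accept the pre-label as-is
        else:
            hard_examples.append(sample_id)   # queue for manual labeling
    return auto_labeled, hard_examples


predictions = {
    "img_001": {"cat": 0.97, "dog": 0.03},
    "img_002": {"cat": 0.55, "dog": 0.45},
    "img_003": {"cat": 0.08, "dog": 0.92},
}
auto, hard = route_for_labeling(predictions)
print("auto-labeled:", auto)
print("needs human review:", hard)
```

Under this reading, the efficiency gain comes from annotators spending time only on the uncertain fraction of the data rather than on every sample.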


1.4.2.2 MindSpore

In the intelligent era, AI applications in device-edge-cloud scenarios are booming, yet AI still faces huge challenges: technical barriers, high development costs, and long deployment cycles hinder the growth of the AI developer ecosystem across the industry. The all-scenario AI computing framework MindSpore is developed based on the principles of friendly development, efficient operation, and flexible deployment. Among deep learning frameworks, TensorFlow (Google), MXNet (Amazon), PyTorch (Facebook), and CNTK (Microsoft) are regarded as the four major players, and Huawei positions MindSpore as the strongest challenger to them. MindSpore was open-sourced on March 30, 2020, and competes with frameworks such as TensorFlow, PyTorch, PaddlePaddle (Baidu), and Caffe. MindSpore provides automatic parallelism: with only a few lines of description, senior algorithm engineers and data scientists who focus on data modeling and problem solving can run algorithms on dozens or even thousands of AI computing nodes. MindSpore supports both large-scale and small-scale deployment, adapting to independent deployment in all scenarios. In addition to Ascend AI Processors, it also supports other processors such as GPUs and CPUs.
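The textbook does not describe how MindSpore's automatic parallelism decides how to split work. The sketch below shows only the simplest strategy that such frameworks automate, data parallelism: each simulated "device" computes the gradient on its shard of the batch, and averaging the shard gradients (weighted by shard size) reproduces the full-batch gradient. It is plain Python for a hypothetical one-parameter linear model `y = w * x`, not MindSpore code.

```python
def mse_gradient(w, xs, ys):
    """d/dw of mean((w*x - y)^2) over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_gradient(w, xs, ys, num_devices):
    """Shard the batch, compute per-device gradients, then combine them (all-reduce)."""
    shard = (len(xs) + num_devices - 1) // num_devices  # ceiling division
    total, count = 0.0, 0
    for start in range(0, len(xs), shard):
        xs_part, ys_part = xs[start:start + shard], ys[start:start + shard]
        # Weight each device's gradient by its shard size so the result matches the full batch.
        total += mse_gradient(w, xs_part, ys_part) * len(xs_part)
        count += len(xs_part)
    return total / count


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
w = 1.5
print(mse_gradient(w, xs, ys), data_parallel_gradient(w, xs, ys, num_devices=3))
```

Because the combined gradient equals the single-device one, adding devices changes speed but not the training result, which is what lets a framework apply this split automatically.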

1.4.2.3 CANN

CANN is a chip enabling layer developed by Huawei for deep neural networks (DNNs) and Ascend AI Processors. It consists of four functional modules:

FusionEngine: an operator-level fusion engine. It fuses operators to reduce memory transfers between them, improving performance by 50%.

CCE operator library: an optimized general-purpose operator library provided by Huawei that meets the requirements of most mainstream vision and NLP neural networks. (The APIs of the CCE operator library are expected to be released in the first quarter of 2020.) When customers and partners need custom operators for reasons of timeliness, privacy, or research, the third functional module is used.

Tensor Boost Engine (TBE): an efficient, high-performance tool for developing custom operators. It abstracts hardware resources as APIs, enabling customers to quickly construct the operators they need. (This module is expected to be available in the fourth quarter of 2020.)

The last module is the bottom-layer compiler, which optimizes performance and supports Ascend AI Processors in all scenarios.
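Operator fusion is easiest to see with two element-wise operators, such as a bias add followed by ReLU. Unfused, the first operator writes a full intermediate tensor to memory and the second reads it back; fused, each element is loaded once, transformed twice, and stored once. This plain-Python sketch only illustrates that memory-traffic idea. It is not how FusionEngine is implemented and does not reproduce the 50% figure above.

```python
def add_bias(xs, bias):
    return [x + bias for x in xs]       # operator 1: writes a full intermediate list


def relu(xs):
    return [max(x, 0.0) for x in xs]    # operator 2: reads the intermediate back


def unfused(xs, bias):
    intermediate = add_bias(xs, bias)   # extra full-size buffer written and re-read
    return relu(intermediate)


def fused_add_bias_relu(xs, bias):
    # One pass over the data: no intermediate buffer between the two operators.
    return [max(x + bias, 0.0) for x in xs]


data = [-2.0, -0.5, 0.0, 1.5, 3.0]
print(unfused(data, 1.0))
print(fused_add_bias_relu(data, 1.0))
```

Both versions compute the same values; on an accelerator the saving comes from skipping the round trip of the intermediate tensor through memory, which matters most when operators are memory-bound rather than compute-bound.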

1.4.2.4 Ascend AI Processor

Demand for AI is soaring worldwide. However, because the market is dominated by only a few vendors, AI processors are sold at very high prices, delivery cycles are long, and local service support is weak. As a result, the AI requirements of many industries cannot be met effectively.


At HUAWEI CONNECT in October 2018, Huawei unveiled its Ascend 310 processor for AI inference and Ascend 910 processor for AI training. Built on the Da Vinci 3D Cube architecture, Huawei's Ascend AI Processors offer high computing power, energy efficiency, and scalability. The Ascend 310, an AI SoC designed for maximum performance per watt, targets edge inference. It provides up to 16 TOPS (INT8) of computing power with a power consumption of only 8 W, making it well suited to edge computing. The Ascend 910, which Huawei describes as delivering the industry's highest computing density on a single AI chip, is designed for AI training and delivers 512 TOPS (INT8) with a maximum power consumption of 310 W.
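Performance per watt follows directly from the figures quoted above: peak INT8 throughput divided by power. This is a back-of-the-envelope comparison of peak specifications only; real workloads rarely reach peak throughput.

```python
# Peak figures as quoted in this section: (TOPS, watts).
chips = {"Ascend 310": (16, 8), "Ascend 910": (512, 310)}

efficiency = {name: tops / watts for name, (tops, watts) in chips.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.2f} TOPS/W")
# Ascend 310: 2.00 TOPS/W
# Ascend 910: 1.65 TOPS/W
```

The result matches the positioning in the text: the Ascend 310 gets more operations per watt, which suits power-constrained edge inference, while the Ascend 910 trades some efficiency for far higher absolute throughput for training.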

1.4.2.5 Atlas AI Computing Platform

Figure 1-23 Atlas AI computing platform portfolio

As shown in Figure 1-23, the Huawei Atlas AI computing platform, powered by Ascend AI Processors, supports a rich set of form factors, including modules, cards, edge stations, servers, and clusters. Atlas enables AI solutions for all scenarios across device, edge, and cloud. As an important part of Huawei's full-stack AI solution, Atlas launched its training platform this year, following the inference platform unveiled last year, providing the industry with a complete AI solution. Huawei will also enhance all-scenario deployment and drive full collaboration across device, edge, and cloud, enabling every phase of the AI industry chain.

1.5 AI Disputes

1.5.1 Algorithmic Bias

Algorithmic bias is mainly caused by data bias. When AI algorithms are used for decision-making, they may learn from existing data to discriminate against individuals, for example, by making decisions based on race, gender, or other factors. Even if factors such as race or gender are excluded from the data, algorithms can still make discriminatory decisions based on information such as names and addresses. For example, searching for a name that sounds African American may display an advertisement for a tool that searches criminal records, whereas the same advertisement is less likely to appear for other names. Online advertisers tend to show advertisements for lower-priced goods to female users. Google's image software once mistakenly labeled an image of Black people as "gorillas".
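Because proxies such as names and addresses can reintroduce an excluded attribute, a basic audit compares outcomes across groups directly instead of only checking which input columns were dropped. The sketch computes the approval rate per group and the demographic parity difference (the gap between the highest and lowest rate), which is one of several fairness metrics. The decision records are made-up data for illustration.

```python
from collections import defaultdict


def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    approved, total = defaultdict(int), defaultdict(int)
    for group, is_approved in decisions:
        total[group] += 1
        approved[group] += int(is_approved)
    return {group: approved[group] / total[group] for group in total}


# Hypothetical model decisions. The group label is kept only for auditing,
# not as a model input.
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", False), ("group_b", True), ("group_b", False), ("group_b", False),
]
rates = approval_rates(decisions)
parity_gap = max(rates.values()) - min(rates.values())
print(rates, f"demographic parity difference = {parity_gap:.2f}")
```

A large gap does not prove discrimination on its own, but it flags where the data or the model should be examined.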

1.5.2 Privacy Issues

Existing AI algorithms are all data-driven, so large amounts of data are needed to train models. While we enjoy the convenience AI brings every day, technology companies such as Facebook, Google, Amazon, and Alibaba are collecting enormous amounts of user data that can reveal many aspects of our lives, including political views and gender. In principle, technology companies can record every click, every page scroll, the time spent viewing any content, and browsing history when users access the Internet. From our ride-hailing and consumption records, technology companies can learn private information such as where we are, where we go, what we have done, our education background, our consumption capacity, and our personal preferences.

1.5.3 Contradiction Between Technology and Ethics

With the development of computer vision technologies, the reliability of images and videos is decreasing. Fake images can be produced with tools such as Photoshop (PS) and generative adversarial networks (GANs), making it hard to tell whether an image is real. Take GANs as an example: Ian Goodfellow, a machine learning researcher, proposed the concept in 2014. The model is called "generative" because it generates new data, such as images, rather than outputting prediction values related to the input data. "Adversarial network" refers to the two neural networks in the model that compete with each other, like cashiers and counterfeiters in a battle of wits: one side tries to deceive the other into accepting counterfeit money as authentic, while the other side tries to identify the counterfeit money.
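The competition described above is formalized in Goodfellow et al.'s original formulation as a minimax game between a generator G, which maps random noise z to samples, and a discriminator D, which outputs the probability that its input is real data:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to maximize V, assigning high probability to real samples and low probability to generated ones, while G is trained to minimize it by making D(G(z)) close to 1. In the analogy above, G plays the counterfeiter and D the cashier.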

1.5.4 AI Development = Rising Unemployment?

Looking back, human beings have always sought ways to improve efficiency, that is, to obtain more with fewer resources. We used sharp stones to hunt and gather food more efficiently, and steam engines to reduce the need for horses. Every step toward automation changes our life and work. In the AI era, AI will replace jobs that involve little creativity and social interaction, such as couriers, taxi drivers, and soldiers. On the other hand, highly creative jobs such as writers, managers, and software engineers are not easily replaced.


1.6 AI Development Trend

1.6.1 Development Trend of AI Technologies

Easier-to-use development frameworks: AI development frameworks are evolving toward ease of use and complete functionality, continuously lowering the barrier to AI development.

Better-performing algorithm models: In computer vision, GANs can generate high-quality images that human eyes cannot distinguish from real ones. GAN-related algorithms have been applied to other vision tasks, such as semantic segmentation, facial recognition, video synthesis, and unsupervised clustering. In NLP, pre-trained models based on the Transformer architecture have made significant breakthroughs, and models such as BERT, Generative Pre-trained Transformer (GPT), and XLNet are widely used in industrial scenarios. In reinforcement learning, DeepMind's AlphaStar defeated top professional human players in StarCraft II.

Smaller deep learning models: Better-performing models usually have more parameters, and large models run less efficiently in industrial applications. More and more model compression techniques are being proposed to shrink models, reduce parameters, and accelerate inference while maintaining model performance, meeting the requirements of industrial applications.

Computing power developed across device, edge, and cloud: The scale of AI chips deployed in the cloud, on edge devices, and on mobile devices keeps growing, further meeting the computing power demands of AI.

More comprehensive basic AI data services: The basic AI data service industry is maturing, and related data labeling platforms and tools keep being released.

More secure data sharing: As shown in Figure 1-24, federated learning trains models on data from different sources without pooling the raw data, breaking data bottlenecks while ensuring data privacy and security.


Figure 1-24 Federated learning
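A common aggregation rule in federated learning is federated averaging (FedAvg): each client trains on its own data and sends only its model parameters, and the server averages them weighted by each client's number of samples, so raw data never leaves the clients. The clients, parameter vectors, and sample counts below are hypothetical, and local training is omitted.

```python
def federated_average(client_updates):
    """client_updates: list of (parameters, num_samples) -> weighted average parameters."""
    total_samples = sum(n for _, n in client_updates)
    num_params = len(client_updates[0][0])
    averaged = [0.0] * num_params
    for params, n in client_updates:
        weight = n / total_samples  # clients with more data count for more
        for i, value in enumerate(params):
            averaged[i] += weight * value
    return averaged


# Hypothetical parameter vectors trained locally by three organizations.
updates = [
    ([0.2, 1.0], 100),
    ([0.4, 0.8], 300),
    ([0.1, 1.2], 100),
]
global_model = federated_average(updates)
print(global_model)
```

In a full training loop, the server would send `global_model` back to the clients, which continue local training from it before the next round of averaging.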

1.6.2 GIV 2025: 10 Trends for 2025

Popularization of intelligent robots: Intelligent robots will be not only machines but even family members. Huawei predicts that by 2025, 14% of the world's households will have smart robots, which will play an important role in people's lives.

Popularization of augmented reality (AR)/virtual reality (VR): The proportion of enterprises using VR/AR technology will increase to 10%. VR and related technologies will bring vigor and vitality to industries such as commercial display and audio-visual entertainment.

Wide application of AI (man-machine collaboration): 97% of large enterprises will use AI technologies, mainly in fields such as voice intelligence, image recognition, facial recognition, and human-machine interaction.

Popularization of big data applications (frictionless communication): Enterprises' data utilization will increase to 86%. Big data analytics and processing will save time and improve work efficiency for enterprises.

Weakening of search engines (zero search): 90% of the world's population will have personal smart device assistants, so people will need to go through a search portal far less often.

Popularization of the Internet of Vehicles (IoV) (understand my road): Cellular vehicle-to-everything (C-V2X) will be embedded in 15% of vehicles worldwide. Smart and connected vehicles will become widespread, making driving safer and more reliable.

Popularization of industrial robots (machines take on "three-high" work): There will be 103 robots for every 10,000 manufacturing employees. High-risk, high-precision, and high-intensity work will be assisted or completed independently by industrial robots.

Popularization of cloud technologies and applications (symbiotic economy): The usage rate of cloud-based applications will reach 85%, and massive numbers of applications and program collaboration will be completed in the cloud.

Popularization of 5G: With the acceleration of 5G, 58% of the world's population will have access to 5G services. Communications will take a disruptive leap forward, with greatly improved technologies and rates.

Popularization of digital economy and big data (global digital governance): 180 ZB of data will be stored globally every year, and digital economy and blockchain technologies will be widely used on the Internet.

1.7 Summary

This chapter describes the basic concepts, development history, and application background of AI. After reading it, you should understand that AI, as a cross-disciplinary field, cannot be applied or developed without the support of other disciplines: its physical implementation depends on large-scale hardware, and its upper-layer applications depend on software design and implementation methods. As a learner, you should clearly understand the scope and boundaries of AI applications and build on that understanding.

1.8 Quiz

1. There are different interpretations of the concept of AI in different contexts. Please explain what AI is based on your understanding.

2. AI, machine learning, and deep learning are often mentioned together. What is the relationship between them? What do they have in common, and how do they differ?

3. After reading the description of AI application scenarios, please describe a real-world AI application field and scenario based on your own experience.

4. CANN is a chip enabling layer developed by Huawei for DNNs and Ascend AI Processors. Please describe the four modules of CANN.

5. Please describe the development directions of AI based on your knowledge and understanding.


Huawei AI Academy Training Materials

Python Basics

Huawei Technologies Co., Ltd.

02 Python Basics (Textbook)

43


Huawei Technologies Co., Ltd. Address:

Huawei Industrial Base Bantian, Longgang Shenzhen 518129

Website:

https://e.huawei.com


Contents 1 Introduction to Python ..................................................................................................................... 4 1.1 Overview ...................................................................................................................................................................................... 4 1.2 Advantages and Disadvantages of Python ..................................................................................................................... 4 1.3 Python Application Fields ...................................................................................................................................................... 4 1.4 Python Environments .............................................................................................................................................................. 5 1.4.1 Installing the Python Interpreter ..................................................................................................................................... 5 1.4.2 IDE .............................................................................................................................................................................................. 5

2 Basic Programming ........................................................................................................................... 6 2.1 Python Basics ............................................................................................................................................................................. 6 2.1.1 Basic Syntax ............................................................................................................................................................................ 6 2.1.2 Basic Python Built-in Function ......................................................................................................................................... 6 2.2 Data Structure in Python ....................................................................................................................................................... 7 2.2.1 Data Structure Classification ............................................................................................................................................ 7 2.2.2 Number..................................................................................................................................................................................... 8 2.2.3 String ......................................................................................................................................................................................... 8 2.2.4 Common Operations on Strings ...................................................................................................................................... 8 2.2.5 String Formatted Output ................................................................................................................................................... 
9 2.2.6 List ............................................................................................................................................................................................10 2.2.7 Common Operations on Lists .........................................................................................................................................11 2.2.8 Tuple ........................................................................................................................................................................................11 2.2.9 Dictionary...............................................................................................................................................................................12 2.2.10 Common Operations on Dictionaries ........................................................................................................................12 2.2.11 Set ..........................................................................................................................................................................................13 2.2.12 Common Operations on Sets .......................................................................................................................................13 2.2.13 Deep Copy and Shallow Copy ......................................................................................................................................14 2.2.14 Operator ..............................................................................................................................................................................14 2.3 Control Flow .............................................................................................................................................................................15 2.3.1 Judgment Statement – 
if..................................................................................................................................................15 2.3.2 Loop Statement – for ........................................................................................................................................................15 2.3.3 Loop Statement – while ...................................................................................................................................................15 2.3.4 Loop Termination – break and continue ....................................................................................................................16 2.4 Functions and Object-oriented Programming .............................................................................................................16 2.4.1 Functions ................................................................................................................................................................................16 2.4.2 Function Definition and Calling .....................................................................................................................................16

02 Python Basics (Textbook)

45

Python Basics

Page 2

2.4.3 Function Return Values ....................................................................................................................................................17 2.4.4 Function Arguments ..........................................................................................................................................................17 2.4.5 Anonymous Functions .......................................................................................................................................................17 2.4.6 Object-oriented and Procedure-oriented Processes ...............................................................................................18 2.4.7 Advantages of Object-oriented Process......................................................................................................................18 2.4.8 Terminologies in Object-oriented Process .................................................................................................................18 2.4.9 Object-oriented Process in Python ...............................................................................................................................18 2.4.10 Privatization of Classes in Python ..............................................................................................................................19 2.4.11 Programming Paradigms ...............................................................................................................................................19 2.5 Standard Libraries ..................................................................................................................................................................19 2.5.1 Python Standard Libraries – sys ....................................................................................................................................19 2.5.2 Python Standard Libraries – os 
......................................................................................................................................20 2.5.3 Python Standard Libraries – time .................................................................................................................................20 2.6 I/O Operations .........................................................................................................................................................................20 2.6.1 File Read and Write............................................................................................................................................................20 2.6.2 File Opening Modes ...........................................................................................................................................................21 2.6.3 Common File Handling Functions ................................................................................................................................22 2.6.4 Context Managers ..............................................................................................................................................................22 2.7 Modules and Exceptions ......................................................................................................................................................23 2.7.1 Modules ..................................................................................................................................................................................23 2.7.2 Exceptions ..............................................................................................................................................................................23 2.7.3 Exception Handling ............................................................................................................................................................23

3 Advanced Programming
3.1 Database Programming
3.1.1 Database Programming
3.1.2 MySQL Operations
3.2 Multitasking
3.2.1 Multitasking
3.2.2 Thread
3.2.3 Thread Synchronization
3.2.4 Process
3.3 Magic Methods
3.4 Higher-Order Functions
3.5 Regular Expression
3.5.1 Regular Expression
3.5.2 Regular Expression Execution Process
3.5.3 Common Matching Methods of the re Module
3.5.4 Common Methods for Match Object Instances
3.5.5 Special Symbols and Characters

02 Python Basics (Textbook)


Python Basics


3.6 Generators, Iterators, and Decorators
3.6.1 Iterators
3.6.2 Generators
3.6.3 Closures
3.6.4 Decorators
3.7 Extension
3.7.1 JSON
3.7.2 Metaclasses
3.7.3 Garbage Collection Mechanism in Python

4 Quiz
4.1 Short Answer Questions
4.2 Multiple-Choice Questions


1 Introduction to Python

Python is one of the most popular programming languages and the most widely used language in the artificial intelligence (AI) field. Python 2 and Python 3 have been the two mainstream versions. Python 2 reached its end of life in January 2020, so this chapter covers Python 3.

1.1 Overview
Python is a general-purpose, high-level programming language. It is fully open source and was created by Guido van Rossum.

1.2 Advantages and Disadvantages of Python
Advantages:
Python is a high-level, object-oriented programming language.
It is a dynamically typed, interpreted language.
Its structure is elegant and its syntax is clear, which makes it easy to learn.
It has a huge collection of third-party libraries.
It can call code written in other languages, such as C and C++, which is why it is known as a "glue language".
It also supports functional programming.
Disadvantages:
It runs relatively slowly compared with compiled languages such as C.

1.3 Python Application Fields
Thanks to its abundant third-party libraries and the language advantages listed above, Python is used in many fields, such as artificial intelligence, data science, system tool development, application development, O&M script automation, and web development.


1.4 Python Environments
1.4.1 Installing the Python Interpreter
Download the installer for your operating system from the official Python website (python.org) and install it. After the installation is complete, configure the environment variables, for example by adding the interpreter to PATH. Multiple Python versions can be installed side by side.
Alternatively, install Anaconda. Anaconda is a Python distribution that bundles the interpreter with many third-party libraries, and it is widely used in AI and scientific computing. Earlier releases of Anaconda were offered in separate editions for Python 2 and Python 3.
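Once an interpreter is installed, it is worth confirming which version is actually running. This matters most when several versions coexist. The following minimal sketch uses only the standard sys module.

```python
import sys

# sys.version is the full version string; sys.version_info allows numeric comparison.
print(sys.version)
print(sys.executable)   # path of the running interpreter (useful when several versions coexist)

major, minor = sys.version_info[:2]
print("Running Python %d.%d" % (major, minor))
if major < 3:
    print("This chapter assumes Python 3.")
```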

1.4.2 IDE
PyCharm: a powerful and convenient integrated development environment for Python.
Eclipse: a development tool widely used for Java; with the PyDev plug-in, it also supports Python.
Jupyter Notebook: a web-based interactive computing environment.


2 Basic Programming

2.1 Python Basics
2.1.1 Basic Syntax
Python uses indentation to delimit statement blocks. By convention, each indentation level is four spaces, and most Python editors insert four spaces when you press the Tab key.
Python programs are executed from top to bottom.
Packages and modules are imported using the import and from...import... statements.
If multiple statements are written on one line, separate them with semicolons (;).
A number sign (#) starts a single-line comment. A triple-quoted string ('''...''' or """...""") is often used for multi-line comments and documentation strings (docstrings). Strictly speaking, it is a string literal rather than a comment.
PEP 8 (not mandatory): PEP 8 is the style guide for Python code, not a syntax rule. Following it improves code readability and consistency.
Keywords: reserved words with special meaning in Python, such as if, for, def, and class.
Identifier naming rules: an identifier consists of letters, underscores (_), and digits, and cannot start with a digit. User-defined identifiers cannot be the same as keywords.
Variables (references to stored data): data stored in a computer has a corresponding storage address. Assigning a value to a variable does not copy the data into the variable. Instead, it binds the variable to a reference to that data.
Scope: the region of a program in which a variable can be accessed at run time.
Local variables: variables defined inside a function. They can be used only within that function.
Global variables: variables defined outside of functions and classes. They can be used throughout the module.
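The following minimal sketch illustrates several of the rules above: comments, imports, semicolon-separated statements, variables that hold references rather than copies, and local versus global scope. The names `GREETING`, `greet`, and `message` are hypothetical and chosen only for illustration.

```python
# A number sign starts a single-line comment.
'''
A triple-quoted string on its own line is often used as a
multi-line comment (strictly, it is an unused string literal).
'''
from math import sqrt          # import a name with from...import...

x = 1; y = 2                   # two statements on one line, separated by ;
print(sqrt(x + y + 6))         # 3.0

# Variables hold references: a and b name the same list object.
a = [1, 2]
b = a
b.append(3)                    # the change is visible through a as well
print(a, a is b)               # [1, 2, 3] True

GREETING = "hello"             # global variable, visible in the whole module


def greet(name):
    message = GREETING + ", " + name   # message is a local variable
    return message                     # indented block: four spaces per level


print(greet("python"))         # hello, python
try:
    print(message)             # local variables do not exist outside the function
except NameError:
    print("message is not defined at module level")
```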

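2.1.2 Basic Python Built-in Functions
print(): output function. For example, print("hello world") outputs hello world.
input(): reads a line of user input and returns it as a string.
del obj: deletes a reference (name) to an object. The object's memory is reclaimed once no references to it remain. del is a statement, so the parentheses in del(a) are optional. Example: a="python"; del(a)
range(start, stop[, step]): generates a sequence of integers from start up to, but not including, stop, in increments of step. Example: range(0,20,4) produces 0, 4, 8, 12, 16.
type(obj): returns the type of an object. Example: type(print) returns <class 'builtin_function_or_method'>.
dir(obj): lists the attributes and methods of an object. Example: dir(print)
id(obj): returns the identity of an object. This is an integer that is unique for the object's lifetime; in CPython, it is the object's memory address. Example: A=1;id(A) returns a value such as 1735879424, which differs from run to run.
help(obj): displays help information about an object. Example: help(print)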
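A minimal sketch exercising the non-interactive built-ins above. input() and help() are left out because they wait for a user or open an interactive pager.

```python
a = "python"
del a                          # removes the name a
try:
    print(a)
except NameError as err:
    print(err)                 # name 'a' is not defined

numbers = list(range(0, 20, 4))
print(numbers)                 # [0, 4, 8, 12, 16]: stop (20) is excluded

print(type(print))             # <class 'builtin_function_or_method'>
print("upper" in dir(str))     # True: dir() lists attributes and methods

x = 1
y = x
print(id(x) == id(y))          # True: both names refer to the same object
```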

2.2 Data Structure in Python
2.2.1 Data Structure Classification
Python has the following common data types: number, string, list, tuple, dictionary, and set. These built-in types are efficient and greatly improve development efficiency, which makes Python easy to use.
Python data types can be classified as follows:
Sequential: elements can be accessed by subscript (index), and slices such as [start:stop:step] can be taken. Strings, lists, and tuples are sequential.
Nonsequential: elements cannot be accessed by subscript. Dictionaries (accessed by key) and sets are nonsequential.
Changeable (mutable): values can be modified in place. Lists, dictionaries, and sets are changeable.
Unchangeable (immutable): values cannot be modified. Numbers, strings, and tuples are unchangeable.
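A minimal sketch of the classification above. It shows indexing and slicing on sequential types, and what happens when you try to modify an unchangeable type or index a nonsequential one. The variable names are illustrative only.

```python
word = "python"                 # str: sequential, unchangeable
nums = [10, 20, 30, 40]         # list: sequential, changeable
point = (3, 4)                  # tuple: sequential, unchangeable
ages = {"Tom": 20}              # dict: accessed by key, changeable
tags = {"ai", "python"}         # set: no subscripts, changeable

print(word[0], word[-1])        # p n
print(nums[1:3], nums[::2])     # [20, 30] [10, 30]
print(point[1], ages["Tom"])    # 4 20

nums[0] = 99                    # allowed: lists are changeable

try:
    word[0] = "P"               # strings are unchangeable
except TypeError as err:
    print("TypeError:", err)

try:
    tags[0]                     # sets cannot be indexed
except TypeError as err:
    print("TypeError:", err)
```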


2.2.2 Number
Python 3 supports values of the int, float, bool, and complex types. The basic operations on numbers are:
Addition (+)
Subtraction (-)
Multiplication (*)
Division (/), which always returns a float in Python 3
Modulo (%) and floor division (//)
Power (**)
If an operation involves numbers of different types (such as int and float), the operand of the narrower type is converted to the wider type. The result therefore has the wider type; for example, 1 + 2.0 gives 3.0.
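A minimal sketch of the numeric operators and type widening described above. Note that / returns a float even when the division is exact (4 / 2 gives 2.0), and that // rounds toward negative infinity rather than toward zero.

```python
total = 7 + 2          # 9
product = 7 * 2        # 14
quotient = 7 / 2       # 3.5: true division always returns a float
floor_q = 7 // 2       # 3: floor division
neg_floor = -7 // 2    # -4: floor division rounds toward negative infinity
remainder = 7 % 2      # 1
power = 2 ** 10        # 1024
mixed = 1 + 2.0        # int + float -> float (3.0)
cplx = 1 + 2j          # int + complex -> complex
flag = True + 1        # bool is a subclass of int -> 2

print(total, product, quotient, floor_q, neg_floor, remainder, power)
print(type(mixed).__name__, type(cplx).__name__, flag)   # float complex 2
```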

2.2.3 String
In Python, a string is a sequence of characters. The number of characters is the length of the string. Python does not have a separate character type; a single character is a string of length 1.
To create a string, enclose the content in single quotation marks ('...') or double quotation marks ("..."). For strings that span multiple lines, you can also use three consecutive quotation marks ('''...''' or """...""").
Strings can contain escape sequences introduced by a backslash (\), such as \n for a newline. Prefixing a string with r creates a raw string, in which backslashes are not treated as escape characters.
Operators:
+: concatenates two strings. Example: a="hello";b="world" => a+b is 'helloworld'.
*: repeats a string a given number of times. Example: "a"*2 => 'aa'.
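A minimal sketch of string creation, escape sequences, raw strings, and the + and * operators. The Windows-style path is a hypothetical value, used only to show how a raw string keeps backslashes.

```python
a = "hello"
b = "world"
print(a + b)              # helloworld
print("a" * 2)            # aa
print(len(a), a[1], type(a[1]).__name__)   # 5 e str

print("line1\nline2")     # \n is an escape sequence: prints two lines
path = r"C:\new\table"    # raw string: backslashes are kept as-is
print(path)               # C:\new\table
print(len("\n"), len(r"\n"))   # 1 2

text = """first line
second line"""            # triple quotes allow multi-line strings
print(text.count("\n"))   # 1
```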

2.2.4 Common Operations on Strings
Table 2-1 Common operations on strings
Operation | Definition | Example | Output
Segmentation | str.split(str1): splits a string using str1 as the separator and returns a list. | 'python'.split('h') | ['pyt', 'on']
Replace | str.replace(str1, str2): replaces str1 in the string with str2 and returns a new string. | 'python'.replace('py','PY') | PYthon
Lowercase | str.lower(): converts uppercase letters in a string to lowercase. | 'PYTHON'.lower() | python
Uppercase | str.upper(): converts lowercase letters in a string to uppercase. | 'python'.upper() | PYTHON
Stitching | str.join(iter): concatenates the elements of the iterable iter (which must be strings), inserting str between them, and returns a new string. | "-".join("huawei") | h-u-a-w-e-i
Formatted output | Uses the formatting operator (%) together with conversion types and auxiliary formatting commands to produce formatted output. | 'My name is %s, age is %d' %('AI', 63) | My name is AI, age is 63

2.2.5 String Formatted Output
Table 2-2 String format conversion types
Format | Description
%c | Single character (accepts a one-character string or an integer character code, such as an ASCII code)
%s | String
%d | Signed integer (decimal)
%u | Unsigned integer (decimal); obsolete in Python 3, where it behaves the same as %d
%o | Signed integer (octal)
%x | Signed integer (hexadecimal, lowercase)
%X | Signed integer (hexadecimal, uppercase)
%e | Floating point (scientific notation, lowercase e)
%E | Floating point (scientific notation, uppercase E)
%f | Floating point (decimal)
%g | Floating point (uses %e if the exponent is less than -4 or not less than the precision, otherwise %f)

Table 2-3 Auxiliary formatting commands
Symbol | Description
* | Takes the width or decimal precision from the argument list.
- | Left-aligns the value.
+ | Displays a plus sign (+) before a positive number.
(space) | Displays a space before a positive number.
# | Displays 0o before an octal number (0 in Python 2), and 0x or 0X before a hexadecimal number (depending on whether x or X is used).
0 | Pads numeric values with zeros (0) instead of the default spaces.
(var) | Mapping key (used with a dictionary argument), for example %(name)s.
m.n | m is the minimum total width of the output, and n is the number of digits after the decimal point.
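A minimal sketch applying the conversion types from Table 2-2 and the auxiliary commands from Table 2-3 with the % operator. The square brackets in the format strings only make padding visible.

```python
print("%c %c" % (65, "z"))                    # A z
print("%d %o %x %X" % (255, 255, 255, 255))   # 255 377 ff FF
print("%e | %E" % (12345.678, 12345.678))     # 1.234568e+04 | 1.234568E+04
print("%f" % 3.14159)                         # 3.141590
print("%g %g" % (0.00001, 123.456))           # 1e-05 123.456

print("[%5.2f]" % 3.14159)      # [ 3.14]   m.n: total width 5, 2 decimals
print("[%-5d]" % 42)            # [42   ]   -: left-aligned
print("[%05d]" % 42)            # [00042]   0: zero padding
print("[%+d] [% d]" % (5, 5))   # [+5] [ 5] + and space before positive numbers
print("%#o %#x %#X" % (8, 255, 255))          # 0o10 0xff 0XFF
print("[%*d]" % (5, 42))        # [   42]   *: width taken from the arguments
print("%(name)s is %(age)d" % {"name": "AI", "age": 63})   # AI is 63
```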

2.2.6 List
A list is a sequence whose elements can be of any data type. Elements can be added or deleted at any time. The elements of a list are enclosed in square brackets and separated by commas (,). You can create a list in any of the following ways:
List = list(iterable), for example, list((obj1, obj2, …))
List = [obj1, obj2, …]
List comprehensions, for example, [x * 2 for x in range(3)]
Operators:
+: combines lists. For example, [1,2]+[2,3] gives [1, 2, 2, 3].
*: multiplies a list by an integer to obtain a new list. For example, [1,2]*2 gives [1, 2, 1, 2].
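A minimal sketch of the ways to create a list and of the + and * operators. It also shows that list() accepts a single iterable rather than separate elements.

```python
letters = list("abc")            # list(iterable): ['a', 'b', 'c']
mixed = [1, "two", 3.0, [4]]     # elements can be of any data type
squares = [n * n for n in range(5)]   # list comprehension: [0, 1, 4, 9, 16]
mixed.append(None)               # lists can grow at any time
print(letters, squares, len(mixed))   # ['a', 'b', 'c'] [0, 1, 4, 9, 16] 5
print([1, 2] + [2, 3])           # [1, 2, 2, 3]
print([1, 2] * 2)                # [1, 2, 1, 2]

try:
    list(1, 2)                   # list() accepts at most one iterable
except TypeError as err:
    print("TypeError:", err)
```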


2.2.7 Common Operations on Lists
Table 2-4 Common operations on lists
Operation | Definition | Example | Output
Add | list.append(obj): adds the object to the end of the list. | a=[1,2]; a.append(3); a | [1, 2, 3]
Add | list.insert(index, obj): inserts the object at the given index of the list. | a=[1,2]; a.insert(0,3); a | [3, 1, 2]
Add | list.extend(iter): appends each element of an iterable object to the end of the list, one by one. | a=[1,2]; a.extend([3,4]); a | [1, 2, 3, 4]
Delete | list.pop([index]): deletes the element at the given index and returns it. If no index is passed, the last element is deleted. | a=[1,2]; b=a.pop(1); a,b | ([1], 2)
Delete | list.remove(obj): deletes the first occurrence of the given element from the list. | a=[1,2]; a.remove(2); a | [1]
Search | list.index(obj): returns the index of the first occurrence of the given element. | a=[1,2]; a.index(2) | 1
Sort | list.sort(): sorts the list in place. The default order is ascending. | a=[3,1,2]; a.sort(); a | [1, 2, 3]
Reverse | list.reverse(): reverses the elements of the list in place. | a=[3,1,2]; a.reverse(); a | [2, 1, 3]
Count | list.count(obj): returns the number of occurrences of the given element. | a=[3,1,2,1]; a.count(1) | 2
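Because methods such as sort(), reverse(), and remove() modify the list itself, they return None rather than a new list. The minimal sketch below shows this, together with sort(reverse=True) for descending order and the ValueError raised by index() for a missing element.

```python
a = [3, 1, 2, 1]
result = a.sort()      # sorts a in place
print(result, a)       # None [1, 1, 2, 3]

a.remove(1)            # removes only the first 1
print(a)               # [1, 2, 3]

a.sort(reverse=True)   # descending order
print(a)               # [3, 2, 1]

try:
    a.index(10)        # index() raises ValueError for a missing element
except ValueError:
    print("10 is not in the list")
```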

2.2.8 Tuple
A tuple is a sequence whose elements can be of any data type. Tuples are immutable: once a tuple is created, its elements cannot be added, deleted, or reassigned. Data stored in a tuple is therefore better protected against accidental modification than data in a list. The elements of a tuple are enclosed in parentheses and separated by commas (,). A tuple can be created in the following three ways:
Tuple = tuple(iterable), for example, tuple([obj1, obj2, …])
Tuple = (obj1, obj2, …)
Tuple = obj1, obj2, obj3
If a tuple has only one element, a comma must follow that element, for example (obj1,). This tells the interpreter that the parentheses are not the grouping parentheses of an expression.
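A minimal sketch of the three ways to create a tuple. It also shows why a one-element tuple needs a trailing comma, and that tuple elements cannot be reassigned.

```python
t1 = tuple([1, 2, 3])      # from an iterable
t2 = (1, 2, 3)
t3 = 1, 2, 3               # parentheses are optional
print(t1 == t2 == t3)      # True

not_a_tuple = (5)          # just the integer 5 in grouping parentheses
single = (5,)              # the trailing comma makes a one-element tuple
print(type(not_a_tuple).__name__, type(single).__name__)   # int tuple

try:
    t2[0] = 10             # tuples are immutable
except TypeError:
    print("tuple elements cannot be reassigned")
```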

2.2.9 Dictionary
Each element of a dictionary consists of a key and a value, so the elements of a dictionary are also called key-value pairs. Keys must be unique and immutable (more precisely, hashable), such as strings, numbers, or tuples. If a dictionary is created with duplicate keys, the value of the later key overwrites the value of the earlier one. Looking up a value by key takes roughly constant time on average, whereas finding an element in a list requires scanning it. For large amounts of data, dictionary access is therefore much faster than searching a list. The elements of a dictionary are enclosed in braces and separated by commas (,). Common ways to create a dictionary are:
Dict = {key:value, …}
Dict = dict(key=value, …)
Dict = dict([(key,value), …])
Dictionary comprehensions
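A minimal sketch of the four creation methods listed above. It also shows that a duplicate key keeps the later value, and that a mutable list cannot be used as a key. The keys and values are illustrative only.

```python
d1 = {"a": 1, "b": 2}
d2 = dict(a=1, b=2)
d3 = dict([("a", 1), ("b", 2)])
d4 = {key: value for key, value in zip("ab", [1, 2])}   # dictionary comprehension
print(d1 == d2 == d3 == d4)      # True

dup = {"a": 1, "a": 2}           # duplicate key: the later value wins
print(dup)                       # {'a': 2}

try:
    bad = {[1, 2]: "x"}          # a list is mutable (unhashable), so it cannot be a key
except TypeError as err:
    print("TypeError:", err)
```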

2.2.10 Common Operations on Dictionaries
Table 2-5 Common operations on dictionaries
Operation | Definition | Example | Output
Get | dict.get(key, default=None): obtains the value for the key. If the key does not exist, the default value is returned. | Dict={'a':1,'b':2}; Dict.get('a') | 1
Get | dict.items(): returns a view of all (key, value) tuples. | Dict={'a':1,'b':2}; Dict.items() | dict_items([('a', 1), ('b', 2)])
Get | dict.keys(): returns a view of all keys. | Dict={'a':1,'b':2}; Dict.keys() | dict_keys(['a', 'b'])
Get | dict.values(): returns a view of all values. | Dict={'a':1,'b':2}; Dict.values() | dict_values([1, 2])
Add/Modify | dict[key] = value: adds the key-value pair {key:value}. If the key already exists, its value is changed. | Dict={'a':1,'b':2}; Dict['a']=3; Dict | {'a': 3, 'b': 2}
Update | dict.update(dict1): updates the dictionary with the key-value pairs of dict1. | Dict={'a':1,'b':2}; Dict2={'a':3,'c':3}; Dict.update(Dict2); Dict | {'a': 3, 'b': 2, 'c': 3}
Delete | dict.pop(key): deletes the key and returns its value. | Dict={'a':1,'b':2}; a=Dict.pop('a'); Dict,a | ({'b': 2}, 1)
Delete | dict.popitem(): deletes and returns a key-value pair. Since Python 3.7, it returns the most recently inserted pair. | Dict={'a':1,'b':2}; a=Dict.popitem(); Dict,a | ({'a': 1}, ('b', 2))
Delete | dict.clear(): removes all items from the dictionary. | Dict={'a':1,'b':2}; Dict.clear(); Dict | {}
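A minimal sketch of the behaviors in Table 2-5 that are easy to get wrong. get() can take a custom default. keys(), values(), and items() return live views rather than copies. popitem() removes the most recently inserted pair.

```python
d = {"a": 1, "b": 2}
print(d.get("z"), d.get("z", 0))   # None 0

keys = d.keys()                    # a view, not a copy
d["c"] = 3
print(list(keys))                  # ['a', 'b', 'c']: the view reflects the change

last = d.popitem()                 # removes the most recently inserted pair
print(last, d)                     # ('c', 3) {'a': 1, 'b': 2}

for key, value in d.items():       # iterate over key-value pairs
    print(key, value)
```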

2.2.11 Set
Every element in a set is unique, and duplicate elements are removed automatically. The elements of a set are enclosed in braces and separated by commas (,). You can create a set in the following ways:
Set = set() (an empty pair of braces, {}, creates an empty dictionary, not a set)
Set = {obj1, obj2, …}
Logical operations:
Intersection set1 & set2: elements contained in both sets
Symmetric difference set1 ^ set2: elements contained in either set but not in both
Union set1 | set2: all elements of the two sets, with duplicates removed
Difference set1 - set2: elements contained in set1 but not in set2
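A minimal sketch of the four set operations above, plus duplicate removal and the empty-set pitfall. The printed order of set elements is an implementation detail; the assertions compare by equality.

```python
set1 = {1, 2, 3}
set2 = {2, 3, 4}
print(set1 & set2)    # {2, 3}        intersection
print(set1 ^ set2)    # {1, 4}        symmetric difference
print(set1 | set2)    # {1, 2, 3, 4}  union
print(set1 - set2)    # {1}           difference

print(set([1, 1, 2])) # {1, 2}: duplicates removed
empty = set()         # {} would create an empty dict instead
print(type({}).__name__, type(empty).__name__)   # dict set
```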

2.2.12 Common Operations on Sets
Table 2-6 Common operations on sets
Operation | Definition | Example | Output
Add | set.add(obj): adds an element. If the element already exists, nothing happens. | Set={1,2,3}; Set.add(4); Set | {1, 2, 3, 4}
Add | set.update(iter, …): adds all elements of one or more iterables, such as lists, sets, or dictionaries (whose keys are added). Separate multiple iterables with commas (,). | Set={1,2}; Set.update({2,3}); Set | {1, 2, 3}
Delete | set.remove(obj): removes an element. If the element does not exist, a KeyError exception is raised. | Set={1,2}; Set.remove(1); Set | {2}
Delete | set.discard(obj): removes an element. No exception is raised if the element does not exist. | Set={1,2}; Set.discard(1); Set | {2}
Delete | set.clear(): removes all elements from the set. | Set={1,2}; Set.clear(); Set | set()
Delete | set.pop(): removes and returns an arbitrary element from the set. | Set={1,2}; a=Set.pop(); Set,a | ({2}, 1) (the removed element may vary)

2.2.13 Deep Copy and Shallow Copy
In Python, data can be copied in two ways: shallow copy and deep copy.
Shallow copy (copy()): copies only the outer data structure. If the data is nested, the elements of the copy are references to the same nested objects as the original. Modifying a nested object in the original therefore also affects the copy.
Deep copy: recursively copies all data, including nested objects, so modifying the original data does not affect the copy. To make a deep copy, import the copy module and call its deepcopy() function.
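A minimal sketch contrasting the two kinds of copy on a nested list, using the standard copy module. Changing the inner list of the original shows up in the shallow copy but not in the deep copy. Rebinding a top-level element affects neither copy.

```python
import copy

original = [1, [2, 3]]
shallow = copy.copy(original)      # for a list, the same as original.copy()
deep = copy.deepcopy(original)

original[1].append(4)              # change the nested list in place
original[0] = 100                  # rebind a top-level element

print(original)   # [100, [2, 3, 4]]
print(shallow)    # [1, [2, 3, 4]]  the nested list is shared with original
print(deep)       # [1, [2, 3]]     fully independent
```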

2.2.14 Operator
Python has the following operators:
Arithmetic operators: +, -, *, /, //, %, **
Comparison operators: ==, !=, >, <, >=, <=
Assignment operators: =, +=, -=, /=, *=, **=, //=, %=
Bitwise operators: &, |, ^, ~, <<, >>
Logical operators: and, or, not
Membership operators: in, not in
Identity operators: is, is not
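A minimal sketch of several operator groups. It highlights the difference between == (equal values) and is (the same object), which is a common source of bugs.

```python
a = [1, 2]
b = [1, 2]
c = a
print(a == b, a is b, c is a)   # True False True: == compares values, is compares identity
print(2 in a, 5 not in a)       # True True
print(6 & 3, 6 | 3, 6 ^ 3)      # 2 7 5  (binary 110 and 011)

x = 10
x += 5                          # same as x = x + 5  -> 15
x //= 4                         # same as x = x // 4 -> 3
print(x, x > 1 and x < 5, not x)   # 3 True False
```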


2.3 Control Flow
2.3.1 Judgment Statement – if
Conditional control in Python decides which code block to execute based on the result (True or False) of a condition. The if statement controls program execution. When there are multiple conditions, use the if/elif/else form:
if condition1:
    statement1
elif condition2:
    statement2
else:
    statement3
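A minimal, runnable instance of the if/elif/else template. The function name grade and its thresholds are hypothetical and serve only as an illustration.

```python
def grade(score):
    """Map a score to a level; the thresholds are illustrative only."""
    if score >= 90:
        return "A"
    elif score >= 60:
        return "B"
    else:
        return "C"


print(grade(95), grade(70), grade(30))   # A B C
```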

2.3.2 Loop Statement – for
The for statement in Python differs from that in languages such as C. It takes an iterable object (such as a sequence) and iterates over it one element at a time. An optional else block can follow the for loop. The else block is executed when the loop finishes normally, that is, when the loop is not terminated by a break statement. The for statement is used as follows:
for item in iterable:
    loop statement block
else:
    statement block
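A minimal sketch of for...else. The else block runs only when the loop is not ended by break, which makes it a natural fit for "search, and handle not found" logic. The helper first_even is hypothetical.

```python
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            result = n
            break            # loop ends early, so the else block is skipped
    else:
        result = None        # runs only when the loop was not ended by break
    return result


print(first_even([1, 3, 4, 5]))   # 4
print(first_even([1, 3, 5]))      # None
```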

2.3.3 Loop Statement – while
In Python, the while statement executes a block of code repeatedly as long as its condition is true. If the condition is always true, the loop never ends, forming an infinite loop. An optional else block can follow the while loop. It is executed when the condition becomes false, but not when the loop is exited with break. Avoid busy loops with empty bodies, which waste CPU resources. The while statement is used as follows:
while condition:
    statement block    # Executed repeatedly while the condition is true.
else:
    statement block    # Executed when the condition is false.
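A minimal sketch of while...else. The first loop ends because its condition becomes false, so its else block runs. The second loop is left with break, so its else block is skipped.

```python
count = 0
while count < 3:
    print("count =", count)
    count += 1
else:
    print("condition is false; loop finished without break")

n = 10
while n > 0:
    if n == 7:
        break               # leaves the loop; the else block below is skipped
    n -= 1
else:
    n = -1                  # not reached in this example
print(n)                    # 7
```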

2.3.4 Loop Termination – break and continue
To interrupt a loop, use break or continue.
A break statement ends the entire loop. If break is triggered, the loop ends immediately and its else block is not executed. In nested loops, break terminates only the innermost loop that contains it, and execution continues with the next statement after that loop.
A continue statement skips the remaining statements of the current iteration and continues with the next iteration.
Both break and continue can be used in while and for loops.
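A minimal sketch of break and continue in nested loops. continue skips one inner iteration. break ends only the inner loop, so the outer loop still visits every value of i.

```python
pairs = []
for i in range(3):
    for j in range(3):
        if j == i:
            continue        # skip the rest of this inner iteration
        if j > 1:
            break           # ends only the inner loop; the outer loop continues
        pairs.append((i, j))
print(pairs)                # [(0, 1), (1, 0), (2, 0), (2, 1)]
```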

2.4 Functions and Object-oriented Programming
2.4.1 Functions
A function is an organized, reusable block of code that performs a single task or a group of related tasks. Functions improve the modularity of applications and code reusability. Python provides many built-in functions, such as print(), and you can also define your own functions.

2.4.2 Function Definition and Calling
Definition: in Python, the keyword def starts a function definition, followed by the function name and its parameters enclosed in parentheses. The function header ends with a colon (:), and the function body is indented. An optional documentation string (docstring) can be placed in the first line of the body to describe the function. A return statement ends the function and returns its result to the caller.
Calling: to call a function, write its name followed by parentheses that enclose the required arguments.
def function(param):                # Define the function and its required parameters.
    '''Description documentation''' # Function description.
    function body                   # Content to be executed by the function.

function(param)                     # Call the function.
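A minimal, runnable instance of the definition template above. The function name area is hypothetical. The docstring is stored on the function object and can be read through __doc__.

```python
def area(width, height):
    '''Return the area of a rectangle.'''
    return width * height


print(area(3, 4))       # 12
print(area.__doc__)     # Return the area of a rectangle.
```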

2.4.3 Function Return Values
Functions can be classified into functions with return values and functions without return values.
Without a return value: if the function body contains no return statement (or only a bare return), the function returns None.
With a return value: if the function body contains a return statement with an expression, the value of that expression is returned.
A function can return multiple values; they are packed into a tuple.
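A minimal sketch of both cases: a function without return gives None, and a function that returns two values actually returns one tuple, which can be unpacked. The helper names are hypothetical.

```python
def no_return():
    pass                         # no return statement


def min_max(values):
    return min(values), max(values)   # two values packed into one tuple


print(no_return())               # None
result = min_max([3, 1, 2])
print(result, type(result).__name__)  # (1, 3) tuple
low, high = result               # unpack the tuple into two names
```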

2.4.4 Function Arguments
Function arguments can be classified into the following types:
Required arguments: must be passed to a function in the correct positional order, and their number must match the function definition exactly.
Keyword arguments: in a function call, arguments are passed in the form name=value, so they can be given in any order.
Default arguments: a default argument takes a default value if no value is provided for it in the function call.
Variable-length arguments: a function may need to accept more arguments than were specified when it was defined. These are called variable-length arguments, and they are not named individually in the function definition.
*args and **kwargs: the parameter args, prefixed with an asterisk (*), collects all extra positional (unnamed) arguments into a tuple. The parameter kwargs, prefixed with two asterisks (**), collects all extra keyword arguments (key=value) into a dictionary.
Parameter order: in a function definition, parameters appear in the order required parameters, default parameters, *args, **kwargs, for example, def func(a, b=1, *args, **kwargs).
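A minimal sketch that returns whatever it receives, so you can see how each kind of argument is bound. The function describe and its parameter names are hypothetical.

```python
def describe(name, greeting="Hello", *args, **kwargs):
    """Return the received arguments grouped by kind."""
    return name, greeting, args, kwargs


print(describe("AI"))                          # ('AI', 'Hello', (), {})
print(describe(greeting="Hi", name="AI"))      # ('AI', 'Hi', (), {})
print(describe("AI", "Hey", 1, 2, lang="Python"))
# ('AI', 'Hey', (1, 2), {'lang': 'Python'})
```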

2.4.5 Anonymous Functions
In addition to def, Python provides lambda for creating anonymous functions. Compared with functions defined with def, anonymous functions have the following features:
A lambda is a single expression, so its body is much simpler than that of a def function.
A lambda is not a code block. Only limited logic can be encapsulated in it, and it cannot contain statements such as assignment statements or loops.
A lambda has its own local namespace for its parameters. Like other functions, it can also read variables from enclosing and global scopes.
An anonymous function is defined as follows: lambda x: x + 1
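A minimal sketch of typical lambda uses: a simple expression, a lambda that reads a variable from its enclosing function, and a lambda passed as the key argument to sorted(). PEP 8 recommends def over assigning a lambda to a name; add_one is assigned here only for demonstration.

```python
add_one = lambda x: x + 1          # same as: def add_one(x): return x + 1
print(add_one(1))                  # 2


def make_multiplier(n):
    return lambda x: x * n         # the lambda reads n from the enclosing function


double = make_multiplier(2)
print(double(5))                   # 10

pairs = [(1, "b"), (3, "a"), (2, "c")]
print(sorted(pairs, key=lambda p: p[1]))   # [(3, 'a'), (1, 'b'), (2, 'c')]
```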


2.4.6 Object-oriented and Procedure-oriented Processes
Object-oriented and procedure-oriented approaches are both commonly used in programming.
Object-oriented: an object is a self-contained entity that combines data with the procedures that manipulate that data. A program is considered a set of objects, each of which can receive and process messages from other objects. Executing the program means passing a series of messages between objects.
Procedure-oriented: a program is considered a set of commands, that is, a set of functions executed in sequence. To simplify program design, the procedure-oriented approach divides functions into sub-functions. This reduces system complexity by splitting a large function into smaller blocks.

2.4.7 Advantages of Object-oriented Process
Improves code reusability.
Makes coding more flexible and improves code maintainability.
Improves program scalability.
Improves development efficiency.

2.4.8 Terminologies in Object-oriented Process
Class: like a blueprint, a class is a code block used to create objects. It describes the features and attributes of its objects, how an object completes tasks, and how an object responds to events.
Object: an instance of a class. In Python, an object is created by calling the class, which creates the object and runs its initializer, the __init__ method.
Method: a function defined in a class. A method usually describes an operation that an object can perform.
Attribute: a variable defined in a class. Attributes represent the properties or state of an object.
Encapsulation: bundling methods, attributes, and events into a single class and hiding the class's implementation details from its users.
Inheritance: a way to create a class by deriving a new class, called a child class, from an existing class. The existing class is called the parent class, base class, or superclass.
Polymorphism: the same operation can have different implementations for objects of different classes.

2.4.9 Object-oriented Process in Python
Python is an object-oriented programming language with built-in object-oriented features; in Python, everything is an object. Python supports multiple inheritance, so a class can have multiple parent classes. You can create a class as follows:
class ClassName(ParentClass):      # Declare the class; multiple parent classes are supported.
    '''Class description document'''
    ...class body...
If no parent class is given, the class inherits from object by default.
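A minimal sketch that ties the terms from 2.4.8 to real syntax: a class with an attribute and a method, two child classes that inherit from it, and polymorphism through a shared speak() call. The classes Animal, Dog, and Cat are hypothetical.

```python
class Animal:                       # no parent given: inherits from object
    '''Base class for animals.'''

    def __init__(self, name):       # initializer, run when an object is created
        self.name = name            # attribute

    def speak(self):                # method
        return self.name + " makes a sound"


class Dog(Animal):                  # Dog is a child class of Animal
    def speak(self):
        return self.name + " says woof"


class Cat(Animal):
    def speak(self):
        return self.name + " says meow"


for pet in [Dog("Rex"), Cat("Tom"), Animal("Generic")]:
    print(pet.speak())              # polymorphism: one call, different behavior

print(isinstance(Dog("Rex"), Animal), Animal.__bases__)   # True (<class 'object'>,)
```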

2.4.10 Privatization of Classes in Python
By default, attributes in Python are public. They can be accessed from the module where the class is defined and from any module that imports the class. To restrict access to, or inheritance of, some attributes of a class, you can make them private.
Single underscore: prefix a name with one underscore to mark it as internal. For module-level names, this prevents them from being imported by from mymodule import *. In classes, it is only a convention, and the name can still be accessed explicitly.
Full privatization (only the class itself should access the member): prefix a method or attribute name with double underscores. Python has no mechanism that enforces true privacy. Instead, the name is changed (name mangling) to _ClassName__attribute or _ClassName__method.
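A minimal sketch of both privatization levels on a hypothetical Account class. The single-underscore attribute remains directly accessible. The double-underscore attribute is reachable outside the class only through its mangled name.

```python
class Account:
    def __init__(self, balance):
        self._owner = "demo"        # single underscore: internal by convention only
        self.__balance = balance    # double underscore: name is mangled

    def get_balance(self):
        return self.__balance       # inside the class, the short name still works


acc = Account(100)
print(acc.get_balance(), acc._owner)   # 100 demo
try:
    print(acc.__balance)
except AttributeError:
    print("no attribute named __balance outside the class")
print(acc._Account__balance)       # 100: the mangled name is still reachable
```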

2.4.11 Programming Paradigms
Common programming paradigms in Python are procedure-oriented programming, object-oriented programming, and functional programming.
Functional programming abstracts a computation as a series of nested function calls and has the following features:
Functions return new values without depending on or changing external data.
Functions are first-class objects: a function can be assigned to a variable, passed as an argument to another function, or returned from another function.
Functional programming is flexible, and code can be written very close to natural language.
Functions written in a pure functional programming language have no mutable variables. A function whose output depends only on its input, and that has no side effects, is called a pure function.
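A minimal sketch of the functional features listed above. It uses a pure function, the built-ins map() and filter(), functools.reduce(), and a helper that takes functions as arguments and returns a new function. The names square and compose are hypothetical.

```python
from functools import reduce


def square(x):
    """Pure function: the result depends only on x, and nothing else is changed."""
    return x * x


def compose(f, g):
    """Take two functions as arguments and return a new function."""
    return lambda x: f(g(x))


numbers = [1, 2, 3, 4]
squares = list(map(square, numbers))                 # [1, 4, 9, 16]
evens = list(filter(lambda n: n % 2 == 0, numbers))  # [2, 4]
total = reduce(lambda acc, n: acc + n, numbers, 0)   # 10
negated_square = compose(lambda x: -x, square)
print(squares, evens, total, negated_square(3))      # [1, 4, 9, 16] [2, 4] 10 -9
print(numbers)                                       # [1, 2, 3, 4]: input unchanged
```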

2.5 Standard Libraries
2.5.1 Python Standard Libraries – sys
The sys module handles interaction between the program and the Python interpreter. It provides functions and variables for controlling the Python runtime environment. Common attributes and methods:
sys.argv: the list of command-line arguments; sys.argv[0] is the script name.
sys.exit(): exits from Python by raising the SystemExit exception.
sys.path: the list of module search paths (an attribute, not a function).
sys.platform: the identifier of the running platform, such as 'linux' or 'win32'.
sys.stdin/sys.stdout/sys.stderr: the standard input, standard output, and standard error streams.
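A minimal sketch of the sys members above. The script name demo.py in the comment is hypothetical. sys.exit() is wrapped in try/except so that you can see it raises SystemExit instead of ending the process immediately.

```python
import sys

print(sys.argv)          # e.g. ['demo.py', 'x'] when run as: python demo.py x
print(sys.platform)      # e.g. 'linux', 'win32', or 'darwin'
print(type(sys.path).__name__, len(sys.path) > 0)   # list True

sys.stdout.write("to standard output\n")
sys.stderr.write("to standard error\n")

try:
    sys.exit(2)          # raises SystemExit instead of exiting immediately
except SystemExit as exc:
    exit_code = exc.code
print("caught SystemExit with code", exit_code)     # caught SystemExit with code 2
```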

2.5.2 Python Standard Libraries – os
The os module handles interaction between the program and the operating system, and provides interfaces for accessing operating system functions. Common methods and attributes:
os.path.basename(): returns the base name (last component) of a path.
os.path.dirname(): returns the directory part of a path.
os.environ: a mapping of environment variable names to their values. For example, os.environ["HOMEPATH"] obtains the value of the environment variable HOMEPATH (a Windows variable).
os.chdir(dir): changes the current working directory. For example, on Windows, os.chdir("d:\\outlook") changes the current working directory to d:\outlook.
os.getcwd(): returns the current working directory.
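A minimal sketch of the os functions above. The path /home/user/data/report.txt is hypothetical and is only parsed as a string, not accessed. HOME is the usual variable on Linux and macOS, and HOMEPATH on Windows. The working directory is restored at the end.

```python
import os

path = "/home/user/data/report.txt"        # hypothetical path, used only as a string
print(os.path.basename(path))              # report.txt
print(os.path.dirname(path))               # /home/user/data

home = os.environ.get("HOME") or os.environ.get("HOMEPATH")   # HOME: Linux/macOS, HOMEPATH: Windows
print(home)

original = os.getcwd()
os.chdir("..")                              # move to the parent directory
print(os.getcwd())
os.chdir(original)                          # restore the original working directory
```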

2.5.3 Python Standard Libraries – time
The time module is an important Python module for processing time and provides various time-related functions. Common methods:
time.sleep(secs): suspends execution of the current thread for the given number of seconds.
time.strftime(format[, t]): converts a struct_time t (the current local time by default) to a string in the format specified by format.
time.time(): returns the current time as a timestamp (seconds since the epoch, as a float).
time.localtime([secs]): converts a timestamp (the current time by default) to a struct_time in the local time zone.
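A minimal sketch of the time functions above. It times a short sleep and formats the local time. It also formats the epoch with time.gmtime(), the UTC counterpart of localtime(), which is not listed in the text. Measured durations vary slightly between runs.

```python
import time

start = time.time()                       # seconds since the epoch, as a float
time.sleep(0.1)                           # pause the current thread for 0.1 s
elapsed = time.time() - start
print("slept about %.1f s" % elapsed)

now = time.localtime()                    # struct_time in the local time zone
print(time.strftime("%Y-%m-%d %H:%M:%S", now))
print(now.tm_year, now.tm_mon, now.tm_mday)

epoch_year = time.strftime("%Y", time.gmtime(0))   # gmtime() is the UTC counterpart of localtime()
print(epoch_year)                         # 1970
```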

2.6 I/O Operations 2.6.1 File Read and Write Python has built-in functions for reading and writing files. The open() function opens a file and returns a file object. It is commonly called with the parameters filename, mode, and encoding, of which only filename is required. filename: name (path) of the file to be opened. mode: file opening mode; the default is "r" (read-only, text). encoding: encoding used in text mode. If it is omitted, a platform-dependent default is used, so specifying it explicitly (for example, utf8) is recommended. Example: f = open("file_name", "r", encoding="utf8") # Open the file named file_name in read-only mode with UTF-8 encoding. Call f.close() to close the file after the operation is complete.
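The open/close sequence described above can be sketched as follows, assuming a hypothetical file example.txt in a temporary directory. The try/finally blocks ensure f.close() runs even if reading or writing fails.

```python
import os
import tempfile

# Hypothetical file name inside a fresh temporary directory.
file_name = os.path.join(tempfile.mkdtemp(), "example.txt")

f = open(file_name, "w", encoding="utf8")   # create the file for writing
try:
    f.write("hello\n")
finally:
    f.close()                                 # always release the file handle

f = open(file_name, "r", encoding="utf8")   # reopen it read-only
try:
    content = f.read()
finally:
    f.close()

print(repr(content))  # 'hello\n'
print(f.closed)       # True
```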


2.6.2 File Opening Modes
Table 2-7 File opening modes (Access Mode: Description)

r: Open a file in read-only mode. The pointer is placed at the beginning of the file. This is the default mode.

w: Open a file in write-only mode. If the file already exists, its content is overwritten. If the file does not exist, a new file is created.

a: Open a file for appending. If the file already exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created and content is written to it.

rb: Open a file for reading in binary mode. The pointer is placed at the beginning of the file.

wb: Open a file for writing in binary mode. If the file already exists, its content is overwritten. If the file does not exist, a new file is created.

ab: Open a file for appending in binary mode. If the file already exists, the file pointer is placed at the end of the file, so new content is written after the existing content. If the file does not exist, a new file is created and content is written to it.

r+: Open a file for reading and writing. The pointer is placed at the beginning of the file.

w+: Open a file for reading and writing. If the file already exists, its content is overwritten. If the file does not exist, a new file is created.

a+: Open a file for reading and appending. If the file already exists, the file pointer is placed at the end of the file, and written content is appended. If the file does not exist, a new file is created for reading and writing.

rb+: Open a file for reading and writing in binary mode. The pointer is placed at the beginning of the file.

wb+: Open a file for reading and writing in binary mode. If the file already exists, its content is overwritten. If the file does not exist, a new file is created.

ab+: Open a file for reading and appending in binary mode. If the file already exists, the file pointer is placed at the end of the file. If the file does not exist, a new file is created for reading and writing.

For r, rb, r+, and rb+, the file must already exist; otherwise, FileNotFoundError is raised.
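The difference between the most common modes can be checked with a short experiment on a hypothetical file in a temporary directory: "w" discards old content, "a" keeps it, binary modes return bytes, and "r" fails when the file does not exist.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file

def write(mode, text):
    with open(path, mode, encoding="utf-8") as f:
        f.write(text)

def read_text():
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

write("w", "first\n")        # "w" creates the file
write("w", "second\n")       # "w" again: the previous content is discarded
after_w = read_text()        # 'second\n'
write("a", "third\n")        # "a" writes after the existing content
after_a = read_text()        # 'second\nthird\n'

with open(path, "rb") as f:  # "rb" returns bytes instead of str
    raw = f.read()

try:
    with open(path + ".missing", "r", encoding="utf-8"):
        pass
    missing_raises = False
except FileNotFoundError:    # "r" never creates a file
    missing_raises = True

print(repr(after_w), repr(after_a), raw, missing_raises)
```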

2.6.3 Common File Handling Functions
f.write(str): writes the contents of a string to the file and returns the number of characters written.
f.read([size]): reads data. size indicates the number of characters (text mode) or bytes (binary mode) to read. If size is omitted, the rest of the file is read.
f.readline(): reads one line from the file; returns an empty string when the end of the file is reached.
f.readlines(): reads all remaining lines and returns them as a list in which each element is one line, including the trailing \n.
f.tell(): returns the current position of the file pointer.
f.seek(offset, whence): moves the file read or write pointer to a specific position. offset is the offset: a positive offset moves the pointer forward and a negative offset moves it backward. whence indicates the reference point: 0 for the beginning of the file (the default), 1 for the current position, and 2 for the end of the file. In text mode, only seeks relative to the beginning of the file (with an offset returned by f.tell() or 0) and seek(0, 2) are supported; use binary mode for other relative seeks.
f.flush(): flushes the internal write buffer to the file.
f.close(): closes the file.
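The sketch below walks through the functions above on a small hypothetical file. It writes with newline="\n" so that the byte offsets are the same on every platform, and it reads in binary mode so that relative f.seek() calls are allowed.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical file

# newline="\n" disables newline translation, so the file holds exactly 17 bytes.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    count = f.write("alpha\nbeta\ngamma\n")  # number of characters written
    f.flush()                                 # push buffered data to the OS

with open(path, "rb") as f:                   # binary mode: offsets are in bytes
    first = f.readline()                      # b'alpha\n'
    pos = f.tell()                            # 6
    rest = f.readlines()                      # [b'beta\n', b'gamma\n']
    at_end = f.readline()                     # b'' at the end of the file
    f.seek(-6, 2)                             # 6 bytes before the end (whence=2)
    tail = f.read()                           # b'gamma\n'
    f.seek(0)                                 # back to the beginning (whence=0)
    head = f.read(5)                          # b'alpha'

print(count, first, pos, rest, at_end, tail, head)
```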

2.6.4 Context Managers A context manager runs a setup (preprocessing) operation and a cleanup operation as a pair around a block of code. Its __enter__ method performs the setup and its __exit__ method performs the cleanup, even if the block raises an exception. Opening a file using a context manager:
with open(file_name, mode, encoding=encoding) as f:
    ...  # File handling statements
The file is automatically closed when the with block exits. Note that encoding must be passed as a keyword argument, because the third positional parameter of open() is buffering.
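A minimal sketch of a hypothetical Resource class that implements __enter__ and __exit__, showing that the cleanup step runs even when the block raises an exception, followed by the file case.

```python
import os
import tempfile

class Resource:
    """A hypothetical resource that logs its setup and cleanup steps."""

    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("enter")      # preprocessing: acquire the resource
        return self                   # the value bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")       # cleanup: runs even if the block raised
        return False                  # False: let any exception propagate

res = Resource()
try:
    with res as r:
        r.log.append("body")
        raise ValueError("something went wrong")
except ValueError:
    pass
print(res.log)  # ['enter', 'body', 'exit']

path = os.path.join(tempfile.mkdtemp(), "ctx.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write("data")
print(f.closed)  # True: closed automatically when the block exited
```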


2.7 Modules and Exceptions 2.7.1 Modules As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function in several programs without copying its definition into each program. To support this, Python lets you put definitions in a file and use them in a script or in an interactive session of the interpreter. Such a file is called a module. A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Within a module, the module's name (a string) is available as the value of the global variable __name__.
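To see the module mechanism in action, the sketch below writes a hypothetical module greetings.py into a temporary directory, adds that directory to sys.path, and imports it. The if __name__ == '__main__' test inside the module is a common idiom for code that should run only when the file is executed as a script.

```python
import importlib
import os
import sys
import tempfile

# Write a hypothetical module file: its module name is the file name without ".py".
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "greetings.py"), "w", encoding="utf-8") as f:
    f.write(
        "def hello(name):\n"
        "    return 'Hello, ' + name\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    # Runs only when the file is executed as a script, not when imported.\n"
        "    print(hello('script'))\n"
    )

sys.path.insert(0, module_dir)     # make the directory searchable for imports
importlib.invalidate_caches()      # recommended after creating modules at runtime
greetings = importlib.import_module("greetings")

print(greetings.hello("module"))   # Hello, module
print(greetings.__name__)          # greetings
```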

2.7.2 Exceptions There are two distinguishable kinds of errors: syntax errors and exceptions. Syntax errors are detected when the parser reads the code, before it is executed. Exceptions are errors detected during execution; for example, an exception is raised when a number is divided by 0. Common Python exceptions are as follows:
ZeroDivisionError: division or modulo by zero (all numeric types)
OSError: operating system error (for example, a file cannot be opened)
SyntaxError: syntax error
IndentationError: incorrect indentation
StopIteration: raised by next() when an iterator has no more items

2.7.3 Exception Handling The try and except keywords are used for exception handling:
try:
    a = 1 / 0
except ZeroDivisionError as e:
    print("The exception is captured:", e)
User-defined exceptions: You can define your own exception class by inheriting from Exception (or one of its subclasses) and raise it with the raise statement.
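A sketch combining the pieces above: a hypothetical user-defined exception InsufficientFundsError, raised with raise and caught with except, plus the optional else and finally clauses of a try statement.

```python
class InsufficientFundsError(Exception):
    """A hypothetical user-defined exception."""

    def __init__(self, balance, amount):
        super().__init__(f"cannot withdraw {amount}; balance is {balance}")
        self.balance = balance
        self.amount = amount

def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFundsError(balance, amount)
    return balance - amount

def safe_divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError as e:
        print("The exception is captured:", e)
        return None
    else:
        return result                # runs only if no exception was raised
    finally:
        print("division attempted")  # always runs

try:
    withdraw(50, 80)
except InsufficientFundsError as err:
    message = str(err)

print(safe_divide(1, 0))  # None
print(message)            # cannot withdraw 80; balance is 50
```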

3 Advanced Programming

3.1 Database Programming 3.1.1 Database Programming Database: A repository that organizes, stores, and manages data according to a data structure. Users can add, delete, query, and modify data in a database. Database technologies are widely used, and developing database applications has become an important area of computer science. Python's standard database interface is the Python DB-API (PEP 249), a common specification implemented by Python database drivers. DB-API drivers are available for many databases, such as MySQL, PostgreSQL, and Oracle.

3.1.2 MySQL Operations Import the driver module: import pymysql Open the database connection: db = pymysql.connect(host="localhost", user="root", password="mysql", database="my_database", charset="utf8") In this statement, host="localhost" indicates a local connection (you can change it to the IP address of the database server); user and password are the account name and password; database is the name of the database to connect to; charset="utf8" sets the data encoding to UTF-8. The parameters are passed as keyword arguments, which recent PyMySQL versions expect. Obtain the operation cursor: cursor = db.cursor() Execute SQL statements: cursor.execute(sql) Commit the changes if the statements modify data: db.commit() Close the database connection: db.close()
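The same four steps can be tried without a MySQL server by substituting the standard-library sqlite3 module, which also follows the DB-API. This is a swap for illustration, not PyMySQL itself; note that sqlite3 uses ? as its parameter placeholder whereas PyMySQL uses %s. The table and data are made up.

```python
import sqlite3

# 1. Open the connection (":memory:" creates a temporary in-memory database).
db = sqlite3.connect(":memory:")
try:
    # 2. Obtain a cursor.
    cursor = db.cursor()
    # 3. Execute SQL statements; placeholders keep values out of the SQL text.
    cursor.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    cursor.executemany("INSERT INTO users (name) VALUES (?)", [("li",), ("yang",)])
    db.commit()                          # make the changes permanent
    cursor.execute("SELECT id, name FROM users ORDER BY id")
    rows = cursor.fetchall()
finally:
    # 4. Close the connection.
    db.close()

print(rows)  # [(1, 'li'), (2, 'yang')]
```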

3.2 Multitasking 3.2.1 Multitasking An operating system can run multiple programs at the same time. Execution modes of multiple tasks:
Concurrency: Multiple tasks are executed alternately, taking turns on the processor, so that all of them make progress during the same period.
Parallelism: Multiple tasks are executed at the same instant, for example on different CPU cores.
Ways to implement multitasking:
Thread
Process
Coroutine

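3.2.2 Thread A thread is the smallest unit of execution scheduled by the operating system. All threads in a process share the process's global variables. In the standard CPython interpreter, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so multiple threads cannot run Python code in parallel on multiple CPU cores. Multithreading is still useful for I/O-bound tasks, because the GIL is released while a thread waits for I/O. The GIL is a lock that protects the interpreter's internal data and synchronizes its state among threads.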
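A sketch of where threads help despite the GIL: five hypothetical I/O-bound tasks, simulated with time.sleep(), run concurrently, so the total time is roughly that of one task rather than five. Exact timings vary by machine.

```python
import threading
import time

results = {}

def fetch(name, delay):
    """Simulate an I/O-bound task (such as a network call) with sleep."""
    time.sleep(delay)            # the GIL is released while the thread sleeps
    results[name] = delay

threads = [threading.Thread(target=fetch, args=(f"task{i}", 0.2)) for i in range(5)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()                     # wait for every thread to finish
elapsed = time.perf_counter() - start

print(sorted(results))           # ['task0', 'task1', 'task2', 'task3', 'task4']
print(f"{elapsed:.2f} s")        # typically close to 0.2 s, not 1.0 s
```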

3.2.3 Thread Synchronization If multiple threads operate on the same global variable at the same time, resource contention occurs and can corrupt the data. To solve this problem, thread synchronization is required: the threads coordinate so that operations on a shared resource are executed one at a time, in sequence. To implement thread synchronization, a lock mechanism is introduced. Mutex lock: When a thread operates on a resource, it acquires a mutex lock on the resource so that other threads cannot operate on it. Only after the thread releases the lock can other threads operate on the resource. Deadlock: A deadlock occurs when each member of a group of threads (or processes) waits for a resource held by another member, so none of them can proceed; for example, thread 1 holds lock A and waits for lock B, while thread 2 holds lock B and waits for lock A.
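A minimal sketch of a mutex protecting a shared counter with threading.Lock. Without the lock, counter += 1 is a non-atomic read-modify-write and updates can be lost; the with lock: form guarantees that the lock is released even if an error occurs. Acquiring multiple locks in a consistent order is one common way to avoid the deadlock described above.

```python
import threading

counter = 0
lock = threading.Lock()

def add(times):
    global counter
    for _ in range(times):
        with lock:          # acquire the mutex; released when the block ends
            counter += 1    # read-modify-write of shared data

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000
```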

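3.2.4 Process A process is the smallest unit of resource allocation in the operating system. A program has at least one process, and a process has at least one thread. Threads in the same process share data, whereas each process has its own independent memory space. Because each process runs its own Python interpreter with its own GIL, multiple processes can execute Python code in parallel.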

3.3 Magic Methods Python has magic methods, whose names start and end with two underscores (__), that can be used to enhance the functionality of your classes. Common magic methods:
__init__: Initializes the attributes of an object when the object is created.
__str__: Returns the string representation of an object, intended to be readable (used by print() and str()).
__repr__: Returns the string representation of an object, intended to be unambiguous (used by repr()).
__getattr__: Called to obtain an attribute only when normal attribute lookup fails.
__setattr__: Called on every attribute assignment.
__iter__: Returns an iterator over the object.
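The sketch below defines an illustrative Vector class that uses each magic method listed above.

```python
class Vector:
    """A small 2-D vector class (illustrative) using several magic methods."""

    def __init__(self, x, y):
        self.x = x                       # goes through __setattr__
        self.y = y

    def __str__(self):                   # used by print() and str()
        return f"({self.x}, {self.y})"

    def __repr__(self):                  # used by repr() and the interactive shell
        return f"Vector(x={self.x!r}, y={self.y!r})"

    def __getattr__(self, name):         # called only if normal lookup fails
        if name == "length":
            return (self.x ** 2 + self.y ** 2) ** 0.5
        raise AttributeError(name)

    def __setattr__(self, name, value):  # called on every attribute assignment
        if not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number")
        super().__setattr__(name, value)

    def __iter__(self):                  # makes the object iterable and unpackable
        return iter((self.x, self.y))

v = Vector(3, 4)
print(str(v), repr(v), v.length)         # (3, 4) Vector(x=3, y=4) 5.0
x, y = v                                 # unpacking uses __iter__
```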

3.4 Higher-Order Functions
zip(*iterables): Aggregates elements from each of the given iterables into tuples and returns an iterator of tuples. If the iterables have different lengths, iteration stops at the shortest one.
print(*zip([1,2,3],["a","b","c"],["A","B","C"])) Output: (1, 'a', 'A') (2, 'b', 'B') (3, 'c', 'C')
map(function, iterable, ...): Applies the given function to each item of the iterable(s) and returns an iterator of the results.
print(*map(lambda x: x*x, [1,2,3])) Output: 1 4 9
filter(function, iterable): Returns an iterator over the items of the iterable for which the function returns a true value.
print(*filter(lambda x: x%2==1, [1,2,3])) Output: 1 3
sorted(iterable, *, key=None, reverse=False): Returns a new sorted list. key specifies a function that extracts the comparison key from each element (the cmp parameter of Python 2 no longer exists). reverse=True sorts in descending order; the default, reverse=False, sorts in ascending order.
sorted([('b',2),('a',1),('c',3),('d',4)], key=lambda x: x[1]) Output: [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
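A sketch of the functions above applied to a small made-up dataset. It also shows that in Python 3, zip(), map(), and filter() return lazy iterators that can be consumed only once, while sorted() returns a list.

```python
names = ["li", "yang", "zhao"]
scores = [88, 95, 70]

pairs = zip(names, scores)              # a lazy iterator, not a list
print(type(pairs).__name__)             # zip
pairs = list(pairs)                     # materialize once so it can be reused

passed = list(filter(lambda p: p[1] >= 80, pairs))         # keep items where the function is true
curved = list(map(lambda p: (p[0], p[1] + 5), passed))     # transform every item
ranking = sorted(pairs, key=lambda p: p[1], reverse=True)  # descending by score
shortest = list(zip([1, 2, 3], "ab"))   # stops at the shortest input

squares = map(lambda x: x * x, [1, 2])
first_pass = list(squares)              # [1, 4]
second_pass = list(squares)             # []: the iterator is already exhausted

print(passed, curved, ranking, shortest, sep="\n")
```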

3.5 Regular Expression 3.5.1 Regular Expression Regular expressions are supported by many programming languages. A regular expression is a string of ordinary and special characters that describes a pattern for matching a series of characters. Regular expressions provide the basis for advanced text pattern matching and extraction, and for text-based search and replacement. Python implements regular expressions in the re module.

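3.5.2 Regular Expression Execution Process The regular expression matching process is as follows: Starting from a position in the text, the characters are compared with the regular expression one by one, in sequence. If the entire expression is matched, the match succeeds. If a character fails to match, the engine backtracks to try other alternatives (for example, a different number of repetitions); if no alternative works, the match attempt at that position fails, and search() moves on to the next starting position.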


3.5.3 Common Matching Methods of the re Module
Table 3-1 Common matching methods of the re module (Function/Method, Description, Example, Result)

compile(pattern, flags=0): Compiles the regular expression pattern with optional flags and returns a regular expression object.
Example: res = re.compile(".*"); print(res.search("abcd").group())
Result: abcd

match(pattern, string, flags=0): Checks for a match only at the beginning of the string.
Example: res = re.match("abcd", "abcdxxxx"); print(res.group())
Result: abcd

search(pattern, string, flags=0): Scans the whole string for the first location where the pattern matches and returns a match object if found.
Example: res = re.search("abcd", "xxxabcdxx"); print(res.group())
Result: abcd

findall(pattern, string, flags=0): Finds all non-overlapping matches of the pattern in the string and returns them as a list.
Example: res = re.findall("a", "abdadafaf")
Result: ['a', 'a', 'a', 'a']

split(pattern, string, maxsplit=0): Splits a string into a list at the matches of the pattern.
Example: res = re.split(",", "li,yang,zhao")
Result: ['li', 'yang', 'zhao']

sub(pattern, repl, string, count=0): Replaces the matches of the pattern in the string with repl.
Example: res = re.sub(",", "-", "l,y,z")
Result: l-y-z
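The functions in Table 3-1 can be tried together as follows; the patterns and strings are arbitrary examples. The match()/search() pair shows the key difference: match() only tries position 0, while search() scans the string.

```python
import re

pattern = re.compile(r"\d+")                  # compile once, reuse many times
print(pattern.findall("a1b22c333"))           # ['1', '22', '333']

print(re.match("abcd", "xxabcd"))             # None: match() only tries position 0
print(re.search("abcd", "xxabcd").group())    # abcd: search() scans the whole string
print(re.findall("a", "abdadafaf"))           # ['a', 'a', 'a', 'a']
print(re.split(r",\s*", "li, yang,zhao"))     # ['li', 'yang', 'zhao']
print(re.sub(",", "-", "l,y,z"))              # l-y-z
print(re.sub(",", "-", "l,y,z", count=1))     # l-y,z
```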

3.5.4 Common Methods for Match Object Instances
Table 3-2 Common methods for match object instances (Function/Method, Description, Example, Result)

group(num=0): Returns the entire match (num=0) or the subgroup whose number is num.
Example: print(re.match(".*", "abcdxxxx").group())
Result: abcdxxxx

groups(default=None): Returns a tuple containing all subgroups.
Example: print(re.search(r"(\w\w\w)(\d\d\d)", "abc123").groups())
Result: ('abc', '123')

groupdict(default=None): Returns a dictionary containing all named subgroups of the match, keyed by the subgroup name.
Example: res = re.search(r"(?P<lamb>\w\w\w)(?P<num>\d\d\d)", "abc123"); print(str(res.groupdict()))
Result: {'lamb': 'abc', 'num': '123'}

re.I, re.IGNORECASE: Performs case-insensitive matching.
Example: res = re.search("abc", "aBcxx", re.I); print(res.group())
Result: aBc

re.M, re.MULTILINE: In this mode, ^ and $ match at the start and end of each line within the string, not only at the start and end of the whole string.
Example: res = re.search("^aB", "xx\naBc", re.M); print(res.group())
Result: aB (without re.M, the search returns None)
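A sketch of match-object methods and flags on an arbitrary string. Raw strings (r"...") are used so that backslashes such as \d reach the regex engine unchanged. m.start() and m.end(), which return the match boundaries, are additional match-object methods not listed in the table.

```python
import re

m = re.search(r"(?P<word>[a-z]+)-(?P<num>\d+)", "id: abc-123;")
print(m.group())           # abc-123 (group 0, the whole match)
print(m.group(1))          # abc
print(m.group("num"))      # 123
print(m.groups())          # ('abc', '123')
print(m.groupdict())       # {'word': 'abc', 'num': '123'}
print(m.start(), m.end())  # 4 11

print(re.search("abc", "xxABC", re.IGNORECASE).group())  # ABC
text = "first line\nsecond line"
print(re.findall(r"^\w+", text))                # ['first']
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['first', 'second']
```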

3.5.5 Special Symbols and Characters
Table 3-3 Special symbols and characters (Representation, Description, Example, res.group())

re1|re2: Matches the regular expression re1 or re2.
Example: res = re.search("foo|bar", "xxxfooxxx")
Result: foo

.: Matches any single character except \n.
Example: res = re.search("b.b", "xxxbobxxx")
Result: bob

^: Matches the start of the string.
Example: res = re.search("^b.b", "bobxxx")
Result: bob

$: Matches the end of the string.
Example: res = re.search("b.b$", "xxxbob")
Result: bob

*: Matches the preceding expression zero or more times (as many times as possible).
Example: res = re.search("bob*", "bobbo"); res1 = re.search(".*", "bobboddd")
Result: bobb, bobboddd

+: Matches the preceding expression one or more times.
Example: res = re.search("bob+", "xxxxbobbbbob")
Result: bobbbb

?: Matches the preceding expression zero or one time.
Example: res = re.search("bob?", "bobbod")
Result: bob

{N}: Matches the preceding expression exactly N times.
Example: res = re.search("bob{2}", "bobbbbod")
Result: bobb

{M,N}: Matches the preceding expression M to N times.
Example: res = re.search("bob{2,3}", "bobbbbod")
Result: bobbb

[...]: Matches any single character in the character set.
Example: res = re.search("[bo]", "xxbobxx")
Result: b

[..x-y..]: Matches any single character in the range from x to y.
Example: res = re.search("[a-z]", "xxbobxx")
Result: x

[^...]: Matches any single character that is not in the set (the set may include ranges).
Example: res = re.search("[^a-z]", "xx214bobxx"); res1 = re.search("[^2x1]", "xx214bobxx")
Result: 2, 4

(*|+|?|{})?: Appending ? to a quantifier (*?, +?, ??, {M,N}?) makes it non-greedy, so it matches as few repetitions as possible.
Example: res = re.search("bob+?", "xxbobbbbxx")
Result: bob
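A few of the symbols in Table 3-3 applied to made-up strings. The greedy/lazy pair shows the effect of appending ? to a quantifier.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

greedy = re.search(r"<.+>", text).group()    # as many characters as possible
lazy = re.search(r"<.+?>", text).group()     # as few characters as possible
tags = re.findall(r"</?([a-z])>", text)      # optional "/" plus a character range
year = re.search(r"\d{4}", "released in 2020").group()
not_digits = re.findall(r"[^0-9]+", "ab12cd3")
either = re.findall(r"cat|dog", "cat, dog, bird")

print(greedy)      # <b>bold</b> and <i>italic</i>
print(lazy)        # <b>
print(tags)        # ['b', 'b', 'i', 'i']
print(year)        # 2020
print(not_digits)  # ['ab', 'cd']
print(either)      # ['cat', 'dog']
```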

3.6 Generators, Iterators, and Decorators 3.6.1 Iterators Iterators are used to access the elements of a collection. An iterator is an object that remembers its current position during traversal. An iterator accesses the elements of a collection from the first to the last, and it can only move forward, not backward. An iterable is an object that implements the __iter__ method, which returns an iterator, or that defines the __getitem__ method to support indexing. isinstance(obj, Iterable), with Iterable imported from collections.abc, checks whether an object is iterable; it does not detect objects that are iterable only through __getitem__, so calling iter(obj) is the most reliable test. Two basic built-in functions for iterators: next(): returns the next element of the iterator; when all elements have been consumed, calling next() again raises StopIteration. iter(): creates an iterator object from an iterable.
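The iterator protocol described above can be sketched with a hypothetical Countdown class that implements __iter__ and __next__ itself, alongside the built-in iter() and next().

```python
from collections.abc import Iterable, Iterator

class Countdown:
    """A hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                  # an iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # signals that iteration is finished
        value = self.current
        self.current -= 1
        return value

letters = iter("ab")                 # iter() turns an iterable into an iterator
print(next(letters), next(letters))  # a b
try:
    next(letters)
except StopIteration:
    print("exhausted")

print(isinstance([1, 2], Iterable), isinstance([1, 2], Iterator))  # True False
print(list(Countdown(3)))            # [3, 2, 1]
```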

3.6.2 Generators A generator is a special kind of iterator. Instead of building all its values in advance, a generator produces one value at a time as it is iterated, and it remembers its current state between values. Ways to create a generator: Use the yield keyword in a function (a generator function). Use a generator expression (written like a list comprehension, but in parentheses).

Execution of a generator function is suspended each time a yield expression is reached, and it resumes only when next() or send() is called. send() also passes a value into the generator, which becomes the result of the suspended yield expression.
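A sketch of both ways to create a generator, plus send(). running_average is a hypothetical example; a generator must be advanced to its first yield with next() before send() can pass it a value.

```python
def fibonacci(limit):
    """Yield Fibonacci numbers below limit, one at a time."""
    a, b = 0, 1
    while a < limit:
        yield a                    # execution pauses here until the next next()
        a, b = b, a + b

def running_average():
    """Receive numbers via send() and yield the mean so far."""
    total, count, average = 0.0, 0, None
    while True:
        value = yield average      # send(x) makes this expression evaluate to x
        total += value
        count += 1
        average = total / count

print(list(fibonacci(20)))            # [0, 1, 1, 2, 3, 5, 8, 13]

squares = (n * n for n in range(4))   # generator expression: values computed lazily
print(next(squares), list(squares))   # 0 [1, 4, 9]

avg = running_average()
next(avg)                             # advance to the first yield
print(avg.send(10), avg.send(20))     # 10.0 15.0
```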

3.6.3 Closures A closure is a function together with the referencing environment in which it was defined. In Python, if an inner function references a variable from an enclosing (non-global) scope, the inner function is a closure. A closure can read such a variable, but assigning to it inside the inner function creates a new local variable instead, unless the variable is declared nonlocal. Simple implementation of a closure:
def func():
    n = 1
    def inner():
        return n + 1
    return inner
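A sketch of a closure that keeps state between calls. make_counter is a hypothetical name; broken_counter shows what happens when nonlocal is missing.

```python
def make_counter(start=0):
    count = start                  # variable in the enclosing scope

    def increment(step=1):
        nonlocal count             # rebind the enclosing variable, not a new local
        count += step
        return count

    return increment

counter = make_counter(10)
print(counter(), counter(5))                 # 11 16
print(counter.__closure__[0].cell_contents)  # 16: the captured variable lives on

def broken_counter():
    count = 0
    def increment():
        count += 1                 # raises UnboundLocalError: count is treated as local
        return count
    return increment
```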

3.6.4 Decorators A decorator is essentially a Python function that adds functionality to another function without changing that function's code. How a decorator works: the function to be decorated is passed as an argument to the decorator function, which returns a new (decorated) function; the result is bound to the original function's name. Decorators are an application of closures. Usage of decorators:
@decorator_function
def func():
    pass
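A minimal decorator sketch. log_calls is a hypothetical decorator that records each call; functools.wraps copies the wrapped function's name and docstring to the wrapper, which is the usual practice.

```python
import functools

def log_calls(func):
    """A hypothetical decorator that records each call's arguments and result."""
    @functools.wraps(func)             # keep func's __name__ and __doc__
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        wrapper.calls.append((args, kwargs, result))
        return result
    wrapper.calls = []
    return wrapper

@log_calls                             # equivalent to: add = log_calls(add)
def add(a, b):
    """Return a + b."""
    return a + b

print(add(2, 3))     # 5
print(add.__name__)  # add
print(add.calls)     # [((2, 3), {}, 5)]
```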

3.7 Extension 3.7.1 JSON JavaScript Object Notation (JSON) is a lightweight data exchange format that is easy to read and write. Its syntax resembles that of a Python dictionary. To use JSON functions in Python, import the json module with the import json statement. json.dumps: Encodes (serializes) a Python object into a JSON string. json.loads: Decodes a JSON string into a Python object.
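A round trip through json.dumps and json.loads on a made-up record. sort_keys=True makes the output order predictable, and ensure_ascii=False keeps any non-ASCII text readable; note how Python's True and None become JSON's true and null.

```python
import json

record = {"name": "li", "age": 30, "skills": ["python", "sql"],
          "active": True, "manager": None}

text = json.dumps(record, ensure_ascii=False, sort_keys=True)
print(text)
# {"active": true, "age": 30, "manager": null, "name": "li", "skills": ["python", "sql"]}

restored = json.loads(text)
print(restored == record)               # True

print(json.loads('[1, 2.5, "x"]'))      # [1, 2.5, 'x']
print(json.dumps((1, 2)))               # [1, 2]: tuples become JSON arrays
```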


3.7.2 Metaclasses In Python, a class is itself an object, and the class that creates such an object is called a metaclass. type is Python's built-in metaclass: calling type(name, bases, dict) creates a class dynamically, where name is the class name, bases is the tuple of base classes from which the new class derives (used for inheritance; it can be empty), and dict is a dictionary of attribute names and values. By default, the metaclass of every class is type. Custom metaclasses, which are subclasses of type, are often used when designing APIs and frameworks that need to customize how classes are created. Everything in Python is an object: an instance of a class, where classes are themselves instances of a metaclass. type is a special case: it is its own metaclass.
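A sketch of both ideas: creating a class dynamically with type(name, bases, dict), and a hypothetical custom metaclass RegistryMeta that records the name of every class created with it, a pattern sometimes used for plugin registries.

```python
def greet(self):
    return f"Hi, I am {self.name}"

# Equivalent to a class statement defining Person with a name attribute and greet method.
Person = type("Person", (object,), {"name": "anonymous", "greet": greet})
p = Person()
print(p.greet())                                                     # Hi, I am anonymous
print(type(p) is Person, type(Person) is type, type(type) is type)  # True True True

class RegistryMeta(type):
    """A hypothetical metaclass that registers every class it creates."""
    registry = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.registry.append(name)
        return cls

class Plugin(metaclass=RegistryMeta):
    pass

class CsvPlugin(Plugin):       # subclasses inherit the metaclass
    pass

print(RegistryMeta.registry)   # ['Plugin', 'CsvPlugin']
```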

3.7.3 Garbage Collection Mechanism in Python In Python, the interpreter is responsible for memory management, which saves developers time and effort. Python's garbage collection is based primarily on reference counting: an object is freed as soon as no references to it remain. Reference counting alone cannot reclaim circular references between container objects, so it is supplemented by a mark-and-sweep style collector that detects such cycles. Generational collection improves the efficiency of garbage collection at the cost of some extra storage space.
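A sketch of the two mechanisms, assuming the standard CPython interpreter: sys.getrefcount() shows a reference count increasing, and a weak reference shows that gc.collect() frees a reference cycle that reference counting alone would leave behind. Exact reference counts vary between Python versions, so only the relative change is shown.

```python
import gc
import sys
import weakref

class Node:
    def __init__(self):
        self.partner = None

a = Node()
before = sys.getrefcount(a)   # includes the temporary reference of the call itself
alias = a                     # a second name for the same object
after = sys.getrefcount(a)
print(after - before)         # the extra name adds a reference
del alias

# Build a reference cycle: these counts never drop to zero on their own.
x, y = Node(), Node()
x.partner, y.partner = y, x
probe = weakref.ref(x)        # a weak reference does not keep x alive
del x, y
gc.collect()                  # the cycle collector finds and frees the unreachable pair
print(probe() is None)        # True
```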

4 Quiz

4.1 Short Answer Questions

1. If two variables have the same value, are they stored at the same address in memory?

2. What is the simplest way to remove duplicate elements from a list?

3. What happens if two lists are appended to each other? (a.append(b); b.append(a))

4.2 Multiple-Choice Questions

1. Which of the following statements is incorrect? ( )
A. A program can contain multiple processes.
B. A process can have 10 threads.
C. A program can have no process but only threads.
D. Thread synchronization can be implemented using locks.

2. Which of the following statements about Python database programming is correct? ( )
A. Python can operate only the MySQL database.
B. Python 3 uses PyMySQL for database connection.
C. Python 3 uses MySQLdb for database connection.
D. The procedure for operating a database in Python is as follows: open the database connection, obtain the cursor, execute SQL statements, and close the database connection.


Huawei AI Academy Training Materials

Machine Learning

Huawei Technologies Co., Ltd.

03 Machine Learning (Textbook)

78


Huawei Technologies Co., Ltd. Address:

Huawei Industrial Base Bantian, Longgang, Shenzhen 518129

Website:

http://e.huawei.com


Contents
1 Machine Learning
1.1 Machine Learning Algorithms
1.1.1 Overview
1.1.2 Rational Understanding of Machine Learning Algorithms
1.1.3 Main Problems Solved by Machine Learning
1.2 Machine Learning Classification
1.2.1 Supervised Learning
1.2.2 Unsupervised Learning
1.2.3 Semi-supervised Learning
1.2.4 Reinforcement Learning
1.3 Machine Learning Process
1.3.1 Overview
1.3.2 Data Collection
1.3.3 Data Cleansing
1.3.4 Feature Selection
1.3.5 Overall Procedure of Building a Model
1.3.6 Model Evaluation
1.4 Parameters and Hyperparameters in Models
1.4.1 Gradient Descent
1.4.2 Validation Set and Hyperparameter Search
1.4.3 Cross Validation
1.5 Common Machine Learning Algorithms
1.5.1 Overview
1.5.2 Linear Regression
1.5.3 Logistic Regression
1.5.4 Decision Tree
1.5.5 SVMs
1.5.6 KNN
1.5.7 Naive Bayes
1.5.8 Ensemble Learning
1.5.9 Clustering Algorithm
1.6 Case Study
1.7 Summary
1.8 Quiz

1 Machine Learning

Machine learning is currently a mainstream research direction in the field of artificial intelligence (AI), involving multiple disciplines such as probability theory, statistics, and convex optimization. This chapter describes the definition of machine learning algorithms, the machine learning process, common machine learning algorithms, and concepts such as hyperparameters, gradient descent, and cross validation.

1.1 Machine Learning Algorithms 1.1.1 Overview Machine learning (including deep learning) is the study of learning algorithms. A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. For example, identifying spam emails is a task T. People can perform this task easily because we have accumulated a lot of relevant experience in daily life, from emails, spam messages, and even TV ads. By summarizing this experience, we find that emails sent from unknown senders and containing words such as "discount" and "zero risk" are more likely to be spam. Based on this knowledge, we can determine whether an unread email is spam, as shown in the left part of Figure 1-1. Can we write a computer program to simulate this process? As shown in the right part of Figure 1-1, we can prepare a large number of emails, manually label the spam ones, and use them as the program's experience E. However, a computer program cannot summarize this experience by itself. In this case, a machine learning algorithm is needed to train the program. A trained computer program is called a model. Generally, the more emails are used for training, the better the trained model performs, that is, the higher its performance measure P.
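To make T, E, and P concrete, here is a deliberately tiny toy sketch, not a real spam filter: the labeled emails are made up (experience E), the learned rule flags an email that contains any word seen more often in spam than in normal mail (task T), and accuracy on held-out emails is the performance measure P.

```python
from collections import Counter

# Hypothetical labeled emails: the experience E (1 = spam, 0 = normal).
train = [
    ("huge discount zero risk offer", 1),
    ("zero risk investment discount now", 1),
    ("meeting notes for monday", 0),
    ("lunch on monday with the team", 0),
]
held_out = [("exclusive discount offer", 1), ("team meeting moved", 0)]

def learn_spam_words(samples):
    """Keep the words that appear more often in spam than in normal emails."""
    spam, ham = Counter(), Counter()
    for text, label in samples:
        (spam if label else ham).update(text.split())
    return {w for w in spam if spam[w] > ham[w]}

def predict(spam_words, text):
    """Task T: flag an email as spam if it contains at least one learned spam word."""
    return int(any(word in spam_words for word in text.split()))

def accuracy(spam_words, samples):
    """Performance measure P: the fraction of emails classified correctly."""
    return sum(predict(spam_words, t) == y for t, y in samples) / len(samples)

spam_words = learn_spam_words(train)
print(sorted(spam_words))
print(accuracy(spam_words, held_out))  # 1.0 on this toy data
```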


Figure 1-1 Machine learning mode

It is very difficult to identify spam emails using conventional programming methods. In theory, we would need to find a set of rules that every spam email satisfies but no normal email does. This approach, which solves a problem through explicit programming, is called a rule-based approach. In practice, it is almost impossible to find such a set of rules. Therefore, machine learning uses a statistics-based approach instead: machine learning algorithms enable machines to learn rules automatically from samples. Compared with the rule-based approach, the statistics-based approach can learn more complex rules, or rules that are difficult to describe, and can therefore handle more complex tasks.

Figure 1-2 Application scenarios of machine learning algorithms

Machine learning algorithms are highly expressive and can solve many problems in the AI field. However, this does not mean that machine learning is the best choice in every case.


As shown in Figure 1-2, machine learning is suited to problems with complex solutions, or to scenarios that involve a large amount of data with an unknown probability distribution. It can also be applied to other scenarios, but often at a higher cost than conventional methods. Take the second quadrant in Figure 1-2 as an example: if the problem is small enough to be solved manually, machine learning algorithms are unnecessary. Generally, machine learning is applicable to:
Scenarios with complex rules or rules that are difficult to describe, for example, facial recognition and speech recognition.
Scenarios in which the data distribution changes over time and programs must be constantly readapted, for example, sales trend forecasting.

1.1.2 Rational Understanding of Machine Learning Algorithms The essence of machine learning algorithms is function fitting. Assuming that f is the target function, the objective of a machine learning algorithm is to output a hypothesis function g such that g(x) approximates f(x) as closely as possible for every input x in the domain. A simple example is probability density estimation in statistics. The heights of people in a large population, such as all Chinese people, can be modeled as approximately normally distributed. Although the probability density function f of this normal distribution is unknown, we can estimate the mean and variance of the distribution from a sample; by the law of large numbers, these estimates approach the true values as the sample grows, which gives an estimate of f.
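The density-estimation example can be simulated, assuming a hypothetical true distribution f with a mean of 170 cm and a standard deviation of 7 cm. The estimated parameters of g tend to move closer to the true ones as the sample size n grows.

```python
import random
import statistics

random.seed(0)                       # reproducible sampling
TRUE_MEAN, TRUE_STD = 170.0, 7.0     # hypothetical parameters of the unknown f

def sample_heights(n):
    """Draw n heights (cm) from the hypothetical true distribution."""
    return [random.gauss(TRUE_MEAN, TRUE_STD) for _ in range(n)]

def estimate(samples):
    """Estimate g's parameters: sample mean and sample standard deviation."""
    return statistics.fmean(samples), statistics.stdev(samples)

for n in (10, 100, 10_000):
    mean, std = estimate(sample_heights(n))
    print(f"n={n:>6}: mean={mean:.2f} cm, std={std:.2f} cm")

g = statistics.NormalDist(*estimate(sample_heights(10_000)))  # estimated density
f = statistics.NormalDist(TRUE_MEAN, TRUE_STD)                # true density
print(f"density at 170 cm: g={g.pdf(170):.4f}, f={f.pdf(170):.4f}")
```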

Figure 1-3 Relationship between the hypothesis function and target function

This practice also applies to more complex situations, as shown in Figure 1-3. For a given task, we can collect a large amount of training data. The data must be consistent with some target function f; otherwise, learning the task is meaningless. By analyzing the training data, a machine learning algorithm provides a hypothesis function g that is as similar to the target function f as possible. In general, therefore, the output of a machine learning algorithm is not exactly the same as the target function. However, as the amount of training data increases, the hypothesis function g gradually approaches the target function f and can achieve satisfactory precision. Notably, the target function f is sometimes highly abstract. For a typical image classification task, the target function is a mapping from a set of images to a set of categories. To enable a computer program to process logical information such as images and categories, you need to map the images or categories to scalars, vectors, or matrices using a particular encoding. For example, you can assign each category a sequence number starting from 0, which maps the category to a scalar. Alternatively, different one-hot vectors can represent different categories; this is referred to as one-hot encoding. Images are encoded in a slightly more complex way, generally as three-dimensional matrices (height × width × color channels). With this encoding, we can regard the domain of the target function f as a set of three-dimensional matrices and its range as a set of label numbers. Although encoding is not part of the machine learning algorithm itself, in some cases the choice of encoding also affects the efficiency of the algorithm.
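The two category encodings and the three-dimensional image layout described above can be illustrated as follows. This is a minimal sketch; the category names and the tiny 2 × 3 image are hypothetical.

```python
# Hypothetical category set for an image classification task.
CATEGORIES = ["cat", "dog", "bird"]

def to_index(label):
    """Sequence-number encoding: map a category to a scalar starting from 0."""
    return CATEGORIES.index(label)

def to_one_hot(label):
    """One-hot encoding: a vector with a single 1 at the category's index."""
    vec = [0] * len(CATEGORIES)
    vec[to_index(label)] = 1
    return vec

# A tiny RGB image as a height x width x channel nested list (2 x 3 pixels, 3 channels).
image = [[[0, 0, 0] for _ in range(3)] for _ in range(2)]

print(to_index("dog"))                              # 1
print(to_one_hot("dog"))                            # [0, 1, 0]
print(len(image), len(image[0]), len(image[0][0]))  # 2 3 3
```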

1.1.3 Main Problems Solved by Machine Learning

Machine learning algorithms can handle many types of tasks. The most typical are classification, regression, and clustering. Classification and regression are the major types of prediction tasks, accounting for 80% to 90% of them. The output of classification is a discrete category label, and the output of regression is a continuous number. A classification task requires a computer program to assign the input to one of k categories. To accomplish this, a machine learning algorithm usually outputs a mapping from the domain D to the category labels {1, 2, ..., k}. The image classification algorithm mentioned above is an example of a classification task. In a regression task, a computer program needs to predict a numerical output for the given input. The output of a machine learning algorithm is then usually a mapping from the domain D to the real numbers R. Examples of this task type include predicting the claim amount of an insured person (to set the insurance premium) or predicting a security price. Classification tasks can also be cast as regression tasks: you can obtain the classification result by predicting the probability that an image belongs to each category. A clustering task divides data into multiple groups based on the internal similarity of the data. Unlike a classification task, the dataset of a clustering task does not contain manually added category labels. Clustering algorithms try to make data in the same group more similar to each other than to data in different groups, thereby grouping the data. Clustering algorithms can be applied to scenarios such as image retrieval and user profiling.
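To make the idea of casting classification as regression concrete, the sketch below turns hypothetical real-valued scores for k = 3 categories into probabilities and picks the most probable label from {1, ..., k}. It uses a softmax, which is one common choice assumed here for illustration; the text does not prescribe a particular function.

```python
import math

def softmax(scores):
    """Convert real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores):
    """Return the category label in {1, ..., k} with the highest probability."""
    probs = softmax(scores)
    return probs.index(max(probs)) + 1, probs

# Hypothetical scores that a regression-style model might output for one image.
label, probs = classify([2.0, 0.5, -1.0])
print(label)                          # 1
print([round(p, 3) for p in probs])
```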

1.2 Machine Learning Classification

Machine learning can be classified into supervised learning and unsupervised learning. The training data used for supervised learning contains manually added labels, while that of unsupervised learning does not. If some data in a dataset contains labels but most data does not, this type of machine learning is called semi-supervised learning. Reinforcement learning focuses on multi-step decision-making and automatically collects data for learning during interaction with the environment.

1.2.1 Supervised Learning

Figuratively speaking, supervised learning lets a computer compare its answers with the standard answers when working on multiple-choice questions. The computer adjusts its model parameters to make its predicted answers as consistent as possible with the standard ones, and eventually learns how to complete the task. Supervised learning uses samples with known labels to train a model that meets the required performance. The trained model maps inputs to outputs and can thus predict unknown data.

Figure 1-4 Supervised learning algorithm

Figure 1-4 shows the functions of supervised learning algorithms in a highly simplified manner. The features in the figure can simply be understood as data items. Although this understanding is incomplete in some sense, it does not affect the description of supervised learning algorithms. A supervised learning algorithm takes features as input and outputs the predicted value of the target. Figure 1-5 provides an example, in which whether a user enjoys sports is predicted based on the weather. A similar algorithm can be applied to scenarios such as product recommendation. Each row in the table is a training sample that records the weather features of a specific day and whether the user enjoyed sports that day.

Figure 1-5 Example data

The input (features) and output (targets) of supervised learning algorithms can be continuous or discrete. When the target variable is continuous, the output of a supervised learning algorithm is called a regression model. A regression model uses a function to express the mapping between the attribute values of the samples in a dataset, and thereby discovers the dependencies among these attribute values, which include both the features and the target. Regression models are widely used in time series forecasting, for example, predicting how much profit stocks will bring next week or what tomorrow's temperature in degrees Celsius will be. Correspondingly, when the target variable takes discrete values, the output of the learning algorithm is referred to as a classification model. A classification model maps samples in a dataset to given categories, for example, to predict whether a traffic jam will occur on a highway during tomorrow's morning rush hour, or which of a CNY 5 voucher and a 25% discount will attract more customers.
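The difference between the two kinds of supervised model can be seen in a small sketch. It assumes hypothetical one-feature training data, a least-squares line as the regression model, and a nearest-neighbor rule as the classification model (both chosen for brevity, not taken from the text): the first returns a real number, the second returns one of a fixed set of labels.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w * x + b (a simple regression model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    return w, mean_y - w * mean_x

def nearest_neighbor(x, xs, labels):
    """A minimal classification model: return the label of the closest training sample."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return labels[closest]

# Hypothetical training data with one numeric feature per sample.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]            # continuous target -> regression
labels = ["no", "no", "yes", "yes"]  # discrete target -> classification

w, b = fit_line(xs, ys)
print(w * 2.5 + b)                        # 6.0 (a real number)
print(nearest_neighbor(3.8, xs, labels))  # yes (one of a fixed set of labels)
```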

Although the range of a regression model can be an infinite set, the output of a classification model is usually finite. This is because a dataset cannot grow infinitely, and the number of categories in a dataset can at most equal the number of training samples, so the number of categories cannot be infinite. When a classification model is trained, a category set L usually needs to be specified manually, from which the model selects a category as output. The size of L is generally denoted as K, the number of possible categories.

1.2.2 Unsupervised Learning

Compared with supervised learning, unsupervised learning is like letting a computer work on multiple-choice questions without telling it the right answers. In this case, it is difficult for the computer to give correct answers. However, by analyzing the relationships among the questions, the computer can group them so that questions in the same group are likely to have the same answer. As shown in Figure 1-6, unsupervised learning algorithms do not require labeled samples; they model the input dataset directly.

Figure 1-6 Unsupervised learning algorithm

The clustering algorithm is a typical unsupervised learning algorithm. It simply puts highly similar things together. To classify a new sample, you only need to calculate the similarity between the new sample and existing samples, and then assign it based on that similarity. Biologists have long used the idea of clustering to study relationships between species. As shown in Figure 1-7, when iris samples are plotted on a two-dimensional plane based on their sepal and petal measurements, they clearly fall into three clusters. A clustering model divides the samples in a dataset into several categories, and samples in the same category are highly similar. Application scenarios of clustering models include identifying which audiences like movies on the same subject and which components are damaged in similar ways.

Figure 1-7 Clustering algorithm example
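k-means is one widely used clustering algorithm; the text does not name a specific one, so it is an assumption here. The sketch below uses hypothetical two-dimensional points and Euclidean distance as the (inverse) similarity measure, and it also shows how a new sample is assigned to the most similar cluster.

```python
import math

def assign(point, centers):
    """Index of the nearest center (smaller distance means higher similarity)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=10):
    centers = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[assign(p, centers)].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical 2-D samples, for example two measurements per flower.
points = [(1.0, 1.0), (10.0, 10.0), (1.2, 0.8), (0.9, 1.1), (10.2, 9.9), (9.8, 10.1)]
centers, clusters = kmeans(points, k=2)
print([len(c) for c in clusters])   # [3, 3]
print(assign((9.0, 9.5), centers))  # 1 (joins the cluster around (10, 10))
```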

1.2.3 Semi-supervised Learning

Semi-supervised learning is a fusion of supervised learning and unsupervised learning. It is a machine learning task that automatically uses a large amount of unlabeled data to assist learning from a small amount of labeled data. In a conventional supervised learning algorithm, a computer program needs to learn from a large number of labeled training samples to build a model that predicts the labels of new samples. For example, in a classification task a label indicates the category of a sample, while in a regression task a label is the real-valued output of the sample. As data collection and storage capabilities develop, we can easily obtain a large amount of unlabeled data for many tasks. However, labeling the data is labor-intensive and time-consuming. For example, for website recommendation, users need to mark the websites they are interested in. However, few users are willing to spend much time marking websites, so the number of marked websites is small, whereas there are countless websites on the Web that can be used as unmarked data.

Figure 1-8 Semi-supervised learning algorithm

As shown in Figure 1-8, semi-supervised learning neither requires all samples to be labeled manually, as supervised learning does, nor is it completely independent of targets, as unsupervised learning is. In a dataset used for semi-supervised learning, only a few samples are labeled. Take the iris classification task shown in Figure 1-7 as an example. A small amount of supervision information is added to the dataset, as shown in Figure 1-9. Red objects indicate Setosa samples, green objects indicate Versicolor samples, purple objects indicate Virginica samples, and gray objects indicate samples whose category is unknown. Assume that the output of the clustering algorithm introduced in unsupervised learning is shown by the gray dashed circles in the figure. Count the labeled samples of each category within each circle, and use the category with the largest count as the category of that cluster. For example, the cluster in the upper left corner belongs to Setosa, and the cluster in the upper right corner belongs to Virginica. By combining unsupervised learning algorithms with supervision information, semi-supervised learning algorithms can achieve higher accuracy at a lower labeling cost.

Figure 1-9 Iris dataset with supervision information
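The majority-vote step described for Figure 1-9 can be sketched as follows. The cluster assignments and sparse labels below are hypothetical; in practice the cluster IDs would come from a clustering algorithm such as the one discussed in unsupervised learning.

```python
from collections import Counter

def label_clusters(cluster_ids, labels):
    """Give each cluster the majority category among its labeled samples.

    cluster_ids[i] is the cluster of sample i; labels[i] is its category,
    or None if the sample is unlabeled (a gray object in the figure).
    """
    votes = {}
    for cid, label in zip(cluster_ids, labels):
        if label is not None:
            votes.setdefault(cid, Counter())[label] += 1
    return {cid: counter.most_common(1)[0][0] for cid, counter in votes.items()}

# Hypothetical clustering output and sparse labels for eight samples.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["Setosa", None, None, "Virginica", "Virginica", "Versicolor", None, "Versicolor"]

cluster_label = label_clusters(cluster_ids, labels)
predicted = [cluster_label.get(cid) for cid in cluster_ids]  # fills in unlabeled samples
print(cluster_label)  # {0: 'Setosa', 1: 'Virginica', 2: 'Versicolor'}
```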

1.2.4 Reinforcement Learning

Reinforcement learning is mainly used to solve multi-step decision-making problems in scenarios such as board games, video games, and visual navigation. Unlike the problems studied in supervised and unsupervised learning, multi-step decision-making problems often have no easily found exact answers. Taking Go as an example, there are about $10^{170}$ possible board positions, far too many to search exhaustively (by comparison, there are no more than about $10^{80}$ atoms in the observable universe). So for a given position, it is generally difficult to find the perfect move. Another feature of multi-step decision-making problems is that it is easy to define a reward function that evaluates task completion. For Go, the reward function can be defined by whether the game is won; for video games, it can be defined as the score. The goal of reinforcement learning is to find an action policy $\pi$ that maximizes the reward.

Figure 1-10 Reinforcement learning algorithm

As shown in Figure 1-10, the two most important parts of a reinforcement learning system are the model and the environment. In different states of the environment, the model decides on its own actions, and different actions may affect the environment differently. For example, suppose a computer answers questions at will and a teacher rates each answer. If that were all, the computer would not learn how to answer questions, because the teacher's ratings would not feed back into training. This is where states, rewards, and penalties become important. A high test score satisfies the teacher, who then gives the computer a reward; conversely, a low test score results in a penalty. A computer that learns from this feedback adjusts its model parameters so that its answers earn more rewards. In this process, no one provides training data to the algorithm or tells the reinforcement learning system how to generate the correct action. All data and reward signals are generated dynamically, and learned from, during the interaction between the model and the environment. Both good and bad behaviors help the model learn.
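Tabular Q-learning is one classic reinforcement learning algorithm (it is not named in the text). The sketch below assumes a hypothetical five-cell corridor: the agent starts at the left end, can move left or right, and receives a reward of 1 only when it reaches the rightmost cell. No labeled data is supplied; the agent learns only from rewards generated while interacting with the environment. The learning rate `alpha`, discount `gamma`, and exploration rate `epsilon` are illustrative choices.

```python
import random

N_STATES = 5      # cells 0..4 of a hypothetical corridor; cell 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True  # reward is given only at the goal
    return next_state, 0.0, False

def greedy_action(q, state, rng):
    """Pick the action with the highest Q value, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Move Q toward the reward plus the discounted value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
    return q

q = q_learning()
policy = [q[s].index(max(q[s])) for s in range(N_STATES - 1)]
print(policy)  # expected: [1, 1, 1, 1] (always move right)
```

Good moves (toward the goal) and bad moves (away from it) both update the table, which mirrors the point that both kinds of behavior help the model learn.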

1.3 Machine Learning Process

1.3.1 Overview

A complete machine learning project includes data collection, data cleansing, feature extraction and selection, model training, model evaluation, and model deployment and integration, as shown in Figure 1-11. This section first describes concepts related to data and data cleansing, which are the basis for understanding feature selection. After selecting proper features, you need to train and evaluate a model based on them. The process usually requires continuous feedback and iteration before the results are satisfactory. Finally, the model is deployed in specific application scenarios.

Figure 1-11 Machine learning process

1.3.2 Data Collection

A dataset is a collection of data used in machine learning tasks. Each piece of data is called a sample. Events or attributes that reflect the performance or nature of a sample in a particular aspect are called features. A training set is a dataset used in the training process, where each sample is referred to as a training sample. Learning (training) is the process of creating a model from data. The process of using a model for prediction is called testing, and the dataset used is called a test set. Each sample in the test set is called a test sample.

Figure 1-12 Dataset example

Figure 1-12 shows a typical dataset, in which each row represents one sample and each column represents one feature or label. Once a task is determined (for example, predicting house prices based on area, school district, and orientation), the features and labels are determined as well. Therefore, the table header of a dataset should not change throughout a machine learning project. Splitting the dataset into a training set and a test set is relatively flexible, and researchers can decide which samples belong to the training set based on experience. If the proportion of the test set is too low, the test result depends too much on which few samples happen to be chosen, so the performance of the model cannot be properly evaluated. If the proportion of the training set is too low, samples may be under-utilized and the model may not learn enough. A common recommendation is therefore that the training set account for 80% of the total number of samples and the test set for 20%. In this example, there are four samples in the training set and one sample in the test set.
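An 80/20 split of the five-sample example can be sketched as follows. The helper name `split_dataset` and the sample IDs are hypothetical; libraries such as scikit-learn offer a comparable `train_test_split` function in `sklearn.model_selection`.

```python
import random

def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle the samples, then split them into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffling avoids bias from the original order
    n_test = max(1, round(len(shuffled) * test_ratio))  # keep at least one test sample
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of five samples, as in the example above.
samples = [f"sample_{i}" for i in range(1, 6)]
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))  # 4 1
```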

1.3.3 Data Cleansing

Data is crucial to models: it sets the upper limit of model capability, and without good data there is no good model. However, real data may have quality problems, as shown in Figure 1-13. Typical data quality problems include: (1) Incompleteness: missing values or records that lack attributes. (2) Noise: incorrect records or outliers. (3) Inconsistency: records that contradict each other. Such data is called dirty data. The process of filling in missing values and detecting and eliminating anomalies is called data cleansing. In addition, data preprocessing usually includes dimension reduction and normalization. Dimension reduction simplifies data attributes to avoid dimension explosion. Normalization brings features with different units and scales into a comparable range to reduce training difficulty. This section describes only data cleansing.

Figure 1-13 Dirty data

Most machine learning models process features, which are usually numeric representations of input variables that the model can use. In most cases, collected data can be used by algorithms only after preprocessing, which includes:

- Data filtering
- Processing of missing data
- Processing of possible exceptions, errors, or abnormal values
- Combination of data from multiple data sources
- Data consolidation

The workload of data cleansing is usually heavy. Research shows that cleansing and organizing data take up 60% of data scientists' time in machine learning work, as shown in Figure 1-14. On the one hand, this reflects the difficulty of data cleansing: different ways of collecting data, and different kinds of content, require different cleansing methods. On the other hand, it shows that data cleansing plays an important role in subsequent model training and optimization: if data is thoroughly cleaned, the model is less susceptible to interference from abnormal data, which helps ensure effective training.

Figure 1-14 Statistics on data scientists' work in machine learning
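Two of the preprocessing operations listed above, handling abnormal values and filling in missing data, can be sketched on a single column. The sketch assumes a hypothetical "age" column, a hypothetical valid range of 0 to 120, and median imputation, which is one common choice rather than a method prescribed by the text.

```python
import statistics

def clean_column(values, low, high):
    """Treat out-of-range values as errors, then fill all missing values with the median."""
    valid = [v for v in values if v is not None and low <= v <= high]
    fill = statistics.median(valid)  # median is robust to the remaining outliers
    return [v if v is not None and low <= v <= high else fill for v in values]

# Hypothetical "age" column with a missing value (None) and an impossible value (-4).
ages = [25, None, 31, -4, 40, 29]
print(clean_column(ages, low=0, high=120))  # [25, 30.0, 31, 30.0, 40, 29]
```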

1.3.4 Feature Selection

Generally, a dataset has many features, some of which may be redundant or irrelevant to the target. For example, in a task that predicts house prices based on area, school district, and the temperature of the day, the temperature is obviously an irrelevant feature. Feature selection filters out redundant or irrelevant features, which simplifies models and makes them easier for users to interpret. In addition, feature selection can effectively reduce model training time, avoid dimension explosion, improve generalization performance, and prevent overfitting. Common feature selection methods include filter, wrapper, and embedded methods.

Figure 1-15 Filter method

Filter methods select features independently of the model. They use a statistical measure to score each feature by evaluating its correlation with the target attribute. Features are then sorted by score, which helps decide which features to keep or eliminate. Figure 1-15 shows the machine learning process using a filter method. Statistical measures commonly used in filter methods include the Pearson correlation coefficient, the chi-square test, and mutual information. Because filter methods do not consider the relationships between features, they tend to retain redundant variables.
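A filter method based on the Pearson correlation coefficient can be sketched as follows. The house-price data is hypothetical; in Python 3.10 and later, `statistics.correlation` computes the same coefficient, but it is written out here for clarity.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(features, target):
    """Score each feature by |Pearson r| with the target, sorted from highest to lowest."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical house-price data: area is relevant, temperature of the day is not.
features = {
    "area": [50, 60, 70, 80, 90],
    "temperature": [20, 25, 18, 24, 21],
}
price = [150, 180, 210, 240, 270]

ranking = rank_features(features, price)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

No model is trained at any point, which is what makes this a filter method; a threshold or a top-k cut on the ranking would then decide which features to keep.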

Figure 1-16 Wrapper method

Wrapper methods use a prediction model to score feature subsets. They treat feature selection as a search problem in which a prediction model evaluates and compares different feature combinations; a combination that yields higher prediction accuracy is preferred. Figure 1-16 shows the machine learning process using a wrapper method. A common wrapper method is recursive feature elimination (RFE). Wrapper methods usually find the best-performing feature set for a specific type of model, but they must train a new model for each feature subset, which can be computationally expensive.

Figure 1-17 Embedded method

Embedded methods treat feature selection as part of model building, as shown in Figure 1-17. Unlike filter and wrapper methods, models using embedded methods learn which features to use during training. The most common embedded feature selection methods are regularization methods, also called penalization methods, which introduce additional constraints into the optimization of a predictive algorithm to bias the model toward lower complexity and reduce the number of features. Common regularization methods include Ridge regression and Lasso regression. Lasso (L1 penalty) can shrink some coefficients exactly to zero, which removes the corresponding features, whereas Ridge (L2 penalty) shrinks coefficients toward zero without eliminating them.
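A one-feature illustration shows how the two penalties behave. It assumes centered data with no intercept and the objectives $\sum_i (y_i - w x_i)^2 + \lambda w^2$ for Ridge and $\frac{1}{2}\sum_i (y_i - w x_i)^2 + \lambda |w|$ for Lasso; scaling conventions differ between libraries. Under these assumptions both have closed-form solutions: Ridge divides by $\sum_i x_i^2 + \lambda$, while Lasso soft-thresholds $\sum_i x_i y_i$, which can make the coefficient exactly zero.

```python
import math

def ols(xs, ys):
    """Unpenalized least squares for y = w * x (centered data, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def ridge(xs, ys, lam):
    """Minimizes sum((y - w*x)^2) + lam * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def lasso(xs, ys, lam):
    """Minimizes 0.5 * sum((y - w*x)^2) + lam * |w| via soft-thresholding."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return math.copysign(max(abs(sxy) - lam, 0.0), sxy) / sxx

# Hypothetical centered feature with a strongly and a weakly related target.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
strong = [3 * x for x in xs]    # true coefficient 3
weak = [0.1 * x for x in xs]    # true coefficient 0.1

for name, ys in [("strong", strong), ("weak", weak)]:
    print(name, round(ols(xs, ys), 3), round(ridge(xs, ys, 10.0), 3), round(lasso(xs, ys, 2.0), 3))
```

Ridge halves both coefficients here, while Lasso keeps most of the strong coefficient and sets the weak one to zero, which is how Lasso removes a feature.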

1.3.5 Overall Procedure of Building a Model

After data cleansing and feature extraction, you can start building the model. Figure 1-18 shows how to build a model, using supervised learning as an example. The core of model building is model training, validation, and testing. This section briefly describes the training and prediction process based on an example.

Figure 1-18 Model building procedure

In the example in this section, we use a classification model to predict whether a person will change suppliers, based on known features such as the person's city and age. Assume that the dataset in Figure 1-18 has already been cleaned. The task of the model is to predict the target as accurately as possible based on the known features. During training, the model learns the mapping between the features and the target from the samples in the training set. After training, we may obtain the following model:

```python
def model(city, age):
    """Return the predicted probability that the target is true."""
    if city == "Miami":
        return 0.7
    if city == "Orlando":
        return 0.2
    if age > 42:
        # Clamp so the output stays a valid probability
        # (the linear formula exceeds 1 for every age above 42).
        return min(1.0, 0.05 * age + 0.06)
    return 0.01 * age + 0.02
```

The output of the model is the probability that the target is true. In general, model accuracy tends to increase as the amount of training data increases. So why not use all the data for training instead of keeping part of it as a test set? Because what we care about is how the model performs on unknown data, not on known data. The training set is like the question bank that students study when preparing for an exam. A high accuracy rate on the question bank is not surprising: the bank is finite, and a student with a good enough memory can memorize all the answers. Only an exam, whose questions the students have never seen, can really check their mastery of the knowledge. The test set is the exam paper that the researcher prepares for the model. That is, within the entire dataset (training set plus test set), the model may read the features and targets of the training set but only the features of the test set. The targets of the test set are used only by the researcher to evaluate the performance of the model.

Figure 1-19 Training set and test set

1.3.6 Model Evaluation

What Is a Good Model?

The most important evaluation criterion is the generalization capability of a model, that is, its prediction accuracy on actual service data. In addition, some engineering indicators can be used to evaluate a model. Interpretability describes how intuitively the prediction results of a model can be understood. The prediction rate is the average time a model takes to predict each sample. Plasticity refers to the extent to which the prediction rate remains acceptable as the service volume grows in actual service. The goal of machine learning is a model that performs well on new samples, not just on the samples used for training. The capability of applying a model to new samples is called generalization or robustness. The difference between the result predicted by the learned model and the actual result for a sample is called an error. The training error is the error of the model on the training set, and the generalization error is the error of the model on new samples (the test set). Obviously, we prefer a model with a smaller generalization error. Once the form of a model is given, all possible functions of that form constitute a space called the hypothesis space. Machine learning algorithms search the hypothesis space for a suitable fitting function. If the mathematical model is too simple or the training time is too short, the training error of the model will be large. This phenomenon is called underfitting. In the former case, a more complex model needs to be used for retraining; in the latter case, underfitting can be effectively alleviated simply by training longer. However, accurately determining the cause of underfitting often requires experience and suitable methods. By contrast, overfitting refers to the phenomenon in which the training error of a model is very small (because the model is complex) but its generalization capability is weak, that is, its generalization error is relatively large. There are many ways to mitigate overfitting; common ones include appropriately simplifying the model, stopping training before overfitting occurs, and using dropout and weight decay. Figure 1-20 shows the underfitting, good fitting, and overfitting results for the same dataset.

Figure 1-20 Underfitting, good fitting, and overfitting

Model capacity, also called model complexity, refers to a model's capability of fitting functions. When the capacity suits the complexity of the task and the amount of training data provided, the algorithm usually works best. Models with insufficient capacity cannot solve complex tasks and may underfit. As shown in the left part of Figure 1-20, the data is distributed in a tick (check-mark) shape, but the model is linear and cannot describe the distribution well. A high-capacity model can solve complex tasks, but it may overfit if its capacity is higher than the task requires. As shown in the right part of Figure 1-20, the model attempts to fit the data with an extremely complex function. Although the training error is reduced, such a model can hardly be expected to predict the target value of a new sample well. The effective capacity of a model is limited by the algorithm, the parameters, and the regularization method. In general, the generalization error can be broken down as follows:

$$\text{Generalization error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible error}$$

Bias and variance are the two components that deserve the most attention. As shown in Figure 1-21, variance is the deviation of the prediction results from their average value, and is the error caused by the model's sensitivity to small fluctuations in the training set. Bias is the difference between the average predicted value and the correct value we are trying to predict. The irreducible error refers to the error caused by imperfections of models and the finiteness of data. Theoretically, with an infinite amount of data and a perfect model, this error could be eliminated. However, that never happens in practice, so the generalization error can never be eliminated.

Figure 1-21 Variance and bias

Ideally, we want a model that both accurately captures the rules in the training data and generalizes well to unseen (new) data. However, a model usually cannot achieve both at the same time. As shown in Figure 1-22, the training error decreases as model complexity increases, while the test error first decreases to a certain point and then increases again, forming a U-shaped (convex) curve. The model complexity at the lowest test error is the ideal model complexity.

Figure 1-22 Relationship between model complexity and error

The metrics commonly used to evaluate the performance of a regression model are the mean absolute error (MAE), the mean squared error (MSE), and $R^2$. Assume that the actual target values of the test samples are $y_1, y_2, \ldots, y_m$ and the corresponding predicted values are $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_m$. These metrics are defined as follows:

$$MAE = \frac{1}{m}\sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|$$

$$MSE = \frac{1}{m}\sum_{i=1}^{m} (y_i - \hat{y}_i)^2$$

$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}$$

Here, $\bar{y}$ is the mean of the actual values. TSS (total sum of squares) measures how much the samples vary around their mean, and RSS (residual sum of squares) measures the difference between the predicted values and the actual values. MAE and MSE are nonnegative, and a value closer to 0 indicates better model performance. $R^2$ is at most 1, and a value closer to 1 indicates better model performance.
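The three regression metrics translate directly into code. The actual and predicted values below are hypothetical.

```python
def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_y = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1 - rss / tss

# Hypothetical actual and predicted values for four test samples.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.0, 7.5, 9.0]
print(mae(y, y_hat), mse(y, y_hat), r2(y, y_hat))  # 0.25 0.125 0.975
```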

Figure 1-23 Binary-classification confusion matrix

The confusion matrix is used to evaluate the performance of a classification model, as shown in Figure 1-23. It is a k × k square matrix, where k is the number of categories. The value in row i and column j indicates the number of samples that belong to category i but are classified as category j by the model. Ideally, for a highly accurate classifier, most samples should lie on the diagonal of the matrix, and the values outside the diagonal should be 0 or close to 0. The symbols in the binary-classification confusion matrix shown in Figure 1-23 are described as follows:
(1) P: positive, the number of actual positive cases in the data.
(2) N: negative, the number of actual negative cases in the data.
(3) TP: true positive, the number of positive cases correctly classified as positive by the classifier.
(4) TN: true negative, the number of negative cases correctly classified as negative by the classifier.
(5) FP: false positive, the number of negative cases incorrectly classified as positive by the classifier.
(6) FN: false negative, the number of positive cases incorrectly classified as negative by the classifier.
Figure 1-24 lists other concepts related to the binary-classification confusion matrix.

Figure 1-24 Other concepts of a confusion matrix

The following uses literature retrieval as an example to explain precision and recall. Precision is the proportion of retrieved documents that are actually related to the search subject. Recall is the proportion of all related documents in the document library that are retrieved. At the end of this section, we use an example to show how the binary-classification confusion matrix is used in calculations. Assume that a classifier identifies whether the object in an image is a cat, and we use 200 images to verify the performance of the model. Among the 200 images, 170 contain cats and the others do not. Figure 1-25 shows the performance of the model. The model identifies the objects in 160 images as cats (140 of them correctly) and the others as not cats. The precision of the model is therefore 140/160 = 87.5%, the recall is 140/170 = 82.4%, and the accuracy is (140 + 10)/200 = 75%.

Figure 1-25 Confusion matrix instance
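The cat-classifier calculation can be reproduced from the four cells of the confusion matrix. Only TP = 140 and the totals come from the example; the other three cells are derived from them.

```python
# Counts from the cat-classifier example: 200 images, 170 cats, 160 predicted as cats.
TP = 140                  # cats correctly identified as cats
FP = 160 - TP             # non-cats identified as cats -> 20
FN = 170 - TP             # cats the model missed -> 30
TN = 200 - TP - FP - FN   # non-cats correctly rejected -> 10

precision = TP / (TP + FP)
recall = TP / (TP + FN)
accuracy = (TP + TN) / (TP + FP + FN + TN)
print(f"precision={precision:.1%}, recall={recall:.1%}, accuracy={accuracy:.0%}")
# precision=87.5%, recall=82.4%, accuracy=75%
```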

1.4 Parameters and Hyperparameters in Models

Parameters are the part of a model that is learned from historical training data, and they are key to machine learning algorithms. Model parameters are generally not set manually by researchers but are estimated or learned from data. Determining the parameter values of a model is equivalent to defining the function the model computes, so model parameters are usually saved as part of the learned model. Parameters are also an integral part of model prediction. Examples of model parameters include the weights in artificial neural networks, the support vectors in support vector machines (SVMs), and the coefficients in linear or logistic regression. A model contains not only parameters but also hyperparameters. Unlike a parameter, a hyperparameter is an external configuration of a model and is usually used in the process of estimating model parameters. The most fundamental difference is that parameters are learned automatically by the model, while hyperparameters are set manually. Model hyperparameters usually need to be adjusted for different prediction modeling problems. In addition to being specified directly by researchers, hyperparameters can also be set using heuristic methods. Common hyperparameters include the penalty coefficient in Lasso/Ridge regression; the learning rate, number of iterations, batch size, activation function, and number of neurons when training a neural network; $C$ and $\sigma$ of SVMs; K in KNN; and the number of decision trees in a random forest. Model training generally refers to optimizing model parameters, a process commonly completed using gradient descent algorithms. Based on the training effect of the model, you can use hyperparameter search algorithms to optimize the hyperparameters of the model. This section successively describes gradient descent algorithms, the concept of the validation set, and hyperparameter search and cross validation.

1.4.1 Gradient Descent

A gradient descent algorithm uses the negative gradient direction at the current position as the search direction, because this is the direction of steepest descent (see the left part of Figure 1-26). The gradient descent update formula is as follows:
$$w_{k+1} = w_k - \eta \nabla f_{w_k}(x)$$
In the formula, η indicates the learning rate, and w represents a parameter of the model. As w approaches the target value, the change in w per iteration gradually decreases. The algorithm is considered to have converged when the value of the target function barely changes between iterations, or it stops when the maximum number of iterations is reached. Note that when gradient descent is used to find the minimum of a non-convex function, different initial values may lead to different results, as shown in the right part of Figure 1-26.
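The update rule and stopping conditions above can be sketched in a few lines of Python. This is a minimal one-dimensional illustration, assuming a hypothetical non-convex function f(w) = w⁴ − 3w² + w chosen only because it has two minima; starting from w = 2 and from w = −2 should end in different minima, which mirrors the right part of Figure 1-26.

```python
def gradient_descent(f, grad, w0, lr=0.01, tol=1e-10, max_iter=10_000):
    """Minimize f from w0 with the update w_{k+1} = w_k - lr * grad(w_k).

    Stops when f changes by less than `tol` in one step, or after `max_iter` steps.
    """
    w = w0
    for _ in range(max_iter):
        w_next = w - lr * grad(w)
        if abs(f(w_next) - f(w)) < tol:
            return w_next
        w = w_next
    return w

def f(w):
    """Hypothetical non-convex target with a local minimum and a global minimum."""
    return w ** 4 - 3 * w ** 2 + w

def grad_f(w):
    """Derivative of f."""
    return 4 * w ** 3 - 6 * w + 1

w_right = gradient_descent(f, grad_f, w0=2.0)
w_left = gradient_descent(f, grad_f, w0=-2.0)
print(w_right, w_left)  # the two starting points end in different minima
```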

Figure 1-26 Gradient descent algorithm

You can apply one of the following gradient descent algorithms to model training:
• Batch gradient descent (BGD): uses all samples in the dataset and updates the weight parameter with the average gradient at the current point.
• Stochastic gradient descent (SGD): randomly selects one sample from the dataset and updates the weight parameter with the gradient of that sample.
• Mini-batch gradient descent (MBGD): combines the features of BGD and SGD, and updates the weight parameter with the average gradient of n samples selected from the dataset.
Figure 1-27 compares the three gradient descent algorithms. BGD is the most stable during training. However, because it traverses all samples for each update, it consumes a large amount of computing resources. In SGD, randomly selecting one sample for each update improves computational efficiency but also introduces instability: the loss fluctuates, and may even move away from the lowest point after approaching it. MBGD balances SGD and BGD, and is currently the most commonly used gradient descent algorithm in machine learning.
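The three variants differ only in how many samples feed each update, so one function with a batch_size argument can sketch all of them. This assumes synthetic, noise-free data generated from y = 3x + 2 and a squared loss; the learning rate, epoch count, and the name train_linear are illustrative choices, not recommended values.

```python
import random

def train_linear(xs, ys, batch_size, lr=0.1, epochs=2000, seed=0):
    """Fit y ~ w*x + b by gradient descent on the squared loss.

    batch_size == len(xs) -> BGD, batch_size == 1 -> SGD, otherwise MBGD.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new sample order each epoch (no effect on BGD)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Average gradient of 0.5 * (w*x + b - y)^2 over the batch.
            grad_w = sum((w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(w * xs[i] + b - ys[i] for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 20 for i in range(21)]
ys = [3 * x + 2 for x in xs]  # synthetic data with w = 3, b = 2
for name, size in [("BGD", len(xs)), ("SGD", 1), ("MBGD", 5)]:
    print(name, train_linear(xs, ys, batch_size=size))
```

On noisy real data, the SGD and MBGD curves would fluctuate around the minimum instead of settling exactly, which is the instability described above.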


Figure 1-27 Efficiency comparison of gradient descent algorithms

1.4.2 Validation Set and Hyperparameter Search

A training set is the set of samples used in model training. During training, gradient descent algorithms improve the prediction accuracy of the model on the training samples as much as possible. As a result, the model usually performs better on the training set than on unknown data. To measure the generalization capability of a model, part of the dataset is randomly selected as the test set before training, as shown in Figure 1-28. Samples in the test set do not participate in training and are therefore unknown to the model, so the model's performance on the test set can be taken as an approximation of its performance on unknown samples.

Figure 1-28 Training set, validation set, and test set

The goal of hyperparameter optimization is to improve the generalization capability of models. The most intuitive idea is to try different hyperparameter values, evaluate the resulting models on the test set, and select the model with the strongest generalization capability. The problem with this idea is that the test set must not participate in model training in any form, including hyperparameter search; otherwise, the test result no longer reflects performance on unknown samples. Therefore, some samples are randomly selected from the training set to form a validation set. Samples in the validation set do not participate in training either; they are used only to evaluate the effect of hyperparameters. Generally, the model is optimized repeatedly on the training set and validation set to determine the final parameters and hyperparameters, and its performance is then evaluated on the test set. Common methods for searching model hyperparameters include grid search, random search, heuristic intelligent search, and Bayesian search.
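A random three-way split can be sketched as follows. The ratios and the function name split_dataset are illustrative assumptions; the point is that the test set is set aside first and the validation set is then taken from the remaining training samples, as the text describes.

```python
import random

def split_dataset(samples, val_ratio=0.2, test_ratio=0.2, seed=0):
    """Randomly split samples into disjoint training, validation, and test sets.

    The test set is carved out first and is never used for training or tuning;
    the validation set is then taken from what remains (the training portion).
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = int(len(rest) * val_ratio)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))
```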

Figure 1-29 Grid search and random search

Grid search exhaustively tries all combinations of candidate hyperparameter values, which form a grid, as shown in the left part of Figure 1-29. In practice, the range and step size of the grid usually need to be specified manually. Grid search works well when the number of hyperparameters is relatively small, so it is feasible for general machine learning algorithms. For models such as neural networks, however, grid search is too expensive and time-consuming, and is therefore generally not used. When the hyperparameter search space is large, random search usually works better than grid search (see the right part of Figure 1-29). In random search, each setting is sampled from the distribution of possible parameter values in an attempt to find the best subset of hyperparameters. The search starts within a coarse range, which is then narrowed around where the best result appears. In practice, some hyperparameters are much more important than others. The search result is then dominated by the important hyperparameters, and the less important ones may not be optimized well.
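Grid search and random search can be compared on a toy problem. The function validation_score below is hypothetical: it stands in for "train a model with these hyperparameters and measure it on the validation set", and it is deliberately built so that the learning rate matters far more than the regularization strength.

```python
import itertools
import math
import random

def validation_score(lr, reg):
    """Hypothetical stand-in for a validation score (higher is better)."""
    return -(math.log10(lr) + 2) ** 2 - 0.1 * (reg - 0.5) ** 2

def grid_search(lr_values, reg_values):
    """Evaluate every combination on the grid and keep the best one."""
    return max(itertools.product(lr_values, reg_values),
               key=lambda params: validation_score(*params))

def random_search(n_trials, seed=0):
    """Sample each hyperparameter from a distribution instead of a fixed grid."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-4, 0)   # learning rate sampled on a log scale
        reg = rng.uniform(0.0, 1.0)     # regularization strength sampled uniformly
        trials.append((lr, reg))
    return max(trials, key=lambda params: validation_score(*params)), trials

best_grid = grid_search([1e-3, 1e-2, 1e-1, 1.0], [0.0, 0.5, 1.0])
best_random, trials = random_search(n_trials=12)
print("grid:", best_grid, "random:", best_random)
```

With the same budget of 12 evaluations, the grid tries only 4 distinct learning rates, while random search tries 12, which is one reason random search tends to do better when one hyperparameter dominates.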

1.4.3 Cross Validation

The preceding method of holding out a validation set has two main problems: the result depends heavily on how the samples happen to be divided, so it is not convincing; and the number of samples available for model training is further reduced. To solve these problems, you can divide the training set into k groups and perform k-fold cross validation. In k-fold cross validation, k rounds of training and validation are performed: each group is used in turn as the validation set, while the remaining k-1 groups form the training set. This yields k models and their classification accuracies on the validation sets. The average of the k accuracies can be used as the performance indicator of the model's generalization capability. k-fold cross validation reduces the effect of a random validation split, so its result is more convincing. However, it requires training k models, which takes a long time on large datasets, so it is generally applied to small datasets. The value of k is itself a hyperparameter that needs to be determined through experiments. In the extreme case, k equals the number of samples in the training set. This is called leave-one-out cross validation: in each round, a single training sample is held out as the validation set. Leave-one-out cross validation trains better models because almost all samples participate in each round of training, but it takes much longer, so it only applies to very small datasets.
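k-fold cross validation can be sketched as follows, assuming a hypothetical one-dimensional threshold classifier in place of a real model. The folds here are contiguous for readability; in practice the samples are usually shuffled before splitting. Passing k equal to the number of samples gives leave-one-out cross validation.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k folds of n samples (fold sizes differ by at most 1)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size

def fit_threshold(xs, ys):
    """Hypothetical 1-D classifier: threshold halfway between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2

def cross_val_accuracy(xs, ys, k):
    """Average validation accuracy over k rounds; k == len(xs) gives leave-one-out."""
    accuracies = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        threshold = fit_threshold([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum((1 if xs[i] > threshold else 0) == ys[i] for i in val_idx)
        accuracies.append(correct / len(val_idx))
    return sum(accuracies) / k

xs = [0.1 * i for i in range(10)]
ys = [0] * 5 + [1] * 5   # hypothetical labels
print(cross_val_accuracy(xs, ys, k=5), cross_val_accuracy(xs, ys, k=len(xs)))
```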

1.5 Common Machine Learning Algorithms

1.5.1 Overview

As shown in Figure 1-30, there are many common machine learning algorithms. This section briefly describes the principles and basic ideas of these algorithms. For details, refer to related books.


Figure 1-30 Common machine learning algorithms

1.5.2 Linear Regression

Linear regression, a type of supervised learning, is a statistical analysis method that uses regression analysis to determine the quantitative relationship between two or more variables. As shown in Figure 1-31, the model function of linear regression is a hyperplane:
$$h(x) = w^{T}x + b$$
In the formula, w is a weight parameter, b is an offset (bias), and x is a sample.

Figure 1-31 Linear regression

The relationship between the value predicted by the model and the actual value is as follows:
$$y = h(x) + \varepsilon$$
In the formula, y indicates the actual value, and ε indicates an error. The error is the combined effect of many factors, so by the central limit theorem it is reasonable to assume that it follows a normal distribution:
$$\varepsilon \sim N(0, \sigma^2)$$
The probability distribution of the actual values is then:
$$y \sim N(h(x), \sigma^2)$$
According to maximum likelihood estimation, the target of model optimization is as follows:
$$\underset{h}{\arg\max}\prod_{i=1}^{m}P(Y=y_i \mid X=x_i) = \underset{h}{\arg\max}\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(h(x_i)-y_i)^2}{2\sigma^2}\right)$$
In the formula, argmax indicates that the maximum value point is to be obtained, that is, the h that maximizes the value of the target function. In the target function, $(\sqrt{2\pi}\,\sigma)^{-1}$ is a constant irrelevant to h. Multiplying or dividing the target function by a positive constant does not change the position of the maximum or minimum value point. Therefore, the optimization target of the model can be expressed as follows:


$$\underset{h}{\arg\max}\prod_{i=1}^{m}\exp\left(-\frac{(h(x_i)-y_i)^2}{2\sigma^2}\right)$$

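Because the logarithmic function is monotonically increasing, taking the natural logarithm (ln) of the target function does not change the maximum and minimum value points:
$$\underset{h}{\arg\max}\ \ln\left(\prod_{i=1}^{m}\exp\left(-\frac{(h(x_i)-y_i)^2}{2\sigma^2}\right)\right) = \underset{h}{\arg\max}\sum_{i=1}^{m}-\frac{(h(x_i)-y_i)^2}{2\sigma^2}$$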

Negating the target function turns the original maximum value point into a minimum value point. Multiplying it by the positive constant σ²/m does not move that point, so the optimization target of the model becomes:
$$\underset{h}{\arg\min}\ \frac{1}{2m}\sum_{i=1}^{m}(h(x_i)-y_i)^2$$

Obviously, the loss function is:
$$J(w) = \frac{1}{2m}\sum_{i=1}^{m}(h(x_i)-y_i)^2$$

We want the predicted values to be as close as possible to the actual values, that is, to minimize the loss value. A gradient descent algorithm can be used to find the weight parameter w at which the loss function reaches its minimum, which completes model building.
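The derivation can be checked numerically. Under the Gaussian error assumption, the negative log-likelihood equals a constant plus (m/σ²)·J(w), so both criteria rank candidate models identically. The data points, σ, and candidate (w, b) pairs below are hypothetical.

```python
import math

def squared_loss(w, b, xs, ys):
    """J(w) = 1/(2m) * sum((h(x_i) - y_i)^2) with h(x) = w*x + b."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gaussian_nll(w, b, xs, ys, sigma):
    """Negative log-likelihood of the data when y ~ N(w*x + b, sigma^2)."""
    return sum(
        math.log(math.sqrt(2 * math.pi) * sigma) + (w * x + b - y) ** 2 / (2 * sigma ** 2)
        for x, y in zip(xs, ys)
    )

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]                          # hypothetical observations
candidates = [(2.0, 1.0), (1.5, 1.5), (2.5, 0.0)]  # hypothetical (w, b) pairs
by_nll = sorted(candidates, key=lambda c: gaussian_nll(c[0], c[1], xs, ys, sigma=0.7))
by_mse = sorted(candidates, key=lambda c: squared_loss(c[0], c[1], xs, ys))
print(by_nll == by_mse, by_mse[0])                 # both criteria prefer the same model
```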

Figure 1-32 Comparison between linear regression and polynomial regression

Polynomial regression is an extension of linear regression. The structure of a dataset is often too complex to be fitted by a straight line; that is, the original linear regression model underfits noticeably. The solution is to use polynomial regression, as shown in Figure 1-32:
$$h(x) = w_1 x + w_2 x^2 + \cdots + w_n x^n + b$$
In the formula, n indicates the degree of the polynomial. Because the degree is a hyperparameter, overfitting may occur if it is chosen poorly. Applying regularization helps reduce overfitting. The most common regularization method is to add the sum of the squared weights to the target function:
$$J(w) = \frac{1}{2m}\sum_{i=1}^{m}(h(x_i)-y_i)^2 + \lambda\|w\|_2^2$$
In the formula, ‖·‖₂ indicates the L2 norm, so λ‖w‖₂² is an L2 regularization term. A linear regression model using this loss function is known as a Ridge regression model. Similarly, a linear regression model with an L1 (absolute value) penalty is called a Lasso regression model:
$$J(w) = \frac{1}{2m}\sum_{i=1}^{m}(h(x_i)-y_i)^2 + \lambda\|w\|_1$$
In the formula, ‖·‖₁ indicates the L1 norm, that is, the sum of the absolute values of the weights.
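The effect of the L2 term can be sketched with gradient descent on polynomial features. This assumes hypothetical noise-free data from y = x², a polynomial of degree n = 3, and λ = 0.1; with λ = 0 the fit should recover roughly w = (0, 1, 0), while λ > 0 should shrink the weights. A Lasso variant would replace the penalty gradient 2λw with a subgradient of λ‖w‖₁, which is not shown here.

```python
def poly_features(x, n):
    """Map a scalar x to the features (x, x^2, ..., x^n) of a degree-n polynomial."""
    return [x ** d for d in range(1, n + 1)]

def fit_ridge(xs, ys, n, lam, lr=0.5, iters=3000):
    """Gradient descent on J(w) = 1/(2m) * sum((h(x_i) - y_i)^2) + lam * ||w||_2^2.

    Only w is penalized, not the offset b, matching the loss above;
    lam = 0 gives plain polynomial regression.
    """
    feats = [poly_features(x, n) for x in xs]
    m = len(xs)
    w, b = [0.0] * n, 0.0
    for _ in range(iters):
        residuals = [sum(wj * fj for wj, fj in zip(w, f)) + b - y for f, y in zip(feats, ys)]
        grad_w = [sum(r * f[j] for r, f in zip(residuals, feats)) / m + 2 * lam * w[j]
                  for j in range(n)]
        grad_b = sum(residuals) / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def l2_norm(w):
    """Euclidean length of the weight vector."""
    return sum(wj ** 2 for wj in w) ** 0.5

xs = [i / 10 for i in range(-10, 11)]
ys = [x ** 2 for x in xs]                  # hypothetical noise-free quadratic data
w_plain, _ = fit_ridge(xs, ys, n=3, lam=0.0)
w_ridge, _ = fit_ridge(xs, ys, n=3, lam=0.1)
print(l2_norm(w_plain), l2_norm(w_ridge))  # the L2 penalty shrinks the weights
```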

1.5.3 Logistic Regression

The logistic regression model is used to solve classification problems. The model is defined as follows:
$$h(x) = P(Y=1 \mid X) = g(w^{T}x + b)$$
In the formula, g represents the sigmoid function, w represents a weight, and b represents the bias. Because wᵀx + b is a linear function of x, logistic regression, like linear regression, is a generalized linear model. The sigmoid function is defined as follows:
$$g(x) = \frac{1}{1 + \exp(-x)}$$
Figure 1-33 shows the sigmoid function.


Figure 1-33 Sigmoid function

You can obtain the classification result for x by comparing P(Y=1|X) with a threshold t. The threshold t is a hyperparameter of the model and can take any value between 0 and 1. When the threshold is large, the model tends to classify samples as negative, so precision is higher; when the threshold is small, the model tends to classify samples as positive, so recall is higher. Generally, 0.5 can be used as the threshold. According to maximum likelihood estimation, we want P(Y=1|X) to be large when the sample is positive, and P(Y=0|X) to be large when the sample is negative. That is, for any sample, we want the value of the following formula to be as large as possible:
$$P = P(Y=1 \mid X)^{y}\,P(Y=0 \mid X)^{1-y}$$
Replacing P(Y=1|X) with h(x) and P(Y=0|X) with 1 − h(x), we obtain:
$$P = h(x)^{y}\,(1-h(x))^{1-y}$$
Therefore, the target of model optimization is as follows:
$$\underset{h}{\arg\max}\prod_{i=1}^{m}P_i = \underset{h}{\arg\max}\prod_{i=1}^{m}h(x_i)^{y_i}\,(1-h(x_i))^{1-y_i}$$

Similar to the derivation for linear regression, the logarithm of the target function can be taken without changing the position of the maximum value point. Therefore, the optimization target of the model is equivalent to:
$$\underset{h}{\arg\max}\sum_{i=1}^{m}\left(y_i\ln h(x_i) + (1-y_i)\ln(1-h(x_i))\right)$$
Multiplying the target function by the constant −1/m turns the original maximum value point into the minimum value point, that is:
$$\underset{h}{\arg\min}\ -\frac{1}{m}\sum_{i=1}^{m}\left(y_i\ln h(x_i) + (1-y_i)\ln(1-h(x_i))\right)$$
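Minimizing this target with gradient descent, and then classifying with a threshold t, can be sketched as follows. The gradient of the average cross-entropy with respect to w is (1/m)·Σ(h(x_i) − y_i)·x_i. The data, learning rate, and iteration count are hypothetical; the data are deliberately not linearly separable so that the weights stay finite. Raising t can only shrink the set of predicted positives, so recall can only stay the same or fall.

```python
import math

def sigmoid(z):
    """Numerically stable g(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cross_entropy(w, b, xs, ys):
    """Average loss -1/m * sum(y ln h(x) + (1 - y) ln(1 - h(x)))."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = sigmoid(w * x + b)
        total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / len(xs)

def train_logistic(xs, ys, lr=0.5, iters=2000):
    """Batch gradient descent on the average cross-entropy."""
    w, b, m = 0.0, 0.0, len(xs)
    for _ in range(iters):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]  # h(x_i) - y_i
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / m
        b -= lr * sum(errors) / m
    return w, b

def precision_recall(w, b, xs, ys, t):
    """Classify as positive when P(Y=1|X) >= t, then score the predictions."""
    preds = [1 if sigmoid(w * x + b) >= t else 0 for x in xs]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, ys))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, ys))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, ys))
    precision = tp / (tp + fp) if tp + fp else 0.0
    return precision, tp / (tp + fn)

xs = [-3, -2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2, 3]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]   # hypothetical, not linearly separable
w, b = train_logistic(xs, ys)
print("loss:", cross_entropy(w, b, xs, ys))
for t in (0.3, 0.5, 0.7):
    print(t, precision_recall(w, b, xs, ys, t))
```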

The loss function of logistic regression can be obtained:
$$J(w) = -\frac{1}{m}\sum_{i=1}^{m}\left(y_i\ln h(x_i) + (1-y_i)\ln(1-h(x_i))\right)$$
In the formula, w indicates the weight parameter, m indicates the number of samples, x_i indicates the i-th sample, and y_i indicates its actual value. The values of all the weight parameters w can also be obtained by using a gradient descent algorithm. Softmax regression is a generalization of logistic regression that can be used for k-category classification. The Softmax function maps a k-dimensional vector of arbitrary real values to another k-dimensional vector of real values that represents the probability distribution over the categories a sample may belong to. The Softmax regression probability function is as follows:
$$P(Y=c \mid x) = \frac{\exp(w_c^{T}x + b)}{\sum_{l=1}^{k}\exp(w_l^{T}x + b)}$$


As shown in Figure 1-34, Softmax assigns a probability value to each category in a multi-category problem, and all the probabilities sum to 1. In this example, the probability of the Apple category is 0.68, so the sample is predicted to be Apple.

Figure 1-34 Softmax function example
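A Softmax function can be sketched in Python. The three category names and their scores wᶜᵀx + b are hypothetical and are not the values behind Figure 1-34. Subtracting the largest score before exponentiating is a common numerical safeguard and does not change the result.

```python
import math

def softmax(scores):
    """Map k real-valued scores to k probabilities that sum to 1."""
    shift = max(scores)                          # avoids overflow in exp for large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

categories = ["Apple", "Banana", "Grape"]        # hypothetical category names
scores = [2.0, 1.0, 0.1]                         # hypothetical values of w_c^T x + b
probs = softmax(scores)
prediction = categories[probs.index(max(probs))]
print([round(p, 3) for p in probs], prediction)
```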

1.5.4 Decision Tree

A decision tree is a binary or non-binary tree classifier, as shown in Figure 1-35. Each non-leaf node represents a test on a feature attribute, each branch represents an outcome of that test (a value range of the attribute), and each leaf node stores a category. To classify an item, start from the root node, test the item's feature attributes, follow the matching output branches, and use the category stored on the leaf node that is reached as the final result.


Figure 1-35 Decision tree example

Tree construction is the most important part of the decision tree model. To construct a decision tree, we need to select attributes and determine the topological structure among the feature attributes. The key step is to split the data on each feature attribute, compare the purity of the resulting subsets, and select the attribute that yields the highest purity as the split point for the dataset. The learning algorithm of a decision tree is a decision tree construction algorithm; common algorithms include ID3, C4.5, and CART. These algorithms differ mainly in how they quantify purity, for example with information entropy or the Gini index:
$$H(X) = -\sum_{k=1}^{K} p_k \log_2 p_k$$
$$Gini = 1 - \sum_{k=1}^{K} p_k^2$$
In the formulas, p_k indicates the probability that a sample belongs to category k, and K indicates the total number of categories. The larger the gain in purity from before to after a split on a feature, the more that feature improves the model's accuracy, so it is preferred when adding nodes to the decision tree. Generally, the decision tree construction process can be divided into the following three phases:


(1) Feature selection: Select a feature from the features of the training data as the split criterion of the current node. (Different criteria produce different decision tree algorithms.)
(2) Decision tree generation: Generate child nodes from the top down based on the selected features, and stop when the dataset can no longer be split.
(3) Pruning: Reduce the tree size and optimize its node structure to restrain overfitting. Pruning can be classified into pre-pruning and post-pruning.
Figure 1-36 shows an example of classification using a decision tree model. The classification result is affected by the Refund, Marital Status, and Taxable Income attributes. This example shows that a decision tree model can handle not only attributes with two values, but also attributes with multiple values or even continuous values. In addition, a decision tree model is interpretable: the relative importance of the attributes can be analyzed intuitively from the tree structure on the right of Figure 1-36.

Figure 1-36 Building a decision tree
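The purity measures H(X) and Gini above, and phase (1) feature selection, can be sketched as follows. The toy rows and attribute names are hypothetical; split_gain computes the information gain (as in ID3) when the impurity is entropy, and the Gini-based analogue otherwise. Building a full tree would apply best_feature recursively to each subset (phase 2), which is omitted here.

```python
import math
from collections import Counter

def entropy(labels):
    """H(X) = -sum(p_k * log2(p_k)) over the categories present in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini = 1 - sum(p_k^2)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(rows, labels, feature, impurity=entropy):
    """Impurity before the split minus the weighted impurity of the subsets after it."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    after = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - after

def best_feature(rows, labels, impurity=entropy):
    """Phase (1), feature selection: the feature with the largest purity gain."""
    return max(rows[0].keys(), key=lambda f: split_gain(rows, labels, f, impurity))

# Hypothetical categorical data with two attributes and a yes/no category.
rows = [
    {"refund": "yes", "married": "no"},
    {"refund": "no", "married": "yes"},
    {"refund": "no", "married": "no"},
    {"refund": "yes", "married": "yes"},
    {"refund": "no", "married": "no"},
    {"refund": "no", "married": "yes"},
]
labels = ["no", "no", "yes", "no", "yes", "no"]
print(best_feature(rows, labels), best_feature(rows, labels, gini))
```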

1.5.5 SVMs

An SVM is a linear classifier defined in the feature space with the maximum margin. With kernel tricks, SVMs become essentially nonlinear classifiers. Learning an SVM amounts to solving a convex quadratic programming problem. In general, SVM is based on two main ideas:
(1) Based on the structural risk minimization principle, an optimal hyperplane is constructed in the feature space so that the learner is globally optimal, and its expected risk over the whole sample space satisfies an upper bound with a certain probability.
(2) When samples are linearly inseparable, a non-linear mapping converts them from the low-dimensional input space into a high-dimensional feature space, where they become linearly separable. A linear algorithm can then be used to analyze the non-linear features of the samples.
A linear classifier divides data into categories with a straight line, and in fact many different straight lines can divide the same data, as shown in Figure 1-37. The core idea of SVM is to find a line that separates the data while keeping the points closest to the line as far away from it as possible. This gives the model a strong generalization capability. The points closest to the line are called support vectors.

Figure 1-37 Performance of a linear classifier

A linear SVM works well on linearly separable datasets, but nonlinear datasets cannot be divided with straight lines. In this case, kernel functions are used to construct nonlinear SVMs. Kernel functions allow the algorithm to fit the maximum-margin hyperplane in a transformed high-dimensional feature space, as shown in Figure 1-38. Common kernel functions include the linear, polynomial, sigmoid, and Gaussian kernel functions. The Gaussian kernel function maps samples into an infinite-dimensional space and generally performs well, so it is one of the most commonly used kernel functions.

Figure 1-38 Kernel method
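The common kernel functions can be written directly. The parameter names (degree, c, alpha, sigma) are assumptions, and libraries parameterize these kernels in different ways; the sigmoid kernel is conventionally defined with tanh. The last lines check the kernel trick for a degree-2 polynomial kernel: (x·z)² equals an inner product in a 3-dimensional feature space that the kernel never builds explicitly.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    return dot(x, z)

def polynomial_kernel(x, z, degree=2, c=0.0):
    return (dot(x, z) + c) ** degree

def gaussian_kernel(x, z, sigma=1.0):
    """exp(-||x - z||^2 / (2 * sigma^2)); sigma is a hyperparameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def sigmoid_kernel(x, z, alpha=1.0, c=0.0):
    return math.tanh(alpha * dot(x, z) + c)

def phi(x):
    """Explicit degree-2 feature map for 2-D inputs: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)

x, z = (1.0, 2.0), (3.0, -1.0)
# Kernel trick: same value as an inner product in the mapped space.
print(polynomial_kernel(x, z), dot(phi(x), phi(z)))
print(gaussian_kernel(x, z, sigma=1.0), sigmoid_kernel(x, z))
```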

1.5.6 KNN

The K-nearest neighbor (KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. KNN is a non-parametric method and usually works well on datasets with irregular decision boundaries. Its rule is: if most of the K samples most similar to a given sample (its nearest neighbors in the feature space) belong to a certain category, the given sample also belongs to that category.


Figure 1-39 KNN example

Because the prediction is determined by the number and weights of neighbors in the training set, the logic of the KNN algorithm is simple. However, just as k in k-fold cross validation is a hyperparameter, so is K in KNN, and a proper K value is difficult to select. As shown in Figure 1-39, when K is 3, the point at the question mark is predicted to be a red triangle; when K is 5, it is predicted to be a blue square. Figure 1-40 shows the decision boundaries for different K values: the boundary becomes smoother as K increases. If K keeps increasing until it reaches the number of training samples, the entire plane eventually becomes blue. Generally, a larger K value reduces the impact of noise on classification but blurs the boundaries between categories. A larger K value means a higher risk of underfitting because the decision boundary is too coarse, and a smaller K value means a higher risk of overfitting because the decision boundary is too fine.


Figure 1-40 Influence of the K value on the decision boundary

The KNN algorithm can be used not only for classification but also for regression. Majority voting is generally used for classification, and averaging is generally used for regression. Although the prediction depends only on the K nearest samples, KNN requires a very large amount of computation, because it must traverse all training samples to determine which K samples are nearest to the sample being predicted.
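KNN classification by majority vote and regression by averaging can be sketched as follows. The toy points are hypothetical and arranged so that K = 3 and K = 5 give different answers, as in Figure 1-39. Sorting all training samples by distance makes the brute-force cost described above explicit. math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (point, target) pairs closest to `query`, by brute force over all samples."""
    return sorted(train, key=lambda item: math.dist(item[0], query))[:k]

def knn_classify(train, query, k):
    """Majority vote among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k):
    """Average of the k nearest neighbors' target values."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / k

# Hypothetical layout mirroring the K = 3 vs K = 5 situation described above.
train = [
    ((0.5, 0.0), "square"),
    ((0.0, 1.0), "triangle"),
    ((-1.1, 0.0), "triangle"),
    ((0.0, -1.5), "square"),
    ((1.6, 0.0), "square"),
]
query = (0.0, 0.0)
print(knn_classify(train, query, k=3), knn_classify(train, query, k=5))
```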

1.5.7 Naive Bayes

Naive Bayes is a simple multi-class classification algorithm that is based on Bayes' theorem and assumes that features are independent of each other. For a given sample feature X, the probability that the sample belongs to category c is as follows:
$$P(C=c \mid X) = \frac{P(X \mid C=c)\,P(C=c)}{P(X)}$$
In the formula, P(C=c|X) indicates the posterior probability, P(C=c) indicates the prior probability of the category, and P(X) indicates the prior probability of the feature. P(X) is generally ignored because it is the same for every category during classification, that is:
$$P(C=c \mid X) \propto P(X \mid C=c)\,P(C=c)$$

P(C=c) does not depend on X and can be determined before model training; it is generally estimated as the proportion of samples of category c in the dataset. Therefore, the core of classification is to calculate P(X|C=c). Assume that the feature X consists of multiple elements:
$$X = (X_1, X_2, \cdots, X_n)$$
Each per-feature probability P(X_i|C=c) is easy to estimate from the data, and the feature independence hypothesis allows them to be combined:
$$P(X \mid C=c) = \prod_{i=1}^{n}P(X_i \mid C=c)$$

The feature independence hypothesis states that, given the category of a sample, the distribution of each attribute value is independent of the distributions of the other attribute values. Naive Bayes is "naive" precisely because its model uses this hypothesis. The hypothesis greatly simplifies computation, which lets Naive Bayes classifiers train quickly and still achieve good accuracy on large datasets. For example, suppose we want to determine a person's gender C based on height X1 and weight X2. Suppose that among men, the probabilities of a height of 180 cm and 150 cm are 80% and 20% respectively, and the probabilities of a weight of 80 kg and 50 kg are 70% and 30% respectively. Treating height and weight as independent contributors to the probability of being male, the Naive Bayes model gives P(X|C=male) = 0.8 × 0.3 = 0.24 for a person 180 cm tall weighing 50 kg, but only 0.2 × 0.7 = 0.14 for a person 150 cm tall weighing 80 kg. Multiplying each value by the prior P(C=male) gives the quantity that is compared across categories. The performance of the Naive Bayes model usually depends on how well the feature independence hypothesis holds. In the preceding example, height and weight are not completely independent, and this correlation inevitably affects the accuracy of the model. However, as long as the correlation is not high, the Naive Bayes model can still be used. In actual applications, different features are seldom completely independent of each other.
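A counting-based Naive Bayes classifier for categorical features can be sketched as follows, with P(C=c) estimated as the fraction of samples in category c. The toy data and function names are hypothetical, no smoothing is applied (a value never seen with a category gets probability 0), and the last lines reproduce the height and weight arithmetic from the text.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Estimate P(C=c) and P(X_i = v | C=c) by counting, without smoothing."""
    class_counts = Counter(labels)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    value_counts = defaultdict(Counter)       # (category, feature index) -> counts of values
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(c, i)][v] += 1
    likelihood = {key: {v: n / class_counts[key[0]] for v, n in counts.items()}
                  for key, counts in value_counts.items()}
    return priors, likelihood

def predict(priors, likelihood, row):
    """Return the c maximizing P(C=c) * prod_i P(X_i | C=c); P(X) is a common factor and is dropped."""
    def score(c):
        s = priors[c]
        for i, v in enumerate(row):
            s *= likelihood[(c, i)].get(v, 0.0)
        return s
    return max(priors, key=score)

# Hypothetical categorical data: (weather, temperature) -> category.
rows = [("sunny", "hot"), ("sunny", "mild"), ("sunny", "hot"),
        ("rainy", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["no", "no", "no", "yes", "yes", "yes"]
priors, likelihood = fit_naive_bayes(rows, labels)
print(predict(priors, likelihood, ("sunny", "mild")), predict(priors, likelihood, ("rainy", "hot")))

# The height/weight arithmetic from the text: P(X | C=male) under feature independence.
p_height = {180: 0.8, 150: 0.2}
p_weight = {80: 0.7, 50: 0.3}
print(p_height[180] * p_weight[50], p_height[150] * p_weight[80])  # about 0.24 and 0.14
```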

1.5.8 Ensemble Learning

Ensemble learning is a machine learning paradigm in which multiple learners are trained and combined to solve the same problem, as shown in Figure 1-41. By combining multiple learners, an ensemble can achieve much stronger generalization than a single learner. If you ask a complex question to thousands of randomly chosen people and then aggregate their answers, the aggregated answer is in most cases better than an expert's answer. This is the wisdom of the crowd.


Figure 1-41 Ensemble learning

Ensemble learning methods can be divided into Bagging and Boosting. The Bagging method builds several base learners independently and then averages their predictions; a typical model is the random forest. On average, the combined learner is usually better than any single base learner because its variance is smaller. The Boosting method builds base learners in sequence to gradually reduce the bias of the combined learner; typical models include AdaBoost, GBDT, and XGBoost. In general, Bagging reduces variance and thereby restrains overfitting, while Boosting focuses on reducing bias and thereby improves the capacity of the model, but may cause overfitting.

Figure 1-42 Random forest algorithm

Random forest combines the Bagging method with CART decision trees. Figure 1-42 shows the overall process of the random forest algorithm, which can be used for both classification and regression problems. The basic principle is to build multiple decision trees and merge them to make predictions more accurate and stable. During the training of the decision trees, sampling is performed at both the sample level and the feature level. At the sample level, the sample subset used to train each decision tree is drawn by Bootstrap sampling (sampling with replacement). At the feature level, a subset of features is randomly selected before each node of a decision tree is split, and the information gain is calculated only for those features. By combining the predictions of multiple decision trees, the random forest model reduces the variance of a single decision tree, but cannot effectively correct its bias. Therefore, no decision tree in a random forest should underfit, even if this causes some trees to overfit. In addition, the decision trees in a random forest are independent of each other, so training and prediction can be executed in parallel.
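The Bagging side of random forest, Bootstrap sampling plus majority voting, can be sketched as follows. To keep the example short, each base learner is a hypothetical one-level decision stump rather than a full CART tree, and random feature selection is omitted because the data have only one feature.

```python
import random

def fit_stump(xs, ys):
    """One-level decision tree standing in for CART: returns (threshold, left, right)."""
    values = sorted(set(xs))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in thresholds:
        for left, right in ((0, 1), (1, 0), (0, 0), (1, 1)):
            errors = sum((right if x > t else left) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    return best[1:]

def fit_bagging(xs, ys, n_trees=25, seed=0):
    """Train each base learner on its own Bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # n draws with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees

def predict_bagging(trees, x):
    """Majority vote over all base learners."""
    votes = sum(right if x > t else left for t, left, right in trees)
    return 1 if 2 * votes > len(trees) else 0

xs = [float(i) for i in range(10)]
ys = [0] * 5 + [1] * 5   # hypothetical data
trees = fit_bagging(xs, ys)
print([predict_bagging(trees, x) for x in xs])
```

Because each base learner is trained independently, the loop in fit_bagging could be run in parallel, which is the property noted above.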

Figure 1-43 GBDT algorithm

Gradient boosting decision tree (GBDT) is a Boosting algorithm. The prediction of the model is the sum of the outputs of all its decision trees. The essence of GBDT is to keep adding new decision trees, each of which learns the residuals of all previous trees, that is, the errors between the predicted values and the actual values. As shown in Figure 1-43, the first decision tree predicts that a sample is 20 years old, but its actual age is 30, so the residual is 10 years. If another decision tree is used to predict this difference, the prediction of 20 can be corrected toward 30. Following this idea, a second decision tree learns the error of the first, and so on. Finally, adding the predictions of the three learners yields the actual value of 30 years old. GBDT improves accuracy by continuously correcting the bias of the decision trees, so some underfitting of the individual trees is allowed. However, GBDT cannot correct variance, so the individual trees must not overfit. This is one of the biggest differences between Boosting and Bagging algorithms. In addition, the training data of each decision tree in GBDT depends on the output of the previous trees, so the trees cannot be trained in parallel.
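The residual-learning loop of GBDT can be sketched with one-level regression trees for squared error. The data are hypothetical, and no learning-rate shrinkage is applied, so the model output is simply the sum of all trees as described above. The training error should not increase as trees are added.

```python
def fit_regression_stump(xs, residuals):
    """One-level regression tree: split at the threshold that minimizes squared error."""
    best = None
    values = sorted(set(xs))
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def fit_gbdt(xs, ys, n_trees):
    """Each new tree learns the residuals (actual - predicted) of all trees so far."""
    trees, losses = [], []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        preds = [p + tree(x) for p, x in zip(preds, xs)]   # model output = sum of all trees
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(xs))
    return trees, losses

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 7, 6, 20, 22, 21, 40, 41]   # hypothetical targets
trees, losses = fit_gbdt(xs, ys, n_trees=10)
print([round(loss, 3) for loss in losses])   # training MSE after each added tree
```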


1.5.9 Clustering Algorithm

K-means clustering aims to partition n observations into K clusters, in which each observation belongs to the cluster with the nearest mean (cluster center), which serves as a prototype of the cluster. As shown in Figure 1-44, the resulting clusters meet the following condition: objects in the same cluster are highly similar, while objects in different clusters are not.
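K-means is usually computed with Lloyd's algorithm, which alternates an assignment step and an update step. This minimal sketch assumes hypothetical 2-D points and, for simplicity, uses the first K points as the initial centers; practical implementations usually choose initial centers randomly or with a smarter scheme.

```python
import math

def k_means(points, k, n_iter=100):
    """Lloyd's algorithm with the first k points as initial centers (a simplification)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centers.append(centers[c])   # keep an empty cluster's center unchanged
        if new_centers == centers:               # converged: centers stopped moving
            break
        centers = new_centers
    return labels, centers

points = [(0.0, 0.0), (5.0, 5.0), (0.5, 0.2), (0.1, 0.6), (5.2, 4.8), (4.7, 5.3)]
labels, centers = k_means(points, k=2)
print(labels, centers)
```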

Figure 1-44 Clustering algorithm

Compared with the K-means algorithm, the hierarchical clustering algorithm outputs not only the clustering result but also the tree-like relationship between samples. As shown in Figure 1-45, hierarchical clustering divides a dataset at different levels to form a tree-like clustering structure. The division can use a "bottom-up" merging strategy or a "top-down" splitting strategy. The clustering hierarchy is represented as a tree: the root is a cluster containing all samples, and each leaf is a cluster containing a single sample.

Figure 1-45 Hierarchical clustering
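The bottom-up (agglomerative) strategy can be sketched for 1-D points. The single-linkage distance and the sample points are assumptions made for illustration; the returned merge history describes the tree from the leaves (one sample each) to the root (all samples).

```python
def agglomerative(points):
    """Bottom-up clustering of 1-D points with single linkage; returns the merge history.

    Starts with one cluster per sample (the leaves) and repeatedly merges the two
    closest clusters until a single cluster (the root) remains.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members of the two clusters.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return merges

merges = agglomerative([1.0, 1.2, 5.0, 5.5, 9.0])
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```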


1.6 Case Study

This section reviews the overall process of machine learning through a specific case. Assume that a dataset contains the house areas and prices of 21,613 housing units in a city, as shown in Figure 1-46. Based on these data, we can train a model to predict the prices of other housing units in the city.

Figure 1-46 House price dataset

Analysis of the data shows that both the input (house area) and the output (price) are continuous values, so a supervised regression model can be used. The project aims to build a model function h(x) that approximates, as closely as possible, the function expressing the true distribution of the dataset. Figure 1-47 shows a scatter chart of the data and a possible model function.


Figure 1-47 Model hypothesis

Linear regression aims to find the straight line that best fits the dataset, that is, to determine the parameter w of the model. To find the optimal parameter, construct a loss function and find the parameter values at which the loss function reaches its minimum:
$$J(w) = \frac{1}{2m}\sum_{i=1}^{m}(h(x_i)-y_i)^2$$
In the preceding formula, m indicates the number of samples, h(x_i) indicates the predicted value, and y_i indicates the actual value. Intuitively, the loss function measures the sum of the squared errors between all samples and the model function, as shown in Figure 1-48. In normal cases, when the loss is minimized, the samples are evenly distributed on both sides of the fitted straight line, and this line is the required model function.


Figure 1-48 Geometric meaning of errors

As described above, a gradient descent algorithm finds the minimum value of a function iteratively. As shown in Figure 1-49, the algorithm randomly selects an initial point on the loss function and then moves along the negative gradient direction toward the global minimum. The parameter value at this minimum is the optimal parameter value.


Figure 1-49 Loss function surface

Point A in Figure 1-49 indicates the position of parameter w after random initialization. Point B indicates the global minimum of the loss function, that is, the final parameter value. The line from A to B indicates the path formed by descending in the negative gradient direction. The value of parameter w changes in each iteration, so the regression line changes continuously. Figure 1-50 shows an example of gradient descent iterations: the red points on the loss function surface gradually approach the lowest point, and the red regression line fits the data better and better. Finally, the optimal model function h(x) = 280.62x − 43581 is obtained.

Figure 1-50 Visualized gradient descent process

After model training is complete, use the test set to verify that the model has a strong generalization capability. If overfitting occurs during testing, add a regularization term to the loss function and adjust the hyperparameters. If underfitting occurs, use a more complex regression model, such as a gradient boosting decision tree (GBDT). Then retrain the model and test it again on the test set until its generalization capability meets expectations. Because the project uses real data, data cleansing and feature engineering must not be neglected.
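To illustrate what adding a regularization term does, the sketch below assumes an L2 (ridge) penalty, one common choice; the text does not specify which term the case uses. For a one-feature model h(x) = w * x + b with an unpenalized bias, minimizing the penalized loss has a closed form, so no iteration is needed. The penalty weight `lam` is a hypothetical hyperparameter.

```python
def ridge_fit_1d(xs, ys, lam):
    """Minimize 1/(2m)*sum((w*x + b - y)^2) + lam/(2m)*w^2 (bias b not penalized)."""
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # Setting dJ/dw = 0 shows the penalty adds lam to the denominator
    w = sxy / (sxx + lam)
    b = y_mean - w * x_mean
    return w, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_fit_1d(xs, ys, lam))
```

With lam = 0 the fit is the ordinary least-squares line (w = 2, b = 1); larger values shrink w toward 0, trading some training error for a simpler model, which is how the penalty counters overfitting.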

1.7 Summary

This chapter first describes the definition and classification of machine learning, as well as the problems machine learning solves. It then introduces key knowledge points of machine learning, including the overall procedure (data collection, data cleansing, feature selection, model training, model evaluation, and model deployment), common algorithms (linear regression, logistic regression, decision tree, SVM, Naive Bayes, KNN, ensemble learning, and K-means), gradient descent algorithms, and hyperparameters. Finally, it presents a complete machine learning process through a case study that uses linear regression to predict house prices.

1.8 Quiz

1. Machine learning is the core technology of AI. Please define machine learning.

2. The generalization error of a model can be divided into variance, bias, and irreducible error. What is the difference between variance and bias? What are the characteristics of the variance and bias of an overfitting model?

3. Please calculate the value of F1 for the confusion matrix shown in 0.

4. In machine learning, a dataset is generally divided into the training set, validation set, and test set. What is the difference between the validation set and the test set? Why do we need to introduce the validation set?

5. Linear regression models use linear functions to fit data. How does a linear regression model process non-linear data?

6. Many classification models can only deal with binary classification problems. Using SVM as an example, propose a method for dealing with multiclass classification problems.

7. How does the Gaussian kernel function in the SVM map a feature to an infinite-dimensional space?

8. Is gradient descent the only way to train a model? What are the limitations of this algorithm?

Huawei AI Academy Training Materials

Deep Learning

Huawei Technologies Co., Ltd.

04 Deep Learning (Textbook)

124

Copyright © HiSilicon (Shanghai) Technologies Co., Ltd. 2020. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of HiSilicon (Shanghai) Technologies Co., Ltd.

Trademarks and Permissions

HiSilicon logos and other HiSilicon icons are trademarks of HiSilicon Technologies Co., Ltd.

All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice The purchased products, services and features are stipulated by the contract made between HiSilicon and the customer. All or part of the products, services and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees or representations of any kind, either express or implied. The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express or implied.

HiSilicon (Shanghai) Technologies Co., Ltd.

Address: New R&D Center, 49 Wuhe Road, Bantian, Longgang District, Shenzhen 518129 P. R. China

Website: http://www.hisilicon.com/en/

Contents

1 Deep Learning
1.1 Deep learning
1.1.1 Overview
1.1.2 Deep Neural Network
1.1.3 Development History of Deep Learning
1.1.4 Perceptron Algorithm
1.2 Training Rules
1.2.1 Loss Function
1.2.2 Gradient Descent Method
1.2.3 BP Algorithm
1.3 Activation Function
1.4 Regularization
1.4.1 Parameter Penalty
1.4.2 Dataset Expansion
1.4.3 Dropout
1.4.4 Early Stopping of Training
1.5 Optimizers
1.5.1 Momentum Optimizer
1.5.2 AdaGrad Optimizer
1.5.3 RMSProp Optimizer
1.5.4 Adam Optimizer
1.6 Types of Neural Networks
1.6.1 CNN
1.6.2 RNN
1.6.3 GAN
1.7 Common Issues
1.7.1 Data Imbalance
1.7.2 Gradient Vanishing and Gradient Explosion
1.7.3 Overfitting
1.8 Summary
1.9 Quiz

2 Deep Learning Development Frameworks
2.1 Deep Learning Development Frameworks
2.1.1 Introduction to PyTorch
2.1.2 Introduction to MindSpore
2.1.3 Introduction to TensorFlow
2.2 TensorFlow 2.0 Basics
2.2.1 Introduction
2.2.2 Tensors
2.2.3 Eager Execution Mode
2.2.4 AutoGraph
2.3 TensorFlow 2.0 Modules
2.3.1 Common Modules
2.3.2 Keras API
2.4 Basic Development Steps of TensorFlow 2.0
2.4.1 Environment Setup
2.4.2 Development Process
2.5 Summary
2.6 Quiz

1 Deep Learning

Deep learning is a machine learning approach based on neural networks and has great advantages in fields such as computer vision, speech recognition, and natural language processing. This chapter introduces the basics of deep learning, including its development history, the components of deep neural networks, the types of deep neural networks, and common problems in deep learning projects.

1.1 Deep learning

1.1.1 Overview

In conventional machine learning, features are selected manually. More features transfer more information to a model and give it a stronger expression capability. However, as the number of features increases, the algorithm complexity grows and the model search space expands accordingly. The training data becomes very sparse in the feature space, which makes similarity judgments unreliable. This phenomenon is called dimension explosion (also known as the curse of dimensionality). More importantly, a feature that is not beneficial to the task may interfere with learning. Limited by the number of features, conventional machine learning algorithms are suited to training on small volumes of data; beyond a certain data volume, adding more data hardly improves performance. As a result, conventional machine learning places relatively low demands on computer hardware and needs only a limited amount of computation. Generally, no GPU is required for parallel computing.
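The sparsity effect can be illustrated numerically. The sketch below rests on assumed settings (uniformly random points in the unit hypercube, Euclidean distance, 200 points, fixed seed) and compares the distance from a random query point to its nearest and farthest neighbors. As the dimension grows, the ratio approaches 1, so "near" and "far" samples become hard to tell apart and distance-based similarity judgments lose their discriminating power.

```python
import math
import random


def distance_contrast(dim, n_points=200, seed=0):
    """Ratio of the nearest to the farthest distance from a random query point."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]  # Euclidean distances (Python 3.8+)
    return min(dists) / max(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```

In 2 dimensions the nearest point is far closer than the farthest one; in 1000 dimensions the two distances are of similar size.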

Figure 1-1 General process of machine learning

Figure 1-1 shows the general process of conventional machine learning. In this process, features are highly interpretable because they are selected manually. However, more features do not necessarily mean a better learning effect. Proper feature selection is the key to successful recognition, and the number of required features depends on the problem. To avoid the inherent biases that manual feature selection may introduce, deep learning seeks algorithms that extract features automatically. Although this weakens the interpretability of features, it improves the adaptability of the model to different problems. In addition, deep learning uses end-to-end models with high-dimensional weight parameters, trained on massive data, to achieve higher performance than conventional methods. Massive data also places higher demands on hardware: large numbers of matrix operations run too slowly on a CPU, so a GPU is needed for parallel acceleration.

1.1.2 Deep Neural Network

Generally, deep learning refers to deep neural networks, that is, multi-layer neural networks, which are models constructed by imitating the neural networks of the human brain. As shown in Figure 1-2, a deep neural network is a stack of perceptrons, which simulate human neurons. In the middle and right parts of Figure 1-2, each circle represents one neuron. The following description illustrates the similarities between this design and the neurons of the human brain. In the design and application of artificial neural networks, the following factors need to be considered: neuron functions, connection modes among neurons, and network learning (training).

Figure 1-2 Human brain neurons and artificial neural networks

So what exactly is a neural network? There are currently different definitions. According to Hecht-Nielsen, an American neural network scientist, a neural network is a computer system formed by multiple highly simple processing units connected to each other in a specific manner; the system processes information by dynamically responding to external input according to its own state. Based on its origin, characteristics, and interpretations, a neural network can be simply described as an information processing system designed to imitate the structure and functions of the human brain. Artificial neural networks reflect some basic features of human brain functions, such as parallel information processing, learning, association, pattern classification, and memorization. A neural network is a network of interconnected artificial neurons; it is an abstraction and simplification of the human brain in terms of microstructure and function, and an important way of simulating human intelligence.

1.1.3 Development History of Deep Learning

The development history of deep learning is also the development history of neural networks. Since the 1950s, with the continuous development of computer hardware, neural networks have evolved from a single layer to multiple layers and finally into today's well-known deep neural networks. Generally, the development of neural networks can be divided into three phases, as shown in Figure 1-3.

Figure 1-3 Development history of machine learning

In 1958, Rosenblatt invented the Perceptron algorithm, marking the beginning of the germination phase of neural networks. However, machine learning in this period had not yet been separated from other research directions of artificial intelligence (AI), so the Perceptron algorithm was not greatly developed. In 1969, Minsky, an American AI pioneer, pointed out that perceptrons could handle only linearly separable classification problems and could not solve even the simplest exclusive OR (XOR) problem. These doubts effectively sentenced the Perceptron algorithm to death and brought a "cold winter" of nearly 20 years to neural network research. It was not until 1986 that the Multilayer Perceptron (MLP) work of Hinton and his colleagues changed the situation. They used the sigmoid function to perform nonlinear mapping on the outputs of perceptrons, which effectively solved the problem of nonlinear classification and learning. In addition, Rumelhart, Hinton, and Williams popularized the backpropagation (BP) algorithm for MLP training; this algorithm and its derivatives are still used to train deep neural networks today. In 1989, Robert Hecht-Nielsen proved the universal approximation theorem, according to which any continuous function f on a closed interval can be approximated to arbitrary accuracy by a BP network with one hidden layer. In short, neural networks can fit any continuous function. By 1998, a variety of neural networks had emerged, including the well-known convolutional neural network (CNN) and recurrent neural network (RNN). However, because training very deep neural networks can lead to gradient vanishing and gradient explosion, neural networks once again faded out. 2006 was a significant year for deep learning: Hinton proposed a solution to gradient vanishing in deep network training, namely a combination of unsupervised pre-training and supervised fine-tuning. In 2012, AlexNet, proposed by Hinton's team, won the top-class image recognition competition ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a large margin over other methods, setting off a boom in deep learning. In 2016, AlphaGo, a deep learning AI program developed by Google DeepMind, beat the 9-dan Go world champion Lee Sedol, further promoting the popularity of deep learning.

1.1.4 Perceptron Algorithm

The single-layer perceptron is the simplest neural network. As shown in Figure 1-4, the inner product of the input vector $X = [x_0, x_1, \dots, x_n]^T$ and the weight vector $W = [w_0, w_1, \dots, w_n]^T$ is computed first and denoted as net, that is, $net = W^T X$. $x_0$ is generally fixed at 1, and $w_0$ is called the offset (bias). For regression problems, net can be used directly as the output of the perceptron, while for classification problems, net is first passed through the activation function Sgn(net). The Sgn function equals 1 where its argument is greater than 0 and –1 elsewhere.

Figure 1-4 Perceptrons

The perceptron shown in Figure 1-4 is equivalent to a classifier. It takes the high-dimensional vector X as input and performs binary classification on input samples in the high-dimensional space. Specifically, when $W^T X > 0$, Sgn(net) equals 1 and the sample is classified into the positive class; otherwise, Sgn(net) equals –1 and the sample is classified into the negative class. The boundary between the two classes, $W^T X = 0$, is a hyperplane in the high-dimensional space.
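The forward computation of a single-layer perceptron can be sketched in a few lines of Python. The weights below are hand-picked for the AND operation (an assumption for illustration); they are not learned, and the text does not describe a training rule at this point.

```python
def sgn(net):
    """Sgn activation as defined above: 1 if net > 0, otherwise -1."""
    return 1 if net > 0 else -1


def perceptron(weights, x):
    """Single-layer perceptron: net = W^T X with x0 fixed at 1, output Sgn(net)."""
    inputs = [1.0] + list(x)  # prepend x0 = 1 so that w0 acts as the offset
    net = sum(w * xi for w, xi in zip(weights, inputs))
    return sgn(net)


# Hand-picked (w0, w1, w2) for AND: decision boundary x1 + x2 - 1.5 = 0
and_weights = [-1.5, 1.0, 1.0]
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(and_weights, x))  # only (1, 1) falls in the positive class
```

The line x1 + x2 − 1.5 = 0 is the hyperplane $W^T X = 0$ in this two-dimensional case.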

Figure 1-5 XOR problem

A perceptron is essentially a linear model: it can handle only linearly separable data, not nonlinear data. As shown in Figure 1-5, a perceptron can easily find a straight line that correctly classifies the results of the AND and OR operations, but it cannot handle the XOR operation. In 1969, Minsky used this simple example to demonstrate the limitations of perceptrons.
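The limitation can be checked by brute force. The sketch below searches an assumed grid of weights (w0, w1, w2) from −3 to 3 in steps of 0.5 for a single perceptron that reproduces each truth table, with true encoded as +1 and false as −1. A grid search is only an illustration, not a proof; the fact that no straight line separates XOR holds for all real weights.

```python
import itertools


def sgn(net):
    return 1 if net > 0 else -1


def separable(targets, grid):
    """Return weights (w0, w1, w2) from the grid that reproduce the targets, or None."""
    points = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for w0, w1, w2 in itertools.product(grid, repeat=3):
        if all(sgn(w0 + w1 * x1 + w2 * x2) == t
               for (x1, x2), t in zip(points, targets)):
            return (w0, w1, w2)
    return None


grid = [i / 2 for i in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
# Targets for inputs (0,0), (0,1), (1,0), (1,1): +1 = true, -1 = false
print("AND", separable([-1, -1, -1, 1], grid))
print("OR ", separable([-1, 1, 1, 1], grid))
print("XOR", separable([-1, 1, 1, -1], grid))  # None: no line separates XOR
```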

Figure 1-6 MLP

To enable perceptrons to process nonlinear data, the MLP (also called a feedforward neural network, FNN) was introduced, as shown in Figure 1-6. The FNN is one of the simplest neural networks: its neurons (perceptrons) are arranged in layers. It is also one of the most widely used and rapidly developed artificial neural networks. The three leftmost neurons in Figure 1-6 form the input layer of the network. Neurons at the input layer perform no computation; they only represent the component values of the input vector. Nodes at layers other than the input layer are neurons that perform computation and are called computing units. Each layer of neurons accepts only the outputs of the previous layer as input and provides its output to the next layer. Neurons within the same layer are not interconnected, and information flows between layers in only one direction.

Figure 1-7 MLP for solving XOR problems

Only a very simple MLP is needed to solve the XOR problem. The left part of Figure 1-7 shows the structure of the MLP: a solid line indicates a weight of 1, a dashed line indicates a weight of –1, and the number in a circle indicates an offset. For example, consider the point (0, 1), that is, $x_1 = 0$ and $x_2 = 1$. The output of the purple neuron is:

$$\mathrm{Sgn}(x_1 + x_2 - 1.5) = \mathrm{Sgn}(-0.5) = -1$$

The coefficients of $x_1$ and $x_2$ are both 1 because the two lines entering the purple neuron from the left are solid. The output of the yellow neuron is:

$$\mathrm{Sgn}(-x_1 - x_2 + 0.5) = \mathrm{Sgn}(-0.5) = -1$$

The coefficients of $x_1$ and $x_2$ are both –1 because the two lines entering the yellow neuron from the left are dashed. The output of the rightmost neuron is:

$$\mathrm{Sgn}(-1 - 1 + 1) = \mathrm{Sgn}(-1) = -1$$

Here, the two –1 terms are the outputs of the purple and yellow neurons, and +1 is the offset of the output neuron. You can verify that the outputs of the MLP for (0, 0), (1, 0), and (1, 1) are 1, –1, and 1, respectively. The network therefore places (0, 0) and (1, 1) in one class and (0, 1) and (1, 0) in the other, which is exactly the partition defined by XOR (an output of –1 corresponds to XOR = 1). In effect, the purple and yellow neurons correspond to the purple and yellow lines in the right part of Figure 1-7, so combining two linear classifiers makes it possible to classify samples that are not linearly separable. As the number of hidden layers increases, the nonlinear classification capability of the neural network is gradually enhanced, as shown in Figure 1-8.
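The worked example can be checked mechanically. This sketch hard-codes the weights and offsets described for Figure 1-7 (+1 for solid lines, –1 for dashed lines, offsets –1.5, 0.5, and +1) and evaluates all four inputs; the function name `xor_mlp` is hypothetical.

```python
def sgn(net):
    """Sgn as defined in the text: 1 if net > 0, otherwise -1."""
    return 1 if net > 0 else -1


def xor_mlp(x1, x2):
    """Two-layer MLP of Figure 1-7: two hidden neurons and one output neuron."""
    purple = sgn(x1 + x2 - 1.5)      # hidden neuron, solid lines (+1), offset -1.5
    yellow = sgn(-x1 - x2 + 0.5)     # hidden neuron, dashed lines (-1), offset 0.5
    return sgn(purple + yellow + 1)  # output neuron, weights +1, offset +1


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_mlp(x1, x2))  # 1, -1, -1, 1
```

Each hidden neuron is itself a linear classifier; only their combination in the output neuron yields the nonlinear XOR partition.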

Figure 1-8 Neural network with multiple hidden layers

1.2 Training Rules

The core of machine learning model training is the loss function, and deep learning is no exception. This section describes the rules for training deep learning models based on the loss function, including the gradient descent method and the BP algorithm.

1.2.1 Loss Function

When training a deep neural network, you first need to construct a function that describes the prediction error of the model; this is the loss function (error function). The loss function reflects the error between the target output and the actual output of the network. The most common error function is the mean squared error (MSE) function:

$$J(w) = \frac{1}{2n}\sum_{x \in X,\, d \in D}\left(t_d - o_d\right)^2$$

In the formula, w is the model parameter, X is the training sample set, n is the size of X, D is the set of neurons at the output layer, t is the target output, and o is the actual output. Although w does not appear explicitly on the right side of the formula, the actual output o is computed by the model and therefore depends on w. Once the training samples are given, the target output t is fixed, so the value of the loss function varies only with w; that is, the independent variable of the error function is w. The MSE loss function uses the sum of squared errors as its main body, where an error is the difference between the target output t and the actual output o. The coefficient 1/2 in the formula may seem puzzling: it makes the derivative of the loss function more concise, because when the square is differentiated, the exponent 2 multiplied by the coefficient 1/2 gives 1. Cross entropy loss is another commonly used loss function:

$$J(w) = -\frac{1}{n}\sum_{x \in X,\, d \in D}\left(t_d \ln o_d + (1 - t_d)\ln(1 - o_d)\right)$$

The symbols have the same meanings as in the MSE loss function. Cross entropy loss measures the difference between two probability distributions. In general, the MSE loss function is mainly used for regression problems, while the cross entropy loss function is more often used for classification problems. The objective of training is to find a weight vector that minimizes the loss function. However, neural network models are highly complex, and there is no effective way to obtain an analytical solution mathematically. Therefore, the gradient descent method is used to search for the minimum of the loss function.
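Both loss functions are easy to compute once the outputs are known. The sketch below assumes a single output neuron per sample (so D has one element) and binary targets in {0, 1}; the target and output values are made up. The last lines check the claim about the 1/2 coefficient with a central finite difference.

```python
import math


def mse_loss(targets, outputs):
    """J = 1/(2n) * sum((t - o)^2) over n samples."""
    n = len(targets)
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / (2 * n)


def cross_entropy_loss(targets, outputs):
    """J = -1/n * sum(t*ln(o) + (1 - t)*ln(1 - o)), outputs strictly in (0, 1)."""
    n = len(targets)
    return -sum(t * math.log(o) + (1 - t) * math.log(1 - o)
                for t, o in zip(targets, outputs)) / n


targets = [1, 0, 1]
good = [0.9, 0.1, 0.8]  # outputs close to the targets
bad = [0.4, 0.7, 0.3]   # outputs far from the targets
print(mse_loss(targets, good), mse_loss(targets, bad))
print(cross_entropy_loss(targets, good), cross_entropy_loss(targets, bad))

# The 1/2 coefficient: d/do [0.5 * (t - o)^2] = -(t - o), with no stray factor of 2
t, o, h = 1.0, 0.3, 1e-6
numeric = (0.5 * (t - (o + h)) ** 2 - 0.5 * (t - (o - h)) ** 2) / (2 * h)
print(numeric, -(t - o))
```

Both losses are lower for the outputs that are closer to the targets; cross entropy penalizes confident wrong outputs much more heavily.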

1.2.2 Gradient Descent Method

The gradient of the multivariate function $f(x_1, x_2, \dots, x_n)$ at point X is:

$$\nabla f(x_1, x_2, \dots, x_n)\Big|_X = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right]^T$$

The gradient vector points in the direction in which the function increases fastest, so the negative gradient $-\nabla f$ points in the direction in which it decreases fastest. The gradient descent method searches along the negative gradient direction of the loss function and updates the parameters iteratively, eventually minimizing the loss function. Each sample in the training set X is denoted as $\langle x, t\rangle$, where x is the input vector and t is the target output; o is the actual output, and $\eta$ is the learning rate. Figure 1-9 shows the pseudocode of the batch gradient descent (BGD) algorithm.
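The gradient defined above can be approximated numerically, which is also a common way to check hand-derived gradients. This sketch uses central differences on a hypothetical function f(x1, x2) = x1² + 3·x2², whose analytical gradient is [2·x1, 6·x2]^T, and then confirms that a small step along −∇f lowers the function value.

```python
def f(x1, x2):
    """Hypothetical example function f = x1^2 + 3*x2^2."""
    return x1 ** 2 + 3 * x2 ** 2


def numerical_gradient(func, point, h=1e-6):
    """Approximate [df/dx1, ..., df/dxn]^T at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        plus = list(point)
        plus[i] += h
        minus = list(point)
        minus[i] -= h
        grad.append((func(*plus) - func(*minus)) / (2 * h))
    return grad


X = [1.0, 2.0]
g = numerical_gradient(f, X)
print(g)  # close to the analytical gradient [2.0, 12.0]

# One small step along the negative gradient lowers the function value
eta = 0.01
X_new = [xi - eta * gi for xi, gi in zip(X, g)]
print(f(*X), f(*X_new))
```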

Figure 1-9 BGD method

The BGD algorithm results from applying gradient descent directly to deep learning, but it is rarely used in practice. Its main problem is that every weight update requires computing over all training samples, so each update is expensive and convergence is very slow. To address this disadvantage, the stochastic gradient descent (SGD, also known as incremental gradient descent) method, a common gradient descent variant, is used. Figure 1-10 shows the pseudocode of the SGD method.

Figure 1-10 SGD method

The SGD algorithm selects one sample at a time to update the weights. One advantage of this practice is that the dataset can be expanded during model training; training the model while data is still being collected is called online learning. Compared with the BGD algorithm, the SGD algorithm updates the weights far more often, but it goes to the other extreme. Most training samples contain noise. The BGD method reduces the impact of noise by averaging the gradients of many samples, whereas the SGD method considers only a single sample for each weight update. As a result, when the parameters approach the extremum, the update direction oscillates back and forth around it, making it difficult to converge to the extremum.

Figure 1-11 Mini-batch gradient descent method

In practice, the most commonly used gradient descent algorithm is the mini-batch gradient descent (MBGD) algorithm, as shown in Figure 1-11. To overcome the disadvantages of the two preceding algorithms, MBGD uses a small batch of samples for each weight update, balancing efficiency against gradient stability. The batch size depends on the specific problem; a typical value is 128.
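The three variants differ only in how many samples feed each weight update. The Python sketch below is not the pseudocode of Figures 1-9 to 1-11; it assumes a linear model o = w·x + b, the MSE loss, made-up noisy data, and illustrative values for the learning rate and number of epochs, and it exposes the batch size as a parameter.

```python
import random


def minibatch_gd(data, batch_size, lr=0.1, epochs=1000, seed=0):
    """Fit o = w*x + b under the MSE loss with mini-batch gradient descent.

    batch_size == len(data) gives BGD, batch_size == 1 gives SGD,
    and anything in between gives MBGD.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)  # copy so that shuffling does not affect the caller's list
    for _ in range(epochs):
        rng.shuffle(data)  # new sample order in every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of 1/(2k) * sum((t - o)^2) over the k samples of the batch
            errs = [(w * x + b) - t for x, t in batch]
            grad_w = sum(e * x for e, (x, _) in zip(errs, batch)) / len(batch)
            grad_b = sum(errs) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Synthetic noisy samples around t = 2x + 1 (made up for illustration)
gen = random.Random(42)
data = [(x, 2 * x + 1 + gen.gauss(0, 0.1)) for x in [i / 10 for i in range(30)]]

for name, size in [("BGD", len(data)), ("SGD", 1), ("MBGD", 8)]:
    w, b = minibatch_gd(data, size)
    print(name, round(w, 2), round(b, 2))
```

All three end near w = 2 and b = 1. BGD performs one update per epoch, SGD performs one per sample and keeps jittering around the minimum, and MBGD sits in between.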

1.2.3 BP Algorithm

The gradient of the loss function must be computed when the gradient descent algorithm is used. For conventional machine learning algorithms, such as linear regression and support vector machine (SVM), gradients can sometimes be derived manually. However, a neural network model function is complex, and the gradient of the loss function with respect to all parameters cannot be expressed with a single formula. Therefore, Hinton and his colleagues promoted the BP algorithm, which computes these gradients layer by layer during backpropagation and thereby effectively accelerates neural network training.

Figure 1-12 Backpropagation of errors

As shown in Figure 1-12, the direction of backpropagation is opposite to that of forward propagation. For each sample in the training set X, the output of the model is denoted as o. Assume that the loss function is the MSE loss function:

$$J(w) = \frac{1}{2n}\sum_{x \in X,\, d \in D}\left(t_d - o_d\right)^2$$

Assume that there are L layers in the model (the input layer is excluded), and the parameter of the lth layer is denoted as 𝑤𝑙 . It is considered that J(w) does not obtain the minimum value during iteration because there is a deviation between w and the optimal parameter value for each layer. That is, the loss function value is resulted from an error of the parameter value. In the forward propagation process, each layer causes a certain error. These errors accumulate layer by layer and are represented in the form of a loss function at the output layer. Without a given model function, we cannot determine the relationship between the loss function and the parameters, but can determine the relationship 𝜕𝐽/𝜕𝑜 between the loss function and the model output. This is a key step in understanding the BP algorithm. Assuming that an output of the last but one layer is 𝑜′, and an activation function of the output layer is f, the loss function may be expressed as follows: 1 𝐽(𝑤) = ∑ (𝑡𝑑 − 𝑓( 𝑤𝐿 𝑜′𝑑 ))2 2𝑚 𝑥∈𝑋,𝑑∈𝐷

$o'_d$ depends only on $w_1, w_2, \ldots, w_{L-1}$. As shown, the loss function is split into two parts: a part caused by $w_L$ and a part caused by the other parameters. The latter is accumulated from errors and acts on the loss function through the output of the second-to-last layer. From the $\partial J/\partial o$ obtained above, $\partial J/\partial o'$ and $\partial J/\partial w_L$ can easily be calculated. In this way, the gradient of the loss function with respect to the parameters of the output layer is obtained. Note that the derivative $f'(w_L o'_d)$ of the activation function enters the calculation of $\partial J/\partial o'$ and $\partial J/\partial w_L$ as a multiplicative factor. When the derivative of the activation function is always less than 1 (as is the case with the sigmoid function), the gradients $\partial J/\partial o'$, $\partial J/\partial o''$, and so on become increasingly small as they are propagated backward. This phenomenon is called gradient vanishing and is described in more detail below. The gradients of the other layers' parameters are obtained in the same way from the relationship between $\partial J/\partial o'$ and $\partial J/\partial o''$. Intuitively, the BP algorithm distributes errors layer by layer. In essence, it uses the chain rule to calculate the gradient of the loss function with respect to the parameters of each layer. The general BP algorithm is shown in Figure 1-13.

Figure 1-13 BP algorithm

In the formula, ⊙ indicates element-wise multiplication, and f is the activation function. Note that the output of the ith layer is also the input of the (i+1)th layer, and the output of the 0th layer is defined as the input of the entire network. In addition, when the activation function is sigmoid, it can be proved that:

$$f'(x) = f(x)\left(1 - f(x)\right)$$

Therefore, $f'(o[l-1])$ in the algorithm can also be expressed as $o[l](1 - o[l])$.
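The following is a minimal sketch of the BP algorithm in plain Python. It rests on assumptions not taken from the text: a tiny network with one sigmoid hidden layer and one sigmoid output unit, no bias terms, and the MSE loss for a single sample (n = 1). All names (`forward`, `backward`, `numeric_grad`, `W1`, `W2`) are hypothetical. The backward pass applies the chain rule layer by layer using f'(x) = f(x)(1 - f(x)), and a central finite-difference check compares the result with numerical derivatives.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a weight matrix (a list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def forward(W1, W2, x):
    # Each layer computes f(W o[l-1]); o[0] = x is the network input.
    h = [sigmoid(z) for z in matvec(W1, x)]
    o = [sigmoid(z) for z in matvec(W2, h)]
    return h, o

def loss(W1, W2, x, t):
    # MSE loss for one sample: J = 1/2 * sum_d (t_d - o_d)^2.
    _, o = forward(W1, W2, x)
    return 0.5 * sum((td - od) ** 2 for td, od in zip(t, o))

def backward(W1, W2, x, t):
    h, o = forward(W1, W2, x)
    # Output layer error: dJ/dz = (o - t) * f'(z), where f'(z) = o(1 - o).
    delta2 = [(od - td) * od * (1 - od) for od, td in zip(o, t)]
    grad_W2 = [[d * hj for hj in h] for d in delta2]
    # Hidden layer error: (W2^T delta2) multiplied element-wise by h(1 - h).
    delta1 = [sum(W2[k][j] * delta2[k] for k in range(len(delta2))) * h[j] * (1 - h[j])
              for j in range(len(h))]
    grad_W1 = [[d * xi for xi in x] for d in delta1]
    return grad_W1, grad_W2

def numeric_grad(W1, W2, x, t, layer, i, j, eps=1e-5):
    # Central finite difference of J with respect to a single weight.
    def shifted(delta):
        A, B = [r[:] for r in W1], [r[:] for r in W2]
        (A if layer == 1 else B)[i][j] += delta
        return loss(A, B, x, t)
    return (shifted(eps) - shifted(-eps)) / (2 * eps)

W1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]  # 3 hidden units, 2 inputs (arbitrary values)
W2 = [[0.3, -0.1, 0.2]]                       # 1 output unit
x, t = [1.0, 0.5], [1.0]
g1, g2 = backward(W1, W2, x, t)
print(g1[0][1], numeric_grad(W1, W2, x, t, 1, 0, 1))  # the two values should agree closely
```

The same backward loop extends to deeper networks: each layer's error is computed from the next layer's error, which is exactly the layer-by-layer distribution of errors described above.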

1.3 Activation Function

An activation function plays an important role in enabling a neural network model to learn and represent highly complex nonlinear functions, because it introduces nonlinearity into the network. Without activation functions, a neural network can represent only a linear function, regardless of the number of layers. The complexity of linear functions is limited, so such a network has little capability to learn complex function mappings from data. This section describes common activation functions used in deep learning and their advantages and disadvantages, so that you can choose among them as required.

Figure 1-14 Activation functions

As shown in the upper left part of Figure 1-14, the sigmoid function was the most commonly used activation function in the early stage of FNN research. As in the logistic regression model, the sigmoid function can be used at the output layer to implement binary classification. The sigmoid function is monotonic, continuous, and easy to differentiate. Its output is bounded, so the network converges easily. However, the derivative of the sigmoid function approaches 0 far from the origin. When the network is very deep, the BP algorithm pushes more and more neurons into the saturation region, so the gradient magnitude becomes increasingly small. Generally, in a sigmoid network the gradient can degrade to nearly 0 within as few as five layers, which makes the network difficult to train. This phenomenon is called gradient vanishing. Another defect of sigmoid is that its output is not zero-centered.

As shown in the upper middle part of Figure 1-14, tanh is a major substitute for the sigmoid function. The tanh activation function corrects the defect that the output of sigmoid is not zero-centered. The tanh function is also closer to the natural gradient in the gradient descent algorithm, which reduces the required number of iterations. However, like sigmoid, the tanh function saturates easily.

As shown in the upper right part of Figure 1-14, the Softsign function reduces the tendency of the tanh and sigmoid functions to saturate to some extent. However, the Softsign, tanh, and sigmoid activation functions all easily cause gradient vanishing: the derivative of each approaches 0 far from the center of the function, so the weights can no longer be updated.

As shown in the lower left part of Figure 1-14, the Rectified Linear Unit (ReLU) function is currently the most widely used activation function. Unlike sigmoid and the other functions above, the ReLU function has no upper bound, so neurons never saturate in the positive domain. This effectively alleviates the gradient vanishing problem and enables fast convergence in the gradient descent algorithm. Experiments show that neural networks using the ReLU activation function can perform well without unsupervised pre-training. In addition, functions such as sigmoid require an exponential operation, which is computationally expensive, whereas ReLU greatly reduces the calculation workload. Despite these advantages, the ReLU function has obvious disadvantages. Because it has no upper bound, it can easily diverge during training. Moreover, it is not differentiable at 0, so it is not smooth enough for some regression problems. Most importantly, its value is constantly 0 in the negative domain, which may result in neuron death (neurons that output 0 and stop being updated).

As shown in the lower middle part of Figure 1-14, the Softplus function is a modification of the ReLU function. Although Softplus requires more computation than ReLU, it has a continuous derivative and a relatively smooth surface.

The softmax function is a generalization of the sigmoid function to higher dimensions. It maps any K-dimensional real vector to a K-dimensional probability distribution. Therefore, the softmax function is often used at the output layer of a multiclass classification task.
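The activation functions above can be sketched in plain Python as follows. The softmax implementation subtracts the maximum input before exponentiating; this is a common numerical-stability trick (an addition, not something the text describes) that leaves the result unchanged. The final loop, a simplification that ignores the weights, suggests why stacked sigmoid layers tend to cause gradient vanishing: the sigmoid derivative is at most 0.25, so a product of L such factors shrinks geometrically.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # f'(x) = f(x)(1 - f(x)); maximum 0.25 at x = 0

def tanh(x):
    return math.tanh(x)  # zero-centered output in (-1, 1)

def softsign(x):
    return x / (1.0 + abs(x))

def relu(x):
    return max(0.0, x)  # no upper bound; constantly 0 for x < 0

def softplus(x):
    return math.log1p(math.exp(x))  # log(1 + e^x), a smooth version of ReLU

def softmax(v):
    m = max(v)  # shifting by the maximum avoids overflow in exp
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, 2.0, 3.0]))  # a probability distribution summing to 1
for depth in (1, 5, 10):
    # Best case: every layer contributes the maximum derivative 0.25.
    print(depth, sigmoid_grad(0.0) ** depth)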

1.4 Regularization

Regularization is a very important and effective technique in machine learning for reducing the generalization error. Compared with conventional machine learning models, a deep learning model generally has a larger capacity and is therefore more prone to overfitting. To address this, researchers have proposed many effective techniques to prevent overfitting, including:

• Adding constraints to parameters, such as L1 and L2 norms
• Expanding the training dataset, for example, by adding noise or transforming data
• Dropout
• Stopping training early

This section describes these methods one by one.

1.4.1 Parameter Penalty

Many regularization methods restrict the learning capability of a model by adding a parameter penalty term Z(w) to the objective function J:

$$\tilde{J} = J + aZ(w)$$

In the formula, a is a non-negative penalty coefficient. The value of a measures the relative contribution of the penalty term Z and the standard objective function J to the total objective function. If a is set to 0, no regularization is applied; a larger value of a means stronger regularization. a is a hyperparameter. Note that, in deep learning, the constraint is generally applied only to the affine parameter w (the weights) and not to the bias term. This is because the bias term typically requires only a small amount of data to be fitted precisely, and constraining it often leads to underfitting. Different choices of Z yield different regularization methods. This section describes two of them: L1 regularization and L2 regularization. In linear regression models, L1 regularization yields Lasso regression, and L2 regularization yields ridge regression. L1 and L2 refer to norms. The L1 norm of a vector is defined as:

$$\|w\|_1 = \sum_i |w_i|$$

This formula is the sum of the absolute values of all elements in the vector. It can be proved that the gradient of the L1 norm is sgn(w) (strictly speaking a subgradient, because the L1 norm is not differentiable where an element equals 0). Therefore, the gradient descent method can be used to solve the L1 regularization model. The L2 norm is the common Euclidean length:

$$\|w\|_2 = \sqrt{\sum_i w_i^2}$$

The L2 norm is widely used and is often written as ||w|| with the subscript omitted. However, the gradient of the L2 norm itself is relatively complex, so L2 regularization generally uses the following penalty term instead:

$$Z(w) = \frac{1}{2}\|w\|_2^2$$

The derivative of this penalty term is simply w. Therefore, when gradient descent with learning rate η is performed on the L2 regularization model, the weight update formula becomes the following (because $w - \eta(\nabla J + aw) = (1 - \eta a)w - \eta \nabla J$):

$$w = (1 - \eta a)w - \eta \nabla J$$

Compared with the normal gradient update formula, this is equivalent to multiplying the parameter by the shrinking factor (1 − ηa) at every step, which limits the growth of the parameters.
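As a sketch in plain Python, the functions below take one gradient descent step on the regularized objective for each penalty. They assume that the gradient of the unregularized loss J is already available (the values in `grad_j` are hypothetical), and they use sign(0) = 0 as the L1 subgradient at 0. The last line confirms that the L2 step coincides with the rearranged form (1 - ηa)w - η∇J.

```python
def sign(x):
    return (x > 0) - (x < 0)  # subgradient choice: sign(0) = 0

def l1_norm(w):
    return sum(abs(wi) for wi in w)

def l2_norm(w):
    return sum(wi * wi for wi in w) ** 0.5

def step_l1(w, grad_j, lr, a):
    # Gradient of J + a * ||w||_1 is grad J + a * sign(w).
    return [wi - lr * (gi + a * sign(wi)) for wi, gi in zip(w, grad_j)]

def step_l2(w, grad_j, lr, a):
    # Gradient of J + a * (1/2) * ||w||_2^2 is grad J + a * w.
    return [wi - lr * (gi + a * wi) for wi, gi in zip(w, grad_j)]

def step_weight_decay(w, grad_j, lr, a):
    # Rearranged form: w = (1 - lr * a) * w - lr * grad J.
    return [(1 - lr * a) * wi - lr * gi for wi, gi in zip(w, grad_j)]

w = [0.5, -1.2, 0.0]
grad_j = [0.1, -0.3, 0.2]  # hypothetical gradient of the unregularized loss
print(step_l1(w, grad_j, lr=0.1, a=0.01))
print(step_l2(w, grad_j, lr=0.1, a=0.01))
print(step_weight_decay(w, grad_j, lr=0.1, a=0.01))  # same as step_l2 up to rounding
```

The L1 step moves every nonzero weight toward 0 by the same amount lr * a, regardless of its size, while the L2 step shrinks each weight in proportion to its value.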

Figure 1-15 Geometric meaning of parameter penalty

Figure 1-15 shows the difference between L1 regularization and L2 regularization. In the figure, the contour lines represent the standard objective function J, and the black solid line represents the regularization term. Geometrically, the parameter penalty means that, for any point in the parameter space, we must consider not only the value of the standard objective function at that point but also the size of the geometric shape corresponding to the regularization term at that point. It is easy to imagine that, as the penalty coefficient a becomes larger, the black shape tends to become smaller and the parameters move closer to the origin. As shown in Figure 1-15, the parameters at which the L1 regularization model stabilizes are highly likely to lie at a corner of the square. This means that the parameters of the L1 regularization model are likely to be sparse. In the example in the figure, the value of $w_1$ at the optimal parameter is 0. Therefore, L1 regularization can be used for feature selection. From the perspective of probability distributions, many norm constraints are equivalent to imposing prior distributions on the parameters: the L2 norm corresponds to a Gaussian prior, and the L1 norm corresponds to a Laplace prior.

1.4.2 Dataset Expansion

The most effective way to prevent overfitting is to enlarge the training set: the larger the training set, the lower the probability of overfitting. However, collecting data (especially labeled data) is time-consuming and expensive. Dataset expansion is a time-saving alternative, but its methods vary across fields. In object recognition, common dataset expansion methods include image rotation and scaling. The premise of such transformations is that the image class remains the same after the change. For example, in handwritten digit recognition, the digits 6 and 9 are easily confused after rotation and require extra attention. In speech recognition, random noise is often added to the input data. In natural language processing, a common idea is to replace words with synonyms.

Noise injection is a common method for dataset expansion. Noise can be injected into the input, a hidden layer, or the output layer. For softmax classification problems, noise can be added to the output layer by using the label smoothing technique. Assume that there are K candidate classes. The standard output provided by the dataset is generally represented as a K-dimensional one-hot vector: the element corresponding to the correct class is 1, and the other elements are 0. With noise added, the element corresponding to the correct class becomes $1 - (K-1)e/K$, and each of the other elements becomes $e/K$, where e is a sufficiently small constant. Intuitively, label smoothing narrows the difference between the label values of the correct class and the incorrect classes, which is equivalent to making model training harder. For an overfitting model, increasing the difficulty can effectively alleviate overfitting and further improve model performance.
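Label smoothing as described above can be sketched as follows. The function name `smooth_labels` and the value e = 0.1 are illustrative assumptions. The result is still a probability distribution, because the K - 1 incorrect classes receive (K - 1)e/K in total, which is exactly the amount the correct class gives up.

```python
def smooth_labels(label, k, e=0.1):
    """Return the smoothed K-dimensional target for class index `label`.

    The one-hot 1 becomes 1 - (k - 1) * e / k and every 0 becomes e / k.
    """
    return [1 - (k - 1) * e / k if i == label else e / k for i in range(k)]

# The correct class gets 0.925 and the others 0.025 (up to floating-point rounding).
print(smooth_labels(2, k=4))
```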

1.4.3 Dropout

Dropout is a common regularization method that is simple to compute, and it has been widely used since 2014. Put simply, Dropout randomly discards the outputs of some neurons during training, and the parameters of the discarded neurons are not updated in that iteration. By randomly discarding units, Dropout constructs a series of subnets with different structures, as shown in Figure 1-16. These subnets are merged in a certain manner within the same deep neural network, which is equivalent to ensemble learning. When the model is used, we want to benefit from the collective wisdom of all the trained subnets, so random discarding is no longer applied.

Figure 1-16 Dropout

Compared with parameter penalty, Dropout has lower computational complexity and is easier to implement. The randomness of Dropout during training is neither a sufficient nor a necessary condition for its regularizing effect: a sufficiently good model can also be obtained with fixed (non-random) masking parameters. Generally, Dropout performs better when the activation function is close to linear.
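A minimal sketch of Dropout applied to the outputs of one layer is shown below. It uses the "inverted dropout" variant, which is an assumption here rather than something the text prescribes: surviving outputs are scaled by 1/(1 - p) during training so that their expected value is unchanged, and therefore nothing needs to be rescaled at inference, when no units are dropped. The function name and parameters are hypothetical.

```python
import random

def dropout(outputs, p, training, rng=random):
    """Randomly discard each output with probability p (training only)."""
    if not training or p == 0.0:
        return list(outputs)  # inference: the whole network is used
    keep = 1.0 - p
    # Inverted dropout: survivors are rescaled so the expected output is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in outputs]

rng = random.Random(0)  # fixed seed for a reproducible mask
print(dropout([1.0] * 8, p=0.5, training=True, rng=rng))  # each entry is 0.0 or 2.0
print(dropout([1.0] * 8, p=0.5, training=False))          # unchanged at inference
```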

1.4.4 Early Stopping of Training

Training can be stopped early by periodically evaluating the model on validation data. As shown in Figure 1-17, when the loss on the validation data starts to rise, training can be stopped to avoid overfitting. However, stopping training early also brings a risk of underfitting. Because the validation set often contains too few samples, training is often not stopped at the moment when the generalization error of the model is the smallest. In extreme cases, the error on the validation set may start to decrease quickly again after a small rise, and stopping training at that rise may result in an underfitted model.

Figure 1-17 Early stopping of training
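Early stopping can be sketched as a check on the sequence of validation losses. The "patience" rule used here (stop after the loss has failed to improve for a given number of consecutive evaluations) is a common practical choice and an assumption, not a rule stated in the text; the validation curve is hypothetical. The second call illustrates the extreme case above: with too little patience, training stops during a small rise even though the loss later falls further.

```python
def early_stopping(val_losses, patience=2):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has not improved for `patience`
    consecutive evaluations; the parameters from best_epoch would be kept.
    """
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, val_loss in enumerate(val_losses):
        if val_loss < best:
            best, best_epoch, waited = val_loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation curve: a small rise at epochs 4 and 5, then a further decrease.
curve = [1.0, 0.7, 0.5, 0.45, 0.47, 0.48, 0.40, 0.38]
print(early_stopping(curve, patience=2))  # (5, 3): stops during the small rise
print(early_stopping(curve, patience=3))  # (7, 7): waits long enough to reach 0.38
```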

1.5 Optimizers

There are various optimized versions of the gradient descent algorithm. In object-oriented implementations, each gradient descent algorithm is often encapsulated in an object called an optimizer. Common optimizers include SGD, Momentum, Nesterov, Adagrad, Adadelta, RMSprop, Adam, AdaMax, and Nadam. These optimizers mainly improve the convergence speed of the algorithm and its stability after converging to a local extremum, and they reduce the difficulty of tuning hyperparameters. This section describes the design of several of the most commonly used optimizers.

1.5.1 Momentum Optimizer

The momentum optimizer is a basic improvement to the gradient descent algorithm. A momentum term is added to the weight update formula, as shown in Figure 1-18. If the weight change in the nth iteration is d(n), the weight update rule becomes:

$$d(n) = -\eta \nabla_w J + a\,d(n-1)$$

In the formula, a is a constant between 0 and 1 called the momentum, and a d(n−1) is called the momentum term. Imagine a small ball rolling down from a random point on the error surface. Ordinary gradient descent moves the ball according to the local slope alone, as if it had no velocity, which does not conform to physical laws. In reality, the ball accumulates momentum as it rolls down and therefore moves faster in the downhill direction.

Figure 1-18 Function of the momentum term

In a region where the gradient direction is stable, the ball rolls faster and faster. This helps it cross flat regions quickly and accelerates model convergence. Moreover, as shown in Figure 1-19, the momentum term corrects the gradient direction and reduces abrupt changes. In addition, a ball with inertia is more likely to roll over narrow local extrema, making the model less likely to get stuck in a local extremum.

Figure 1-19 Accelerating model convergence by the momentum term

A disadvantage of the momentum optimizer is that the momentum term may carry the ball past the optimal solution, so additional iterations are required for convergence. Besides, the learning rate and the momentum a still need to be set manually, and many experiments are usually needed to find proper values.
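The momentum update can be sketched on a one-dimensional toy problem. The objective J(w) = 0.05 w^2 (a flat region with its optimum at w = 0), the learning rate 0.1, and the momentum a = 0.9 are arbitrary illustrative assumptions. Setting a = 0 recovers ordinary gradient descent, so the same function shows both the faster progress and the overshooting past the optimum described above.

```python
def descend(grad, w, lr, a, steps):
    """Iterate d(n) = -lr * grad(w) + a * d(n-1), then w = w + d(n).

    a = 0 gives ordinary gradient descent; 0 < a < 1 adds the momentum term.
    Returns the whole trajectory of w.
    """
    d, path = 0.0, [w]
    for _ in range(steps):
        d = -lr * grad(w) + a * d
        w = w + d
        path.append(w)
    return path

def grad(w):
    return 0.1 * w  # gradient of the hypothetical flat objective J(w) = 0.05 * w^2

plain = descend(grad, 1.0, lr=0.1, a=0.0, steps=100)
momentum = descend(grad, 1.0, lr=0.1, a=0.9, steps=100)
print(plain[-1], momentum[-1])         # momentum ends much closer to the optimum 0
print(min(momentum) < 0 < min(plain))  # momentum overshoots past the optimum
```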

1.5.2 AdaGrad Optimizer

The SGD algorithm, the MBGD algorithm, and the momentum optimizer have one characteristic in common: every parameter is updated with the same learning rate. Adagrad holds that different parameters should have different learning rates. The gradient update formula of Adagrad is generally written as follows:

$$\Delta w = -\frac{\eta}{e + \sqrt{\sum_{i=1}^{n} g^2(i)}}\, g(n)$$

In the formula, g(n) is the gradient dJ/dw of the cost function in the nth iteration, and e is a small constant. As n increases, the denominator gradually increases, so the weight update amplitude gradually decreases, which is equivalent to dynamically reducing the learning rate. In the initial phase of model training, the initial value is far from the optimal solution of the loss function, so a high learning rate is required. As the number of updates increases, the weight parameters get closer to the optimal solution, so the learning rate decreases accordingly. The advantage of Adagrad lies in this automatic adjustment of the learning rate, but so does its disadvantage. Because the learning rate depends on the gradients of all previous iterations, it may already have decreased to nearly 0 while the weight parameters are still far from the optimal solution. In this case, further optimization is meaningless.
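The Adagrad update can be sketched in plain Python as follows. The two-parameter objective J = 0.5(100 w1^2 + w2^2), whose parameters have very different curvatures, and the learning rate 0.5 are illustrative assumptions. The recorded effective learning rates never increase, and the steep parameter w1 automatically receives a much smaller one.

```python
import math

def adagrad(grad, w, lr=0.5, e=1e-8, steps=20):
    """Adagrad: each parameter's step is lr / (e + sqrt(sum of its squared gradients))."""
    s = [0.0] * len(w)  # accumulated squared gradients, one per parameter
    history = []        # effective learning rate of each parameter at each step
    for _ in range(steps):
        g = grad(w)
        s = [si + gi * gi for si, gi in zip(s, g)]
        rates = [lr / (e + math.sqrt(si)) for si in s]
        w = [wi - ri * gi for wi, ri, gi in zip(w, rates, g)]
        history.append(rates)
    return w, history

def grad(w):
    # Hypothetical objective J = 0.5 * (100 * w1^2 + w2^2).
    return [100.0 * w[0], w[1]]

w, history = adagrad(grad, [1.0, 1.0])
print(w)
print(history[0], history[-1])  # rates only shrink; the steep w1 gets a much smaller one
```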

1.5.3 RMSProp Optimizer

The RMSprop optimizer is an improvement to the Adagrad optimizer. It introduces an attenuation coefficient so that the historical gradients are attenuated by a certain proportion in each iteration. The gradient update formula is as follows:

$$r(n) = b\,r(n-1) + (1-b)\,g^2(n)$$

$$\Delta w = -\frac{\eta}{e + \sqrt{r(n)}}\, g(n)$$

In the formula, b is the attenuation factor, and e is a small constant. Because of the attenuation factor, r does not necessarily increase monotonically with n. This solves the problem of the Adagrad optimizer stopping too early and makes RMSprop suitable for non-stationary objectives, especially in RNNs.
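The sketch below contrasts the RMSprop accumulator r(n) with the sum of squared gradients that Adagrad would keep. The gradient history (five large gradients followed by fifty small ones), b = 0.9, and the learning rate are hypothetical values chosen for illustration. Once the gradients become small, r decays again, so the effective learning rate recovers, whereas Adagrad's denominator stays large.

```python
import math

def rmsprop_step(w, g, r, lr=0.01, b=0.9, e=1e-8):
    r = b * r + (1 - b) * g * g          # r(n) = b r(n-1) + (1 - b) g^2(n)
    w = w - lr / (e + math.sqrt(r)) * g  # dw = -lr g(n) / (e + sqrt(r(n)))
    return w, r

# Hypothetical gradient history: large gradients first, then small ones.
grads = [10.0] * 5 + [0.1] * 50
w, r, s = 0.0, 0.0, 0.0
r_history = []
for g in grads:
    w, r = rmsprop_step(w, g, r)
    s += g * g  # what Adagrad would accumulate instead
    r_history.append(r)

print(max(r_history), r)           # r falls again once the gradients become small
print(math.sqrt(r), math.sqrt(s))  # RMSprop's denominator is far smaller than Adagrad's
```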

1.5.4 Adam Optimizer

Adaptive moment estimation (Adam) builds on the ideas of the Adagrad and RMSprop optimizers and is currently the most widely used optimizer. Adam calculates an adaptive learning rate for each parameter, which is very useful in complex network structures. Different parts of a network differ in their sensitivity to weight adjustment, and a very sensitive part generally requires a smaller learning rate. Manually identifying the sensitive parts and specifying a dedicated learning rate for each of them is difficult or cumbersome. The parameter update formula of the Adam optimizer is similar to that of the RMSprop optimizer:

$$\Delta w = -\frac{\eta}{e + \sqrt{v(n)}}\, m(n)$$

In the formula, m and v are the first-moment (mean) estimate and the second-moment (uncentered variance) estimate of the historical gradients, respectively. Similar to the attenuation formula in RMSprop, m and v are defined as follows:

$$m(n) = a\,m(n-1) + (1-a)\,g(n)$$

$$v(n) = b\,v(n-1) + (1-b)\,g^2(n)$$

In form, m and v are moving averages of the gradient and the squared gradient, respectively. However, these definitions make the algorithm unstable during the first several iterations. Assuming that both m(0) and v(0) are 0, when a and b are close to 1, m and v remain very close to 0 in the initial iterations. To solve this problem, the following bias-corrected estimates are used in practice in place of m(n) and v(n) in the update formula:

$$\hat{m}(n) = \frac{m(n)}{1 - a^n}$$

$$\hat{v}(n) = \frac{v(n)}{1 - b^n}$$

In Adam, the learning rate, a, and b still need to be set manually, but setting them is much easier. Experiments show that a = 0.9, b = 0.999, and a learning rate of 0.001 work well. In practice, Adam converges quickly. When convergence saturates, the learning rate can be reduced appropriately while the other parameters remain unchanged. Generally, after the learning rate has been reduced a few times, the model converges to a satisfactory extremum.
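A minimal one-dimensional sketch of Adam with bias correction is shown below. The default hyperparameters (a = 0.9, b = 0.999, learning rate 0.001, e = 1e-8) follow the values suggested in the original Adam paper by Kingma and Ba; the usage example instead uses a learning rate of 0.1, an arbitrary choice, so that the hypothetical objective J(w) = (w - 3)^2 converges within a few hundred steps. Thanks to the bias correction, the very first step already has a size of about the learning rate instead of being close to 0.

```python
import math

def adam(grad, w, lr=0.001, a=0.9, b=0.999, e=1e-8, steps=1000):
    m = v = 0.0                      # m(0) = v(0) = 0
    for n in range(1, steps + 1):
        g = grad(w)
        m = a * m + (1 - a) * g      # moving average of the gradient
        v = b * v + (1 - b) * g * g  # moving average of the squared gradient
        m_hat = m / (1 - a ** n)     # bias correction: without it, m and v
        v_hat = v / (1 - b ** n)     # stay close to 0 in the first iterations
        w = w - lr * m_hat / (e + math.sqrt(v_hat))
    return w

def grad(w):
    return 2.0 * (w - 3.0)  # gradient of the hypothetical objective J(w) = (w - 3)^2

print(adam(grad, 0.0, lr=0.1, steps=1))    # first step has size about lr: 0.1
print(adam(grad, 0.0, lr=0.1, steps=500))  # close to the optimum 3.0
```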

1.6 Types of Neural Networks

Since the advent of BP neural networks, researchers have proposed neural networks for solving various problems. In computer vision, CNNs are currently the most widely used deep learning models. In natural language processing, RNNs were once dominant. In addition, this section introduces a game theory-based generative model: the generative adversarial network (GAN).

1.6.1 CNN

1.6.1.1 Overview

A CNN is an FNN. Unlike a fully connected neural network, a CNN lets each artificial neuron respond only to units within a local coverage area, and it performs excellently in image processing. A CNN generally includes convolutional layers, pooling layers, and fully connected layers. In the 1960s, while studying neurons responsible for local sensitivity and orientation selectivity in the cat visual cortex, Hubel and Wiesel found that this unique network structure could effectively reduce the complexity of FNNs, a discovery that later inspired the CNN. The CNN has become a research hotspot in many scientific fields, especially pattern recognition. It is widely used because it avoids complex image preprocessing and can take the original image directly as input.

The name CNN comes from the convolution operation. Convolution is an inner product operation performed on an image (or a feature map) and a filter matrix (also called a filter or a convolution kernel). The image is the input of the neural network, and a feature map is the output of a convolutional layer or pooling layer in the network. The difference is that the values in a feature map are neuron outputs and are therefore theoretically unbounded, whereas the values in an image correspond to the luminance of the RGB channels and range from 0 to 255. Each convolutional layer in the network corresponds to one or more filter matrices.

Unlike in a fully connected neural network, each neuron in a convolutional layer of a CNN takes as input only the outputs of the neurons within a local window of the previous layer, rather than all of them. This characteristic of convolution is called local perception. Human perception of the outside world is generally considered to proceed from local to global, and nearby pixels of an image are more strongly correlated than distant ones. Therefore, each neuron needs to collect only local information rather than global information about the image; the global information is then obtained at higher layers by combining the local information collected by each neuron. This idea of sparse connectivity is inspired by the structure of the biological visual system: neurons in the visual cortex respond only to stimuli in certain regions and therefore receive information locally.

Another characteristic of convolution is parameter sharing. One or more convolution kernels can be used to scan an input image, and the parameters of a convolution kernel are weights of the model. All neurons that use the same convolution kernel share the same weights. Weight sharing means that the parameters of a convolution kernel stay fixed while it traverses the entire image. For example, if a convolutional layer has three convolution kernels, each kernel scans the entire image with fixed parameter values, so all positions of the image share the same weights. As a result, features learned from one part of the image can also be applied to other parts of the image or to other images. This property is called position invariance.

1.6.1.2 Convolutional Layer

Figure 1-20 shows the typical architecture of a CNN. The leftmost image in the figure is the model input. The input image first passes through a convolutional layer with three convolution kernels, which produces three feature maps. The parameters of the three convolution kernels are independent of each other and can be learned with the BP algorithm. During a convolution operation, each window of the input image is mapped to one neuron in the feature map. The purpose of the convolution operation is to extract different features of the input. The first convolutional layer may extract only low-level features such as edges, lines, and corners. A multi-layer network can extract more complex features from these low-level features.

Figure 1-20 CNN structure

Consider the convolution operation (Han Bingtao, 2017) shown in Figure 1-21. A 5 × 5 matrix contains at most 3 × 3 different regions with the same shape as the convolution kernel. Therefore, the feature map is 3 × 3.

Figure 1-21 Convolution operation example

As shown in Figure 1-22, each element of the feature map is obtained by multiplying a region of the original image element by element with the convolution kernel and summing the products. In the matrix on the left of Figure 1-22, the yellow region corresponds to the element in the upper left corner of the feature map. Each element in this region is multiplied by the corresponding element of the convolution kernel, and the products are summed to give the first element of the feature map, 4. This example contains no bias term, that is, the bias is 0. In a more general convolution operation, a bias term is added to the result of the element-wise multiplication and summation, and the sum is output as the feature map. The bias term here has a meaning similar to that of the bias term in linear regression.
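The single-channel convolution described above can be sketched as follows (stride 1, no padding). The 5 × 5 binary input and the 3 × 3 kernel are illustrative values assumed here; they are not necessarily the values in Figure 1-21. As in CNN frameworks, the kernel is not flipped, so strictly speaking the operation is a cross-correlation.

```python
def conv2d(image, kernel, bias=0):
    """'Valid' 2-D convolution: stride 1, no padding, no kernel flipping."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    # Each output element: element-wise product of one window with the kernel, summed.
    return [[sum(image[i + p][j + q] * kernel[p][q]
                 for p in range(kh) for q in range(kw)) + bias
             for j in range(ow)]
            for i in range(oh)]

image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
print(conv2d(image, kernel))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

The 5 × 5 input yields a 3 × 3 feature map because a 3 × 3 window fits in 5 - 3 + 1 = 3 positions along each axis.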

Figure 1-22 Convolution operation example

The basic structure of a convolutional layer is multi-channel convolution. As shown in Figure 1-23, one convolutional layer can contain multiple convolution kernels and bias terms. Each combination of a convolution kernel and a bias term maps the input tensor to one feature map. Multi-channel convolution stacks all the feature maps obtained from the kernels and bias terms into a three-dimensional matrix as the output. Generally, the input tensor, the output tensor, and the convolution kernels are all three-dimensional matrices whose three dimensions represent width, height, and depth. To extend the convolution operation described above to three dimensions, the depth of each convolution kernel is set equal to the depth of the input tensor. This ensures that the feature map produced by a single convolution kernel has a depth of 1. The convolution operation places no specific requirements on the width and height of the convolution kernel, but for ease of computation they are generally equal. In addition, the feature maps computed with different convolution kernels must have the same width and height so that they can be stacked together. In other words, all convolution kernels in the same convolutional layer must have the same size.

Figure 1-23 Convolutional layer structure

The feature maps output by a convolutional layer need to be activated. Activation functions are sometimes considered part of the convolutional layer. However, because an activation function is not closely related to the convolution operation, it is sometimes implemented as an independent layer. The most commonly used activation layer is the ReLU layer, which applies the ReLU activation function.
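A multi-channel convolutional layer followed by a separate ReLU activation layer can be sketched as follows. Tensors are plain nested lists in [depth][height][width] order, an arbitrary layout assumed here; the all-ones input, the two kernels, and the biases are hypothetical values. Each kernel has the same depth as the input and produces one feature map of depth 1, and the maps are stacked into the output.

```python
def conv_layer(x, kernels, biases):
    """Multi-channel convolution (stride 1, no padding).

    x: input tensor as [depth][height][width].
    kernels: list of kernels, each [depth][kh][kw] with the same depth as x.
    Each (kernel, bias) pair produces one 2-D feature map; the maps are
    stacked into an output tensor whose depth equals the number of kernels.
    """
    depth, height, width = len(x), len(x[0]), len(x[0][0])
    out = []
    for k, b in zip(kernels, biases):
        if len(k) != depth:
            raise ValueError("kernel depth must equal input depth")
        kh, kw = len(k[0]), len(k[0][0])
        fmap = [[b + sum(x[c][i + p][j + q] * k[c][p][q]
                         for c in range(depth) for p in range(kh) for q in range(kw))
                 for j in range(width - kw + 1)]
                for i in range(height - kh + 1)]
        out.append(fmap)
    return out

def relu_layer(t):
    # Activation layer: ReLU applied to every element of the stacked maps.
    return [[[max(0, v) for v in row] for row in fmap] for fmap in t]

x = [[[1] * 4 for _ in range(4)] for _ in range(2)]  # depth 2, 4 x 4, all ones
ones, negs = [[1, 1], [1, 1]], [[-1, -1], [-1, -1]]
kernels = [[ones, ones], [negs, negs]]                # two 2 x 2 kernels of depth 2
y = relu_layer(conv_layer(x, kernels, biases=[0, 1]))
print(len(y), len(y[0]), len(y[0][0]))  # 2 3 3: one 3 x 3 map per kernel
print(y[0][0][0], y[1][0][0])           # 8 and 0 (1 - 8 = -7, clipped by ReLU)
```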

1.6.1.3 Pooling Layer

A pooling layer combines nearby units, which reduces the size of a feature map and thus its dimensionality. Common pooling layers include the max pooling layer and the average pooling layer. As shown in Figure 1-24, the max pooling layer divides a feature map into several regions and uses the maximum value of each region to represent the entire region. Average pooling is similar to max pooling, except that the average value of each region is used to represent the region. The shape of each region is referred to as the pooling window size.
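Max pooling and average pooling can be sketched with one function that slides a square window over a feature map and summarizes each region. The 4 × 4 feature map, the 2 × 2 window, and the stride of 2 are illustrative assumptions.

```python
def pool2d(fmap, size, stride, op=max):
    # Slide a size x size window with the given stride; op summarizes each region.
    h, w = len(fmap), len(fmap[0])
    return [[op([fmap[i + p][j + q] for p in range(size) for q in range(size)])
             for j in range(0, w - size + 1, stride)]
            for i in range(0, h - size + 1, stride)]

def mean(values):
    return sum(values) / len(values)

fmap = [[1, 3, 2, 4],
        [5, 6, 1, 2],
        [7, 2, 9, 0],
        [1, 8, 3, 4]]
print(pool2d(fmap, 2, 2))           # max pooling: [[6, 4], [8, 9]]
print(pool2d(fmap, 2, 2, op=mean))  # average pooling: [[3.75, 2.25], [4.5, 4.0]]
```

Note that the function has no weights to learn: only the window size, the stride, and the summarizing operation define the layer.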

Figure 1-24 Pooling operation example

In an actual CNN, convolutional layers and pooling layers are basically interleaved. Both pooling and convolution increase the scale of the features, which amounts to further extracting features from the previous layer. However, unlike a convolutional layer, a pooling layer contains no parameters. In addition, a pooling layer ignores the arrangement of the elements within each small region and considers only their statistical features. The pooling layer focuses on reducing the size of the input data of the next layer, which effectively reduces the number of parameters and the amount of computation, and helps prevent overfitting. Another function of the pooling layer is to map an input of any size to an output of a fixed size by properly setting the size and stride of the pooling window. Assume that the input size is $a \times a$, the pooling window size is $\lceil a/4 \rceil$, and the stride is $\lfloor a/4 \rfloor$. If a is a multiple of 4, the window size equals the stride, and it is easy to see that the output size of the pooling layer is 4 × 4. If a is not a multiple of 4, the window size is always greater than the stride by 1, and the output size is still 4 × 4 for every a ≥ 12 (for a few small inputs, such as a = 6, the output is larger). This property of the pooling layer enables a CNN to handle input images of different sizes.
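The fixed-size output property can be checked numerically. The sketch below assumes no padding and computes the output side length as (a - window) // stride + 1; the function name `pooled_size` is hypothetical. The check suggests that the 4 × 4 output holds for every a ≥ 12 (and for some smaller sizes, such as 8), while a few small inputs such as a = 6, 7, and 11 produce a larger output.

```python
import math

def pooled_size(a, parts=4):
    """Output side length for an a x a input pooled with window ceil(a / parts)
    and stride floor(a / parts), without padding."""
    window, stride = math.ceil(a / parts), a // parts
    return (a - window) // stride + 1

print({a: pooled_size(a) for a in range(4, 16)})
print(all(pooled_size(a) == 4 for a in range(12, 1000)))
```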

1.6.1.4 Fully Connected Layer

A fully connected layer is generally used as the output of a CNN. Common tasks in pattern recognition are classification and regression, for example, determining the class of an object in an image or scoring an object in an image. For these problems, a feature map is obviously unsuitable as the output, so it needs to be mapped to a vector that meets the requirement. This usually involves vectorizing the feature map, that is, arranging its neurons into a vector in a fixed order.
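Vectorizing feature maps and feeding them to a fully connected layer can be sketched as follows. The flattening order (depth, then row, then column) is one possible fixed order assumed here; frameworks may use a different one. The feature-map values, weights, and biases are hypothetical.

```python
def flatten(feature_maps):
    # Arrange every neuron of the [depth][height][width] maps into one vector
    # in a fixed order: depth first, then row, then column.
    return [v for fmap in feature_maps for row in fmap for v in row]

def dense(vec, weights, biases):
    # Fully connected layer: each output is a weighted sum of all inputs plus a bias.
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, biases)]

maps = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # depth 2, 2 x 2 (hypothetical values)
v = flatten(maps)                             # [1, 2, 3, 4, 5, 6, 7, 8]
scores = dense(v, [[1] * 8, [0] * 7 + [1]], [0, 0])  # two hypothetical class scores
print(scores)  # [36, 8]
```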

1.6.2 RNN

An RNN is a neural network that captures dynamic information in sequential data through recurrent connections between hidden layer nodes. It can classify sequential data. Unlike FNNs, an RNN can retain the context state of sequential data. The RNN is no longer limited to the spatial boundaries of conventional neural networks and can be extended along the time sequence. Intuitively, the memory unit at the current moment is connected to the memory unit at the next moment. RNNs are widely used in sequence-related scenarios, such as video, audio, and sentences.

Figure 1-25 RNN structure

The left part of Figure 1-25 shows the classic structure of an RNN. In the figure, x(t) is the value of the input sequence at time node t, s(t) is the state of the memory unit at time node t, o(t) is the output at time node t, and U, V, and W are model weights (input to hidden state, hidden state to output, and hidden state to hidden state, respectively). The update of the hidden layer depends not only on the current input x(t) but also on the memory unit state s(t–1) at the previous time node, that is, s(t) = f(Ux(t) + Ws(t–1)), where f is an activation function. The output layer of the RNN is the same as that of the MLP, so its details are omitted here.
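A minimal forward pass for the recurrence s(t) = f(Ux(t) + Ws(t–1)). The sketch assumes scalar inputs and weights, f = tanh, an initial state of 0, and a linear output o(t) = V·s(t); these are illustrative assumptions, not details taken from the figure.

```python
import math


def rnn_forward(xs, U, W, V, s0=0.0):
    """Scalar RNN: s(t) = tanh(U*x(t) + W*s(t-1)), o(t) = V*s(t)."""
    states, outputs = [], []
    s = s0
    for x in xs:
        s = math.tanh(U * x + W * s)   # new state depends on x(t) and s(t-1)
        states.append(s)
        outputs.append(V * s)          # linear output layer
    return states, outputs


states, outputs = rnn_forward([1.0, 0.0, 0.0], U=0.8, W=0.5, V=2.0)
```

Although the second and third inputs are zero, the later states are non-zero because the first input is carried forward through W, and they shrink at every step: a small-scale view of how an RNN attenuates old information.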

Figure 1-26 RNN structure

As shown in Figure 1-26 (Andrej Karpathy, 2015, The Unreasonable Effectiveness of Recurrent Neural Networks), there are many different RNN structures. The leftmost part of Figure 1-26 is an ordinary BP neural network, which does not involve a time sequence. The second part from the left is a generative model that can generate a sequence meeting specific requirements from a single input. The middle part is the most typical RNN structure and can be used for classification or regression tasks. The two rightmost parts can both be used for sequence translation; the second part from the right is also referred to as the encoder-decoder structure.


The RNN is trained with the backpropagation through time (BPTT) algorithm, an extension of the conventional BP algorithm to time sequences. The conventional BP algorithm considers only error propagation between different hidden layers, whereas BPTT must also consider error propagation within the same hidden layer between different time nodes. Specifically, the error of the memory unit at moment t consists of two components: one propagated back from the output o(t) at moment t, and one propagated back from the memory unit at moment t+1. Each component is computed in the same way as in the conventional BP algorithm, and their sum is used as the error of the memory unit at moment t. From the errors at moment t, it is then easy to calculate the gradients of parameters U, V, and W at moment t. After all time nodes have been traversed in reverse order, T gradients are obtained for each of U, V, and W, where T is the total sequence length, and the sum of these T gradients is the total gradient of each parameter. With these gradients, the parameters can be updated by gradient descent.

RNNs still have many problems. Because the memory unit receives its own output from the previous moment at every step, unrolling an RNN over a long sequence produces a very deep network, so the gradient vanishing and gradient explosion problems that affect deep fully connected networks also trouble RNNs. Moreover, the state of the memory unit cannot persist for a long time. The state passes through the activation function at every moment, so by the time the loop reaches the end of a long sequence, the information from the beginning of the sequence may already have been diluted by these repeated mappings. In other words, the RNN attenuates information that has been stored for a long time.
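The BPTT procedure can be sketched for a scalar RNN; the tanh activation and the squared-error loss summed over time are assumptions made for this example, and all function names are hypothetical. Walking the time steps in reverse, the error of s(t) is the sum of the component from o(t) and the component from s(t+1), and the per-moment gradients of U, V, and W are accumulated into totals.

```python
import math


def forward(xs, U, W, V):
    """Scalar RNN with tanh: s(t) = tanh(U*x(t) + W*s(t-1)), initial state 0, o(t) = V*s(t)."""
    s, states = 0.0, []
    for x in xs:
        s = math.tanh(U * x + W * s)
        states.append(s)
    return states, [V * s for s in states]


def loss(xs, ys, U, W, V):
    """Squared error summed over all time steps."""
    _, outputs = forward(xs, U, W, V)
    return sum(0.5 * (o - y) ** 2 for o, y in zip(outputs, ys))


def bptt(xs, ys, U, W, V):
    """Gradients (dL/dU, dL/dW, dL/dV) by backpropagation through time."""
    states, outputs = forward(xs, U, W, V)
    dU = dW = dV = 0.0
    ds_next = 0.0                         # error reaching s(t) from s(t+1); none after the last step
    for t in reversed(range(len(xs))):
        do = outputs[t] - ys[t]           # error at the output o(t)
        dV += do * states[t]
        ds = do * V + ds_next             # error of the memory unit: sum of the two components
        dz = ds * (1.0 - states[t] ** 2)  # back through tanh
        s_prev = states[t - 1] if t > 0 else 0.0
        dU += dz * xs[t]                  # gradient contribution of moment t
        dW += dz * s_prev
        ds_next = dz * W                  # pass the error on to s(t-1)
    return dU, dW, dV


xs, ys = [0.5, -1.0, 0.8], [0.2, 0.0, -0.3]
grads = bptt(xs, ys, U=0.7, W=0.4, V=1.3)
```

Comparing these gradients with central finite differences of loss() is a simple way to confirm that the backward pass is correct.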

Figure 1-27 LSTM neural network

In many tasks, we want the model to retain memory information for a long period of time. However, because the capacity of the memory unit is limited, an RNN inevitably fails to memorize all information in the whole sequence. We therefore want the memory unit to remember key information selectively, and the long short-term memory (LSTM) network implements this function. As shown in Figure 1-27 (Colah, 2015, Understanding LSTM Networks), the core of the LSTM network is the LSTM block, which replaces the hidden layer of the RNN. The LSTM block contains three gates: an input gate, a forget gate, and an output gate, which allow the LSTM to selectively memorize, forget, and output information. In this way, selective memory is implemented. Note that two lines connect adjacent LSTM blocks; they represent the cell state and the hidden state of the LSTM, respectively.
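A sketch of one LSTM step with scalar values, following the common formulation with input, forget, and output gates plus a tanh candidate (no peephole connections). The parameter dictionary and its gate names are hypothetical; a real LSTM uses weight matrices and vectors instead of scalars.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM. p maps each gate name to hypothetical (w_x, w_h, b) parameters."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # input gate: how much new information to write
    f = sigmoid(pre("forget"))       # forget gate: how much of the old cell state to keep
    o = sigmoid(pre("output"))       # output gate: how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate values to write
    c = f * c_prev + i * g           # new cell state
    h = o * math.tanh(c)             # new hidden state
    return h, c


# Forget gate almost fully open, input gate almost closed: the cell keeps its old memory.
keep = {"input": (0.0, 0.0, -20.0), "forget": (0.0, 0.0, 20.0),
        "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=0.9, p=keep)
```

With these saturated gates, c stays at about 0.9 even though the input is large, which is the selective memory described above.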

Figure 1-28 Gated recurrent unit

As shown in Figure 1-28, the gated recurrent unit (GRU) is a variant of the LSTM. The GRU combines the forget gate and the input gate into a single update gate, and merges the cell state and hidden state of the LSTM into a single hidden state. The GRU model is simpler than the standard LSTM model and has become very popular.
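A matching sketch of one GRU step with scalar values. It uses an update gate z and a reset gate r, and mixes the old state and the candidate as h(t) = (1 − z)·h(t–1) + z·h̃(t), the convention used in Colah's article; some references swap the roles of z and 1 − z. The parameter names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x, h_prev, p):
    """One step of a scalar GRU. p maps gate names to hypothetical (w_x, w_h, b) parameters."""
    wz_x, wz_h, bz = p["update"]
    wr_x, wr_h, br = p["reset"]
    wh_x, wh_h, bh = p["candidate"]
    z = sigmoid(wz_x * x + wz_h * h_prev + bz)                # update gate (replaces forget + input gates)
    r = sigmoid(wr_x * x + wr_h * h_prev + br)                # reset gate
    h_tilde = math.tanh(wh_x * x + wh_h * (r * h_prev) + bh)  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                   # single hidden state, no separate cell


# Update gate almost closed: the previous hidden state passes through unchanged.
keep = {"update": (0.0, 0.0, -20.0), "reset": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
h = gru_step(2.0, 0.7, keep)
```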

1.6.3 GAN

A generative adversarial network (GAN) is a framework that can be used in scenarios such as image generation, semantic segmentation, text generation, data augmentation, chatbots, information retrieval, and information sorting. Before GANs emerged, deep generative models usually required Markov chains or maximum likelihood estimation, which easily leads to many intractable probabilistic computations. A GAN trains a generator G and a discriminator D simultaneously through an adversarial process in which the two play a game against each other. Discriminator D determines whether a sample is real or generated by generator G, while generator G tries to generate samples that D cannot distinguish from real ones. Both networks are trained with the mature BP algorithm.


Figure 1-29 GAN structure

As shown in Figure 1-29, the input of the generator is noise z, which follows a manually selected prior probability distribution, such as a uniform or Gaussian distribution. Through a suitable network structure, the generator maps this input space to the sample space. The input of the discriminator is either a real sample x or a forged sample G(z), and its output is the probability that the sample is real. Any classification model can be used as the discriminator; CNNs and fully connected neural networks are common choices. For example, suppose we want to generate an image of a cat that looks as real as possible: the discriminator judges whether the image is real. The objective function of the GAN is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$$

The objective function consists of two terms. The first term depends only on discriminator D: when a real sample is input, the closer the output of D is to 1, the larger this term. The second term depends on both G and D: G generates a sample from random noise, D receives this sample as input, and the closer the output of D is to 0, the larger this term. Because D aims to maximize the objective function, it should output 1 in the first term and 0 in the second, that is, classify the samples correctly. The generator aims to minimize the objective function, but the first term does not depend on the generator, so the generator can only minimize the second term. To do so, it must produce samples for which the discriminator outputs a value close to 1, that is, make the discriminator as unable as possible to tell whether a sample is real.

Since the GAN was first proposed in 2014, more than 200 GAN variants have been derived and widely used for many generation problems. However, the original GAN also has some problems, for example, an unstable training process. The training processes of the fully connected neural network, CNN, and RNN described above all minimize a cost function by optimizing parameters. GAN training is different, mainly because the adversarial relationship between generator G and discriminator D is difficult to balance. A general GAN training process alternately trains D and G until D(G(z)) stabilizes at about 0.5. At this point, D and G reach a Nash equilibrium, and training ends. In some cases, however, the model can hardly reach a Nash equilibrium and may even encounter problems such as mode collapse. Therefore, how to improve the stability of GANs has long been a hot topic in academic research. Despite these disadvantages, GANs remain important generative models.
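The two terms of the objective can be estimated from a batch of discriminator outputs. This plain-Python sketch (the function name gan_value and the sample outputs are hypothetical) shows that a discriminator that separates real from fake samples makes V large, a fooled discriminator makes it small, and D = 0.5 everywhere gives V = −log 4, the value the original GAN paper derives for the global optimum.

```python
import math


def gan_value(d_real, d_fake):
    """Batch estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    first = sum(math.log(d) for d in d_real) / len(d_real)          # depends on D only
    second = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)   # depends on G and D
    return first + second


sharp_d = gan_value([0.99, 0.95], [0.02, 0.05])   # D separates real from fake: V near 0
fooled_d = gan_value([0.6, 0.7], [0.9, 0.8])      # G's samples look real to D: V very negative
balanced = gan_value([0.5] * 4, [0.5] * 4)        # D(G(z)) = 0.5 everywhere: V = -log 4
```

D is trained to push this value up and G to push the second term down, which is the alternating game described above.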

1.7 Common Issues

Deep learning models are complex and may encounter various problems during training. This section summarizes common issues to help you quickly locate and solve them.

1.7.1 Data Imbalance

In classification datasets, the number of samples in each class may be unbalanced. Data imbalance occurs when one or more classes to be predicted have very few samples. For example, among 4251 training images, more than 2000 classes may contain only one image each, and some classes may contain only 2 to 5 images. In this case, the model cannot learn each class adequately, which degrades its performance. The main methods for alleviating data imbalance are random undersampling, random oversampling, and the Synthetic Minority Over-sampling Technique (SMOTE).

Random undersampling randomly removes samples from classes that have sufficient observations. When the training dataset is very large, this reduces the running time and the storage requirements. However, samples containing important information may be discarded along the way, and the remaining samples may be biased and fail to represent the majority classes accurately. Therefore, random undersampling may lead to inaccurate results on actual test datasets.

Random oversampling increases the number of observations in the minority classes by copying existing samples. Unlike undersampling, it causes no information loss, so it generally performs better than undersampling on actual test datasets. However, because the new samples are identical to the original ones, the risk of overfitting increases.

SMOTE synthesizes new observations for the minority classes using a nearest-neighbor approach: it selects samples from a minority class and synthesizes new samples by interpolating between each selected sample and its nearest neighbors in the same class. These synthesized samples are then added to the original dataset. The advantages of this method are that it does not lose valuable information and that, because the synthetic samples are generated randomly instead of copied, it effectively alleviates overfitting. However, SMOTE performs less well on high-dimensional data. In addition, when generating a synthetic instance, SMOTE does not take neighboring instances from other classes into account, which can increase class overlap and introduce additional noise.
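A minimal SMOTE sketch in plain Python, assuming numeric feature vectors and Euclidean distance: each synthetic sample is placed at a random point on the segment between a randomly chosen minority sample and one of its k nearest neighbors in the same class. The function name and parameters are hypothetical; libraries such as imbalanced-learn provide production implementations.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Minimal SMOTE sketch: each synthetic sample lies on the segment between a random
    minority sample and one of its k nearest neighbors within the minority class."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()                       # random position in [0, 1) along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
augmented = minority + new_points   # the synthesized samples are added to the dataset
```

Random oversampling would instead append exact copies of existing minority samples; interpolation is what lets SMOTE add new, non-identical points.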


1.7.2 Gradient Vanishing and Gradient Explosion

When a network has enough layers, the gradients of the model parameters may become very small or very large during backpropagation; these problems are called gradient vanishing and gradient explosion, respectively. Both originate from the backpropagation formulas. Assume that a model has three layers and each layer has only one neuron. The backpropagation formula can then be written as follows:

$$\delta_1 = \delta_3 \, f_2'(o_1) \, w_3 \, f_1'(o_0) \, w_2$$

where f is the activation function; the sigmoid function is used here as an example. As the number of network layers increases, the number of f′(o)w factors in the formula increases. By the inequality of arithmetic and geometric means, the maximum value of f′(x) = f(x)(1 − f(x)) is 1/4. Therefore, when |w| is less than 4, |f′(o)w| is always less than 1. When many factors less than 1 are multiplied, δ1 inevitably approaches 0. This is the cause of gradient vanishing. Similarly, gradient explosion mainly occurs when w is very large: when many factors greater than 1 are multiplied, δ1 becomes very large. In short, gradient explosion and gradient vanishing are caused by deep networks and unstable weight updates; in essence, they result from the chain rule in gradient backpropagation.

Methods for coping with gradient vanishing mainly include pretraining, ReLU activation functions, LSTM neural networks, and residual modules. (ResNet, the 2015 ILSVRC champion, increased the model depth to 152 layers by introducing residual modules. In comparison, GoogLeNet, the 2014 champion, has only 27 layers.) The main solution to gradient explosion is gradient clipping: a gradient threshold is set, and gradients are forcibly limited to this range to prevent them from becoming excessively large.
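The effect of the repeated f′(o)w factors, and two common forms of gradient clipping, can be seen in a few lines of plain Python. For simplicity, the sketch assumes that every layer shares the same weight w and pre-activation o; the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def chain_factor(num_layers, w, o=0.0):
    """Product of num_layers factors f'(o) * w, assuming every layer has the same weight w
    and pre-activation o (a simplification of the one-neuron-per-layer example)."""
    d = sigmoid(o) * (1.0 - sigmoid(o))   # f'(o) = f(o)(1 - f(o)), at most 1/4 (reached at o = 0)
    return (d * w) ** num_layers


def clip_by_norm(grads, threshold):
    """Rescale the whole gradient vector if its L2 norm exceeds threshold."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= threshold:
        return list(grads)
    return [g * threshold / norm for g in grads]


def clip_by_value(grads, threshold):
    """Clamp each gradient component to [-threshold, threshold]."""
    return [max(-threshold, min(threshold, g)) for g in grads]


print(chain_factor(10, w=1.0))                   # 0.25 ** 10, about 9.5e-07: vanishing
print(chain_factor(10, w=8.0))                   # 2.0 ** 10 = 1024.0: exploding
print(clip_by_norm([3.0, 4.0], threshold=1.0))   # about [0.6, 0.8]
```

Norm-based clipping keeps the direction of the gradient and only shortens it, while value-based clipping limits each component independently.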

1.7.3 Overfitting

Overfitting refers to the problem that a model performs well on the training set but poorly on the test set. Overfitting may have many causes, such as excessively high feature dimensions, excessively complex model assumptions, too many parameters, insufficient training data, and excessive noise. In essence, overfitting occurs because the model fits the training dataset too closely without considering its generalization capability. Consequently, the model predicts the training set well but predicts new data poorly. If overfitting is caused by insufficient training data, consider obtaining more data. One approach is to collect more data from the data source, but this is often time-consuming and laborious. A more common practice is data augmentation. If overfitting is caused by an excessively complex model, several methods can suppress it. The simplest is to adjust the model's hyperparameters and reduce the number of layers and neurons to limit the fitting capability of the network. Alternatively, regularization can be introduced into the model; it has been described earlier and is not repeated here.
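A small data augmentation sketch in plain Python, using random cropping and horizontal flipping on images stored as nested lists. The transforms and names are illustrative assumptions. They keep the label unchanged, which is only valid when the transform does not change the class (a mirrored digit, for example, may no longer be a valid digit).

```python
import random


def hflip(img):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in img]


def random_crop(img, size, rng):
    """Cut a random size x size patch out of a 2D image."""
    top = rng.randrange(len(img) - size + 1)
    left = rng.randrange(len(img[0]) - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]


def augment(images, labels, crop, copies=2, seed=0):
    """Create `copies` randomly cropped (and sometimes flipped) variants of every image.
    Labels are copied unchanged, assuming the transforms do not change the class."""
    rng = random.Random(seed)
    new_x, new_y = [], []
    for img, label in zip(images, labels):
        for _ in range(copies):
            sample = random_crop(img, crop, rng)
            if rng.random() < 0.5:
                sample = hflip(sample)
            new_x.append(sample)
            new_y.append(label)
    return new_x, new_y


images = [[[r * 4 + c for c in range(4)] for r in range(4)]]   # one 4 x 4 toy image
xs, ys = augment(images, labels=[7], crop=3, copies=3)
```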


1.8 Summary

This chapter introduces the definition and development of neural networks, the training rules of perceptrons, and common neural networks (CNNs, RNNs, and GANs). It also describes common issues with neural networks in AI engineering and their solutions.

1.9 Quiz

1. Deep learning is a new research direction derived from machine learning. What are the differences between deep learning and conventional machine learning?
2. In 1986, the introduction of the MLP ended the first "cold winter" in the history of machine learning. Why can the MLP solve the XOR problem? What role do activation functions play in solving it?
3. The sigmoid activation function was widely used in the early stage of neural network research. What problems does it have? Does the tanh activation function solve these problems?
4. Regularization is widely used in deep learning models. What is its purpose? How does Dropout implement regularization?
5. An optimizer encapsulates a model training algorithm. Common optimizers include SGD and Adam. Compare the performance of different optimizers.
6. Complete the convolution operation result in Figure 1-22 by referring to the example.
7. RNNs can save the context state of sequential data. How is this memory function implemented? What problems might occur when you deal with long sequences?
8. The GAN is a deep generative network framework. Briefly describe its training principle.
9. Gradient explosion and gradient vanishing are common problems in deep learning. What are their causes? How can these problems be avoided?


2 Deep Learning Development Frameworks

This chapter introduces common frameworks in the AI field and their features, and describes the typical framework TensorFlow in detail to help you understand AI concepts and put them into practice to meet actual demands. This chapter also introduces MindSpore, a Huawei-developed framework with many distinctive advantages. After reading this chapter, you can decide whether to use MindSpore based on your requirements.

2.1 Deep Learning Development Frameworks

2.1.1 Introduction to PyTorch

PyTorch is a Python-based machine learning computing framework released by Facebook. It is based on Torch, a scientific computing framework that supports a large number of machine learning algorithms. Torch is a tensor operation library similar to NumPy and is highly flexible, but it is less popular because it uses the Lua programming language. PyTorch was developed to address this. In addition to Facebook, organizations such as Twitter, CMU, and Salesforce also use PyTorch. The following sections describe the features of PyTorch.

2.1.1.1 Python First

PyTorch does not simply bind Python to a C++ framework; it supports fine-grained access directly from Python. Developers can use PyTorch as easily as NumPy or SciPy. This lowers the learning threshold and keeps PyTorch code basically consistent with native Python code.

2.1.1.2 Dynamic Neural Network

Many mainstream frameworks, such as TensorFlow 1.x, do not support dynamic neural networks. To run TensorFlow 1.x, developers must first create a static computational graph and then use the feed and run commands to execute the graph repeatedly. PyTorch avoids this complexity: PyTorch programs can build or adjust computational graphs dynamically during execution.

2.1.1.3 Easy to Debug

Because PyTorch generates dynamic graphs during execution, developers can stop the interpreter in a debugger and inspect the output of a specific node.

In addition, PyTorch provides tensors that run on both CPUs and GPUs; GPU support greatly accelerates computation.

2.1.2 Introduction to MindSpore

Huawei designed the core architecture of MindSpore around three ideas: algorithm as code, efficient execution, and flexible deployment. The architecture is divided into four layers. The on-demand collaborative distributed architecture, scheduling, distributed deployment, and communication library reside in the same layer. Next is the execution efficiency layer (including data model downstream deployment). The parallelism layer contains pipeline execution, deep graph optimization, and operator fusion. The upper layer is the MindSpore intermediate representation (IR) for computational graphs. MindSpore provides automatic differentiation, automatic parallelism, and automatic tuning, and supports all-scenario application programming interfaces (APIs) that follow these design ideas.

Automatic differentiation is the core technology of an AI framework and one of the decisive factors of its programming paradigm. A deep learning model is trained by forward and backward computation. Taking the mathematical expression here as an example, the forward computation of the formula follows the black arrows. After the output f of the forward computation is obtained, the backward computation uses the chain rule to obtain the derivatives of f with respect to x and y. When designing a model, developers write only the forward computation; the backward computation is implemented by the framework's automatic differentiation technology.

In addition, as NLP models grow, the memory required to train ultra-large models such as BERT (340M parameters) and GPT-2 (1542M parameters) exceeds the capacity of a single card, so the models must be split across multiple cards. Currently, the industry mainly uses manual model parallelism, which requires developers to partition the model and be aware of the cluster topology. This makes development difficult, and high performance is hard to guarantee or tune. MindSpore can automatically partition the entire graph based on the data dimensions of operator inputs and outputs, and integrates data parallelism with model parallelism. Topology-aware scheduling perceives the cluster topology and automatically schedules subgraphs for execution to minimize communication overhead. Model parallelism can thus be implemented while keeping the single-node coding logic, which improves development efficiency tenfold compared with manual parallelization.

With today's powerful computing power, model execution faces huge challenges: the memory wall, high interaction overhead, and difficult data supply. When some operations run on the host and others on the device, the interaction overhead can be much larger than the execution overhead, resulting in low accelerator utilization. MindSpore uses chip-oriented deep graph optimization to minimize synchronization waiting time and maximize the parallelism of data, computing, and communication. Both the data and the entire graph computation reside on the Ascend AI Processor.
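To make the automatic differentiation idea concrete, here is a minimal reverse-mode sketch in plain Python. It is a teaching toy, not MindSpore's implementation: the class Var, the functions sin and backward, and the expression f = x·y + sin(x) are all hypothetical (the expression from the original figure is not reproduced). Only the forward expression is written by hand; backward() walks the recorded graph in reverse and applies the chain rule.

```python
import math


class Var:
    """Scalar node of a computational graph built during the forward pass."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs (input node, local derivative d(self)/d(input))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))


def sin(v):
    return Var(math.sin(v.value), ((v, math.cos(v.value)),))


def backward(output):
    """Backward pass: apply the chain rule from the output to every input."""
    order, seen = [], set()

    def visit(node):                      # topological order: inputs before their consumers
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):          # consumers before inputs
        for parent, local in node.parents:
            parent.grad += node.grad * local


x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)      # forward computation only (hypothetical expression)
backward(f)             # x.grad = y + cos(x), y.grad = x
```

Because x feeds two operations, its gradient accumulates contributions from both paths, which is exactly what the chain rule requires.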


MindSpore also uses on-device execution to implement decentralization. Adaptive graph segmentation driven by gradient data enables autonomous AllReduce and synchronous gradient aggregation, improving computing and communication efficiency. In addition, MindSpore uses a distributed architecture of on-demand device-edge-cloud collaboration. The unified model IR provides a consistent deployment experience, and graph optimization based on software-hardware collaboration hides the differences between scenarios. Device-cloud collaboration through federated meta learning breaks the boundary between device and cloud and enables real-time updates of the collaborative model across multiple devices.

2.1.3 Introduction to TensorFlow

TensorFlow is Google's second-generation open-source software library for numerical computation. The TensorFlow computing framework supports various deep learning algorithms and multiple computing platforms, and offers high system stability. TensorFlow has the following features:

2.1.3.1 Multi-platform

All platforms that support the Python development environment also support TensorFlow. However, TensorFlow depends on other software, such as the NVIDIA CUDA Toolkit and cuDNN, to access a supported GPU.

2.1.3.2 GPU

TensorFlow supports certain NVIDIA GPUs, namely those compatible with NVIDIA CUDA Toolkit versions that meet specific performance standards.

2.1.3.3 Distributed

TensorFlow supports distributed computing, allowing computational graphs to be computed in different processes. These processes may be located on different servers.

2.1.3.4 Multi-lingual

The main programming language of TensorFlow is Python. C++, Java, and Go APIs are also available, but their stability is not guaranteed; the same applies to the many third-party bindings for C#, Haskell, Julia, Rust, Ruby, Scala, R, and even PHP. Google has also released TensorFlow Lite, a mobile-optimized library for running TensorFlow applications on Android.

2.1.3.5 Scalability

One of the main advantages of TensorFlow is its modular, scalable, and flexible design. Developers can easily port models among CPUs, GPUs, and TPUs with only a few code changes. Python developers can build their own models using the native low-level APIs (core APIs) of TensorFlow, or build models from built-in components using its high-level API libraries. TensorFlow has many built-in and distributed libraries, and a high-level deep learning framework such as Keras can be layered on top of it to serve as a high-level API.


2.1.3.6 Powerful Computing Performance

TensorFlow achieves its best performance on Google TPUs, but it also strives for high performance on a variety of platforms, including servers, desktops, embedded systems, and mobile devices. Distributed deployment lets TensorFlow run on different computers, so models can be trained on anything from smartphones to computer clusters. Deep learning frameworks that currently support native distributed training include TensorFlow, CNTK, DeepLearning4J, and MXNet. On a single GPU, most deep learning frameworks rely on cuDNN and therefore train at almost the same speed, provided that the hardware computing capability and allocated memory differ only slightly. For large-scale deep learning, however, the volume of data makes it difficult for a single GPU to complete training in a limited time. To handle such cases, TensorFlow supports distributed training. TensorFlow is considered one of the best libraries for neural networks and can reduce the difficulty of deep learning development. In addition, TensorFlow is open source, which facilitates its maintenance and updates and improves its efficiency.

Keras, which ranks third in the number of stars on GitHub, is packaged into TensorFlow 2.0 as its high-level API, which makes TensorFlow 2.0 more flexible and easier to debug. In TensorFlow 1.0, creating a tensor does not return its result directly. To obtain the result, a session must be created, which involves the concept of a graph, and no code runs without session.run. This style resembles the hardware description language VHDL. Compared with simpler frameworks such as PyTorch, TensorFlow 1.0 adds these concepts, which confuse users. TensorFlow 1.0 is complex to debug, and its APIs are disorganized, making it difficult for beginners. Learners run into many difficulties with TensorFlow 1.0 even after acquiring the basic knowledge. As a result, many researchers have turned to PyTorch.

2.2 TensorFlow 2.0 Basics

2.2.1 Introduction

The core feature of TensorFlow 2.0 is the dynamic graph mechanism called eager execution. It allows users to write and debug models as they would write ordinary programs, making TensorFlow easier to learn and use. TensorFlow 2.0 also supports more platforms and languages, and improves compatibility between components by standardizing exchange formats and aligning APIs. Deprecated APIs have been deleted in this version, and duplicate APIs have been reduced to avoid confusion. TensorFlow 2.0 also maintains compatibility and continuity by providing a TensorFlow 1.x compatibility module (tf.compat.v1). In addition, the tf.contrib module has been removed: maintained modules have been moved to separate repositories, and unused or unmaintained modules have been deleted.


2.2.2 Tensors

A tensor is the most basic data structure in TensorFlow, and all data is encapsulated in tensors. A tensor is defined as a multidimensional array: a scalar is a rank-0 tensor, a vector is a rank-1 tensor, and a matrix is a rank-2 tensor. In TensorFlow, tensors are classified into constant tensors and variable tensors.
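To make rank concrete, the sketch below computes the shape and rank of tensors represented as regularly nested Python lists; the helper names are hypothetical. For a real TensorFlow tensor, the .shape attribute and tf.rank report the same information.

```python
def shape(t):
    """Shape of a tensor stored as regularly nested Python lists (a scalar has shape ())."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None
    return tuple(dims)


def rank(t):
    """Rank = number of dimensions = length of the shape."""
    return len(shape(t))


scalar = 3.0                                              # rank 0
vector = [1.0, 2.0, 3.0]                                  # rank 1, shape (3,)
matrix = [[1, 2], [3, 4], [5, 6]]                         # rank 2, shape (3, 2)
cube = [[[0] * 4 for _ in range(3)] for _ in range(2)]    # rank 3, shape (2, 3, 4)
print([rank(t) for t in (scalar, vector, matrix, cube)])  # [0, 1, 2, 3]
```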

2.2.3 Eager Execution Mode

Static graph: TensorFlow 1.0 uses static graphs (graph mode), which separate the definition of a computational graph from its execution. This is a declarative programming model. In graph mode, developers need to build a computational graph, start a session, and then input data to obtain an execution result. Static graphs have many advantages in distributed training, performance optimization, and deployment. However, they are inconvenient to debug: executing a static graph is similar to invoking a compiled C program, whose internals cannot be debugged while it runs. Therefore, eager execution, which is based on dynamic computational graphs, is provided. Eager execution is a form of imperative programming that is consistent with native Python: a result is returned immediately after an operation is performed. TensorFlow 2.0 uses the eager execution mode by default.
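The difference between the two modes can be illustrated with a toy, pure-Python model. This is not TensorFlow's API; Node, Placeholder, add, mul, and run are hypothetical names. In graph mode, building the expression returns a graph object and values are supplied only when the graph is run, much like feeding data to a session; in eager mode, the same expression returns its result immediately.

```python
class Node:
    """Deferred operation in a toy static graph: building it computes nothing."""

    def __init__(self, op, inputs):
        self.op, self.inputs = op, inputs


class Placeholder(Node):
    """Graph input whose value is supplied only when the graph is run."""

    def __init__(self, name):
        super().__init__(None, ())
        self.name = name


def add(a, b):
    return Node(lambda u, v: u + v, (a, b))


def mul(a, b):
    return Node(lambda u, v: u * v, (a, b))


def run(node, feed):
    """Execute the graph for `node`, taking placeholder values from the `feed` dict."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    return node.op(*(run(i, feed) for i in node.inputs))


# Graph mode: first define the whole computation, then run it with input data.
a, b = Placeholder("a"), Placeholder("b")
graph = add(mul(a, b), b)
print(isinstance(graph, Node))          # True: a graph node, not a number
print(run(graph, {"a": 2, "b": 3}))     # 9

# Eager mode: the same expression on ordinary values returns its result at once.
a_val, b_val = 2, 3
print(a_val * b_val + b_val)            # 9
```

The graph can be run many times with different inputs, which is what makes graph mode convenient for optimization and deployment but harder to inspect step by step.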

2.2.4 AutoGraph

In TensorFlow 2.0, eager execution is enabled by default. Eager execution is intuitive and flexible (it is easier and faster to run one-off operations), but it may compromise performance and deployability. To achieve optimal performance and make a model deployable anywhere, you can decorate a Python function with @tf.function to build a graph from it, which makes the Python code run more efficiently. tf.function builds the TensorFlow operations in the function into a graph so that the function can be executed in graph mode; AutoGraph also converts Python control flow, such as if and while statements, into equivalent graph operations. This can be regarded as encapsulating the function as a TensorFlow graph operation.

2.3 TensorFlow 2.0 Modules

2.3.1 Common Modules

tf: Functions in the tf module perform common arithmetic operations, such as tf.abs (calculating absolute values), tf.add (element-wise addition), and tf.concat (concatenating tensors). Most operations in this module have NumPy equivalents.

1. tf.errors: the error type module of TensorFlow.
2. tf.data: implements operations on datasets. Input pipelines created with tf.data are used to read training data, and data already in memory, such as NumPy arrays, can be loaded easily.
3. tf.distributions: implements various statistical distributions, such as the Bernoulli, uniform, and Gaussian distributions. (In TensorFlow 2.x, these distributions are provided by the separate TensorFlow Probability library as tfp.distributions.)
4. tf.gfile: implements operations on files, such as file I/O, copying files, and renaming files. (In TensorFlow 2.x, this module is available as tf.io.gfile.)
5. tf.image: implements operations on images. This module works like a small OpenCV image processing package and provides functions for adjusting image brightness and saturation, flipping, cropping, resizing, converting image formats (RGB to HSV, YUV, YIQ, or grayscale), rotation, and Sobel edge detection.
6. tf.keras: the TensorFlow implementation of the Keras API. This large module supports a wide range of network operations.
7. tf.nn: the neural network support module. It is the most commonly used module and is used to construct classical convolutional networks. It also contains the rnn_cell sub-module, which is used to construct recurrent neural networks. Common functions include avg_pool(...), batch_normalization(...), bias_add(...), conv2d(...), dropout(...), relu(...), sigmoid_cross_entropy_with_logits(...), and softmax(...).

2.3.2 Keras API

TensorFlow 2.0 recommends Keras for building networks. Common neural network layers are included in keras.layers. Keras is a high-level API for building and training deep learning models. It can be used for rapid prototyping, advanced research, and production. It has the following three advantages:

2.3.2.1 Easy to Use

Keras provides a simple and consistent API that is optimized for common use cases. It also provides practical and clear feedback on user errors.

2.3.2.2 Modular and Composable

You can build Keras models by connecting configurable building blocks together, with few restrictions.

2.3.2.3 Easy to Extend

You can customize building blocks to express new research ideas, create new layers and loss functions, and develop advanced models.

The common functional modules are as follows:


2.3.2.4 tf.keras.layers

The tf.keras.layers namespace provides a large number of common network layer APIs, such as fully connected layers, activation layers, pooling layers, convolutional layers, and recurrent neural network layers. To use one of these layers, you only need to specify its parameters when creating it and then invoke its __call__ method to perform the forward computation. When __call__ is invoked, Keras automatically runs the forward propagation logic of the layer, which is generally implemented in the call method of the layer class.

2.3.2.5 Network Container
Without a container, the instance of each layer must be called manually, one after another, to complete forward propagation, and the code becomes bloated as the number of layers grows. The Sequential network container provided by Keras encapsulates multiple network layers into one larger network model. A single call to the model instance then runs the data through all layers in order, from the first to the last.
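A minimal sketch of the Sequential container, assuming TensorFlow 2.x. The layer sizes are illustrative, and the input is random data shaped like a batch of 28x28 images.

```python
import tensorflow as tf

# Three layers wrapped in one container; the model is built on its first call.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),                        # [batch, 28, 28] -> [batch, 784]
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

batch = tf.random.uniform([32, 28, 28])
out = model(batch)          # one call runs the data through all layers in order
print(list(out.shape))      # [32, 10]
```

No input shape is declared here. The model infers it from the first batch it receives.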

2.4 Basic Development Steps of TensorFlow 2.0
2.4.1 Environment Setup
2.4.1.1 Environment Setup in Windows
- Operating system: Windows 10
- Package manager: pip, as bundled with Anaconda 3 (Python 3)
To install TensorFlow, open Anaconda Prompt and run the pip command.

Figure 2-1 Installation command

Run the pip install tensorflow command on the command line, as shown in Figure 2-1.

2.4.1.2 Environment Setup in Linux
The simplest way to install TensorFlow on Linux is to use pip. If downloads are slow in China, switch to the Tsinghua mirror by running the following commands in the terminal: pip install -U pip, then pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple. Then run pip install tensorflow==2.0.0 to install TensorFlow.
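After installation, a quick check like the following confirms that the package imports correctly and can run a computation eagerly. This is a sketch that assumes a TensorFlow 2.x install.

```python
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Eager execution:", tf.executing_eagerly())  # True by default in TensorFlow 2.x

# A tiny computation confirms the runtime works.
total = tf.reduce_sum(tf.constant([1.0, 2.0, 3.0]))
print("Sanity check:", float(total))  # 6.0
```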

2.4.2 Development Process
The development process includes the following steps:
1. Data preparation: includes data exploration and data processing.
2. Network construction: includes defining the network structure, the loss function, and the model evaluation metrics, and selecting the optimizer.
3. Model training and verification
4. Model saving
5. Model restoration and invocation

The following sections describe this process using a real project, MNIST handwritten digit recognition. Handwritten digit recognition is a common image recognition task in which a computer recognizes the digits in handwriting images. Unlike printed fonts, handwriting varies in size and style from person to person, which makes it difficult for computers to recognize. This project applies deep learning and TensorFlow to train and build models on the MNIST handwriting dataset.

2.4.2.1 Data Preparation
Download the MNIST dataset from http://yann.lecun.com/exdb/mnist/. The MNIST dataset consists of a training set and a test set:
- Training set: 60,000 handwriting images and corresponding labels
- Test set: 10,000 handwriting images and corresponding labels

Figure 2-2 shows a dataset example.

Figure 2-2 Dataset example
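The same data is also available through tf.keras.datasets.mnist.load_data(), which returns (x_train, y_train), (x_test, y_test) as NumPy arrays of 28x28 uint8 images and integer labels. The NumPy sketch below shows typical preprocessing: flattening, scaling pixels to [0, 1], and one-hot encoding the labels. It uses a small synthetic stand-in so that it runs offline. The function name preprocess is illustrative. One-hot labels suit a categorical cross-entropy loss, while integer labels can be used directly with a sparse categorical cross-entropy loss.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Flatten 28x28 images to 784-vectors, scale pixels to [0, 1], one-hot encode labels."""
    x = images.reshape(len(images), -1).astype("float32") / 255.0
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

# Synthetic stand-in for MNIST so the sketch runs offline.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
fake_labels = np.array([0, 3, 9, 1, 3])

x, y = preprocess(fake_images, fake_labels)
print(x.shape, y.shape)  # (5, 784) (5, 10)
```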

2.4.2.2 Network Construction
The softmax function, also called the normalized exponential function, generalizes the binary classification function sigmoid to multi-class classification. Figure 2-3 shows how softmax is calculated.
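The calculation can be written as softmax(z)_i = exp(z_i) / Σ_j exp(z_j). The plain-Python sketch below implements this formula. It first subtracts the maximum logit, a common numerical-stability trick that does not change the result. It then checks the relationship to sigmoid: for two classes with logits [z, 0], softmax assigns the first class the probability sigmoid(z).

```python
import math

def softmax(logits):
    """Normalized exponential: exp(z_i) / sum_j exp(z_j)."""
    m = max(logits)                          # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]

# With two classes and logits [z, 0], softmax reduces to the sigmoid.
print(abs(softmax([1.5, 0.0])[0] - sigmoid(1.5)) < 1e-12)  # True
```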

Figure 2-3 Softmax calculation method

Building the model is the core of defining the network structure. As shown in Figure 2-4, the network's computation process defines how the output is calculated from the input.

Figure 2-4 Model calculation process

Figure 2-5 shows the core TensorFlow code that implements the softmax regression model.
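The following is a minimal low-level sketch of a softmax regression model in TensorFlow 2.x, y = softmax(xW + b), for flattened 28x28 images and 10 classes. The variable and function names are illustrative. The all-zero initialization is an assumption chosen so that the output is easy to check.

```python
import tensorflow as tf

# Parameters of softmax regression: 784 input pixels -> 10 classes.
W = tf.Variable(tf.zeros([784, 10]), name="weights")
b = tf.Variable(tf.zeros([10]), name="bias")

def softmax_regression(x):
    logits = tf.matmul(x, W) + b   # [batch, 784] @ [784, 10] -> [batch, 10]
    return tf.nn.softmax(logits)   # each row becomes a probability distribution

x = tf.random.uniform([3, 784])    # three fake flattened images
y = softmax_regression(x)
print(y.shape.as_list())           # [3, 10]
print(round(float(y[0, 0]), 4))    # 0.1: all-zero parameters give uniform probabilities
```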

Figure 2-5 Softmax implementation code

Model compilation involves the following two parts:
- Loss function selection: in machine learning and deep learning, a metric must be defined to indicate how well a model fits. This metric is called the cost or loss, and training aims to minimize it. This project uses the cross-entropy loss function.
- Gradient descent: once a loss function is defined for the model, an optimization algorithm is used to find the parameters that minimize its value. Among the optimization algorithms used to solve for machine learning parameters, gradient descent-based algorithms are the most common.
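To make these two parts concrete, the NumPy sketch below computes the cross-entropy loss for softmax regression and applies plain gradient descent to a small synthetic dataset. It relies on a standard result: for softmax followed by cross entropy, the gradient of the loss with respect to the logits is the predicted probabilities minus the one-hot labels. For a mean loss, this gradient is divided by the batch size. The data, learning rate, and function names are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, onehot):
    # Mean over the batch of -sum_k y_k * log(p_k); the epsilon avoids log(0).
    return float(-np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1)))

def gradient_descent_step(W, b, x, onehot, lr=0.1):
    """Update W and b in place and return the loss before the update."""
    probs = softmax(x @ W + b)
    grad_logits = (probs - onehot) / len(x)    # d(mean loss) / d(logits)
    W -= lr * (x.T @ grad_logits)
    b -= lr * grad_logits.sum(axis=0)
    return cross_entropy(probs, onehot)

rng = np.random.default_rng(0)
x = rng.random((20, 4))
onehot = np.eye(3)[rng.integers(0, 3, size=20)]
W, b = np.zeros((4, 3)), np.zeros(3)

losses = [gradient_descent_step(W, b, x, onehot) for _ in range(50)]
print(round(losses[0], 4))     # 1.0986, i.e. ln(3) with all-zero parameters
print(losses[-1] < losses[0])  # True
```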

2.4.2.3 Model Training and Verification
As shown in Figure 2-6, the model is trained on the training data through mini-batch or full-batch iterations. In this experiment, the full training set is passed through five times. In TensorFlow, model.fit is used directly for training, and its epochs parameter sets the number of passes over the training data.
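The training step can be sketched with the Keras API as below, assuming TensorFlow 2.x. Random arrays stand in for MNIST so that the example runs offline. With the real data, you would load it with tf.keras.datasets.mnist.load_data() and scale the pixels to [0, 1]. The optimizer, learning rate, and batch size are illustrative choices rather than values from this chapter.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data shaped like MNIST (28x28 images, labels 0-9).
rng = np.random.default_rng(0)
x_train = rng.random((256, 28, 28)).astype("float32")
y_train = rng.integers(0, 10, size=256)
x_test = rng.random((64, 28, 28)).astype("float32")
y_test = rng.integers(0, 10, size=64)

# Softmax regression: flatten, then one dense layer with softmax output.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="sparse_categorical_crossentropy",  # integer labels, cross entropy
              metrics=["accuracy"])

# epochs=5: five full passes over the training set, 32 samples per gradient step.
history = model.fit(x_train, y_train, batch_size=32, epochs=5, verbose=0)
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(len(history.history["loss"]))  # 5, one loss value per epoch
```

Because the labels here are random, the reported accuracy stays near chance. The sketch only demonstrates the mechanics of fit and evaluate.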

Figure 2-6 Training process

As shown in Figure 2-7, you can evaluate the model on the test set: compare the predicted results with the actual labels and count the correct predictions to calculate the test-set accuracy.

Figure 2-7 Test and verification
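The accuracy computation described above reduces to comparing the index of the highest predicted probability with the true label. The following is a small NumPy sketch with hand-made predictions; the function name accuracy is illustrative. In Keras, model.evaluate reports the same metric when the model is compiled with metrics=["accuracy"].

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the true label."""
    predicted = np.argmax(probs, axis=1)
    return float(np.mean(predicted == labels))

probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
labels = np.array([1, 0, 2, 0])
print(accuracy(probs, labels))  # 0.75
```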

2.5 Summary
This chapter describes common frameworks in the AI field and their features, focusing on the modules and basic usage of TensorFlow. Building on this, it walks through a training code example that shows how framework functions and modules are applied in practice. You can set up the environment and run the sample project by following the instructions in this chapter, which should give you a deeper understanding of the AI field.

2.6 Quiz
1. AI is widely used. What are the mainstream AI frameworks, and what are their features?
2. As a typical AI framework, TensorFlow has a large number of users. The most significant change in TensorFlow's evolution was the move from TensorFlow 1.0 to TensorFlow 2.0. Describe the differences between the two versions.
3. TensorFlow has many modules that meet users' practical needs. Describe three common TensorFlow modules.
4. Configure an AI development framework by following the instructions in this chapter.

Huawei AI Academy Training Materials

Deep Learning Open-Source Framework MindSpore

Huawei Technologies Co., Ltd.

05 Deep Learning Open-Source Framework MindSpore (Textbook)

171

Copyright © Huawei Technologies Co., Ltd. 2020. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice The purchased products, services, and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services, and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees, or representations of any kind, either express or implied. The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express, or implied.

Huawei Technologies Co., Ltd. Address:

Huawei Industrial Base Bantian, Longgang, Shenzhen 518129

Website:

https://e.huawei.com

Huawei MindSpore AI Development Framework

Contents
5 Deep Learning Open-Source Framework MindSpore
5.1 MindSpore Development Framework
5.1.1 MindSpore Architecture
5.1.2 MindSpore Design Concept
5.1.3 MindSpore Advantages
5.2 MindSpore Development and Application
5.2.1 Environment Setup
5.2.2 MindSpore Components and Concepts
5.2.3 Constraints on Network Construction Using Python Source Code
5.2.4 Implementing an Image Classification Application
5.3 Summary
5.4 Quiz

5 Deep Learning Open-Source Framework MindSpore

This chapter describes MindSpore, Huawei's AI development framework. It covers MindSpore's architecture and design concepts, the features it offers to address the problems and difficulties of AI computing frameworks, and its development and application.

5.1 MindSpore Development Framework
MindSpore is a Huawei-developed AI computing framework that implements on-demand device-edge-cloud collaboration across all scenarios. It provides unified APIs for all scenarios and end-to-end capabilities for AI model development, execution, and deployment. Built on a device-edge-cloud collaborative distributed architecture, MindSpore adopts a new differentiable native programming paradigm and a new AI-Native execution mode to achieve better resource efficiency, security, and reliability. It also lowers the barrier to AI development in the industry and unleashes the computing power of Ascend processors, contributing to inclusive AI.

5.1.1 MindSpore Architecture
The MindSpore architecture consists of the development state, the execution state, and the deployment state. Supported processors include CPUs, GPUs, and Ascend processors (Ascend 310 and Ascend 910), as shown in Figure 5-1.

Figure 5-1 MindSpore architecture

The development state provides unified Python APIs for all scenarios. These include unified model training, inference, and export APIs, as well as unified data processing, augmentation, and format conversion APIs. The development state also supports Graph High Level Optimization (GHLO), including hardware-independent optimizations (such as dead code elimination), automatic parallelism, and automatic differentiation. These functions also support the design concept of unified APIs for all scenarios.

In the execution state, MindSpore Intermediate Representation (IR) provides a native computational graph and a unified IR, on which MindSpore performs pass optimizations. The execution state includes hardware-related optimizations, a parallel pipeline execution layer, and deep software-hardware co-optimizations such as operator fusion and buffer fusion. These features support automatic differentiation, automatic parallelism, and automatic optimization.

The deployment state uses the device-edge-cloud collaborative distributed architecture, with deployment, scheduling, and communication at the same layer, so it can implement on-demand collaboration across all scenarios.

In short, MindSpore combines easy development (AI algorithm as code), efficient execution (optimized for Ascend and GPU), and flexible deployment (all-scenario on-demand collaboration).

5.1.2 MindSpore Design Concept
AI developers in the industry face challenges such as a high barrier to entry, high operating costs, and difficult deployment. To address them, MindSpore proposes three technical innovations: a new programming paradigm, a new execution mode, and a new collaboration mode. Together, these help developers develop and deploy AI applications more simply and efficiently.

5.1.2.1 New Programming Paradigm
The new programming paradigm is designed to address the challenges of the development state, which are as follows:
1. High skill requirements: developers need to understand AI, have theoretical knowledge of computer systems and software, and have strong mathematical skills, so the barrier to development is high.
2. Difficult black-box tuning: AI algorithms behave as unexplainable black boxes, which makes parameter tuning difficult.
3. Difficult parallel planning: as data volumes and models keep growing, parallel computing is inevitable. However, parallel planning depends heavily on human experience and requires an understanding of the data, the model, and the distributed system architecture.

The new programming paradigm's concept of "AI algorithm as code" lowers the barrier to AI development. Because the paradigm is based on native mathematical expressions, algorithm experts can focus on AI innovation and exploration, as shown in Figure 5-2.

Figure 5-2 New programming paradigm of MindSpore

5.1.2.2 Automatic Differentiation Technology
The automatic differentiation technology used in an AI framework is the framework's core and one of the decisive factors of its programming paradigm. A deep learning model is trained through forward and backward propagation. As shown in Figure 5-3, forward propagation follows the black arrows, and backward propagation follows the red arrows. Backward propagation is based on the chain rule for composite functions, as shown in Figure 5-4.

Figure 5-3 Forward propagation and backward propagation

Figure 5-4 Chain rule

Automatic differentiation is the soul of a deep learning framework. With it, developers only need to focus on forward propagation and can leave all the complex derivative computation and backward propagation to the framework. Automatic differentiation generally refers to methods that compute the derivatives of a function automatically. In machine learning, these derivatives are used to update the weights. In the natural sciences more broadly, they can also be used in various subsequent calculations. Figure 5-5 shows the development history of automatic differentiation.

Figure 5-5 Development history of automatic differentiation
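The following is a small worked example of the chain rule in plain Python. Take y = f(g(x)) with the illustrative choices g(x) = x² and f(u) = sin(u). Then dy/dx = f'(g(x))·g'(x) = cos(x²)·2x. The sketch compares this analytic derivative with a central finite-difference estimate.

```python
import math

def g(x):
    return x ** 2            # inner function

def f(u):
    return math.sin(u)       # outer function

def dy_dx_chain_rule(x):
    u = g(x)
    df_du = math.cos(u)      # f'(u) evaluated at u = g(x)
    dg_dx = 2 * x            # g'(x)
    return df_du * dg_dx     # chain rule: dy/dx = f'(g(x)) * g'(x)

def dy_dx_numeric(x, h=1e-6):
    # Central finite difference, used only to check the analytic result.
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

x = 1.3
print(dy_dx_chain_rule(x), dy_dx_numeric(x))
```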

Mainstream deep learning frameworks currently use three automatic differentiation technologies:

Conversion based on static computational graphs: the network is converted into a static dataflow graph at compile time, and the chain rule is then applied to this graph to implement automatic differentiation. TensorFlow, for example, can use static compilation to optimize network performance, but setting up or debugging the network is complex.

Conversion based on dynamic computational graphs: operator overloading is used to record the network's operations during forward execution. The chain rule is then applied to the dynamically generated dataflow graph to implement automatic differentiation. PyTorch, for example, is easy to use, but achieving optimal performance with it is difficult.

Conversion based on source code: building on functional programming, this technology performs automatic differentiation transformations on the IR (the program representation used during compilation) through just-in-time (JIT) compilation. It supports complex control flow, higher-order functions, and closures.

MindSpore's automatic differentiation is based on source code conversion. It also supports automatic differentiation of control flow, so building models is as easy as in PyTorch. In addition, MindSpore can apply static compilation optimizations to neural networks, which gives it excellent performance. Table 5-1 compares the automatic differentiation technologies, and Figure 5-6 compares their performance and programmability.

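Table 5-1 Comparison of automatic differentiation technology

Automatic Differentiation Type | General | Fast | Portable | Differentiable | Typical Framework
Graph | No | Yes | Yes | Partially | TensorFlow
OO | Yes | Partially | Partially | Yes | PyTorch
SCT | Yes | Yes | Yes | Yes | MindSpore

(Graph: static computational graph; OO: operator overloading; SCT: source code transformation.)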
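The following toy sketch makes the OO (operator overloading) row of Table 5-1 concrete. It records operations as they run and then applies the chain rule backward over the recorded graph. It is a plain-Python teaching example, not how PyTorch or MindSpore is implemented; as described above, MindSpore uses source code transformation instead. The class name Var and its methods are hypothetical.

```python
import math

class Var:
    """Toy scalar for reverse-mode autodiff via operator overloading (hypothetical)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # (parent Var, local derivative) pairs
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def sin(self):
        return Var(math.sin(self.value), ((self, math.cos(self.value)),))

    def backward(self):
        # Order the recorded graph so each node comes after everything it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # Walk it in reverse, applying the chain rule: parent.grad += local * node.grad.
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


x, w = Var(1.3), Var(0.5)
y = (x * x).sin() * w + x   # the forward pass is recorded as it executes
y.backward()
print(x.grad)  # dy/dx = w * cos(x**2) * 2x + 1
print(w.grad)  # dy/dw = sin(x**2)
```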

Figure 5-6 Performance and programmability comparison of automatic differentiation technology

In short, MindSpore's automatic differentiation has the following advantages:
1. Programmability: it uses the general-purpose Python language, and differentiation is defined on the primitives of the IR.
2. Performance: compilation is optimized, and backward operators are optimized automatically.
3. Debugging: rich visualization interfaces are available, and dynamic execution is supported.

5.1.2.3 Automatic Parallelism
Today's deep learning models are so large that they must be parallelized, and this is usually done manually. Developers must design the model partitioning and be aware of the cluster topology, which makes development difficult and performance hard to guarantee and optimize. MindSpore automatic parallelism takes serial algorithm code and automatically implements distributed parallel training while maintaining high performance.

Parallel training can generally be divided into model parallel training and data parallel training. Data parallel training is easy to understand: each sample completes forward propagation independently, and the results are then aggregated. Model parallel training is more complex, because developers must manually write every part that needs to be parallelized, following the logic of "parallel thinking".

MindSpore's key innovation here is automatic graph segmentation. The entire graph is segmented based on the input and output data dimensions of each operator. In other words, each operator in the graph is split across the cluster to complete parallel computation, combining data parallelism and model parallelism. Cluster topology-aware scheduling perceives the cluster topology and automatically schedules subgraphs for execution to minimize communication overhead, as shown in Figure 5-7. MindSpore automatic parallelism aims to build a training mode that integrates data parallelism, model parallelism, and hybrid parallelism. It automatically selects the lowest-cost model partitioning to implement automatic distributed parallel training.
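The core idea of data parallelism can be checked numerically. The NumPy sketch below uses a linear model, loss, and shard count that are illustrative assumptions. A batch is split evenly across four simulated devices, each device computes a gradient on its shard, and averaging those gradients reproduces the full-batch gradient. That averaging step is what an All Reduce performs in real data parallel training.

```python
import numpy as np

def mse_grad(w, x, y):
    """Gradient of the mean squared error of a linear model y_hat = x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))
y = rng.normal(size=64)
w = np.zeros(3)

# Data parallelism: split one batch of 64 samples evenly across 4 simulated devices.
shards = zip(np.array_split(x, 4), np.array_split(y, 4))
local_grads = [mse_grad(w, xs, ys) for xs, ys in shards]

# All Reduce step: average the per-device gradients (valid because shards are equal-sized).
averaged = np.mean(local_grads, axis=0)
print(np.allclose(averaged, mse_grad(w, x, y)))  # True
```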

Figure 5-7 Automatic graph segmentation

MindSpore's fine-grained operator segmentation is complex, but developers only need to use the top-level API for efficient computing and do not need to worry about the underlying implementation. Overall, the new programming paradigm not only implements "AI algorithm as code" but also lowers the barrier to AI development and enables efficient development and debugging. For example, it performs automatic differentiation efficiently, and it enables automatic parallelization and switching between debugging modes with a single line of code. In one case, a developer implemented the classic Transformer model from the natural language processing (NLP) field using MindSpore. Combining dynamic and static graphs made the debugging process transparent and simple. The final MindSpore implementation has about 2,000 lines of code, about 20% fewer than the 2,500 lines of the TensorFlow implementation, while efficiency improved by over 50%.

5.1.2.4 New Execution Mode
The new execution mode is designed to address the challenges of the execution state, which are as follows:
1. Complex AI computing and diverse computing power: CPU cores, Cube units, and Vector units; scalar, vector, and tensor operations; mixed-precision operations; and dense and sparse matrix computation.
2. Poor multi-card scaling: when multiple cards are running, performance does not increase linearly with the number of nodes, and parallel control overhead is high.

The new execution mode uses the Ascend Native execution engine, which supports On-Device execution, as shown in Figure 5-8. It offloads graphs to devices and applies deep graph optimization, maximizing the computing power of Ascend.

Figure 5-8 On-Device execution

The two core technologies of On-Device execution are as follows:
1. Graph sink execution maximizes the computing power of Ascend. When chips have such strong computing power, model execution faces the memory wall, high interaction overhead, and difficulty supplying data fast enough. If some operations run on the host and others on the device, the interaction overhead can be much larger than the execution overhead, resulting in low accelerator utilization. MindSpore uses chip-oriented deep graph optimization to minimize synchronization wait time and maximize the parallelism of data, computing, and communication. It sinks the entire data pipeline and computational graph to the Ascend chip for the best results, improving training performance tenfold compared with on-host graph scheduling.
2. Massive distributed gradient aggregation is data-driven. When chips have such strong computing power, distributed gradient aggregation faces two challenges: the synchronization overhead of central control, and frequent synchronization (a single ResNet50 iteration takes only 20 ms). The traditional method needs three synchronizations to complete an All Reduce, whereas the data-driven method performs All Reduce autonomously without control overhead. MindSpore uses adaptive graph segmentation optimization driven by gradient data to implement decentralized All Reduce, consistent gradient aggregation, and full pipelining of computing and communication, as shown in Figure 5-9.
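The text does not detail how MindSpore's data-driven All Reduce works internally. For reference, the sketch below simulates ring all-reduce, a widely used decentralized All Reduce algorithm that is swapped in here purely for illustration. No central controller is involved, and each worker exchanges chunks only with its ring neighbor. The function ring_allreduce is hypothetical and runs in a single process.

```python
def ring_allreduce(vectors):
    """Sum equal-length vectors held by n simulated workers using ring all-reduce.

    Returns one fully reduced copy per worker.
    """
    n = len(vectors)
    length = len(vectors[0])
    data = [list(v) for v in vectors]                  # each worker's local buffer
    bounds = [length * k // n for k in range(n + 1)]   # split into n chunks

    def chunk(k):                                       # indices of chunk k
        return range(bounds[k], bounds[k + 1])

    def ring_step(select_chunk, combine):
        # Every worker i sends one chunk to worker i+1. All sends in a step are
        # snapshotted first so they behave as if they happened simultaneously.
        messages = []
        for i in range(n):
            k = select_chunk(i)
            messages.append(((i + 1) % n, k, [data[i][j] for j in chunk(k)]))
        for dst, k, payload in messages:
            for j, value in zip(chunk(k), payload):
                data[dst][j] = combine(data[dst][j], value)

    # Phase 1, reduce-scatter: after n-1 steps, worker i holds the full sum of chunk (i+1) % n.
    for step in range(n - 1):
        ring_step(lambda i: (i - step) % n, lambda old, new: old + new)

    # Phase 2, all-gather: pass the finished chunks around so every worker has all of them.
    for step in range(n - 1):
        ring_step(lambda i: (i + 1 - step) % n, lambda old, new: new)

    return data


gradients = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
print(ring_allreduce(gradients)[0])  # [111, 222, 333, 444, 555]
```

In total, each worker sends about 2(n-1)/n of its vector, however many workers there are. This is why ring-based schemes scale well in bandwidth.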

Figure 5-9 Decentralized and autonomous All Reduce

Figure 5-10 shows a computer vision example in which ResNet50 V1.5 is trained on the ImageNet2012 dataset with the optimal batch size. MindSpore on Ascend 910 trains much faster than other frameworks running on other mainstream training cards. This shows that Huawei's software-hardware co-optimization enables efficient execution in the MindSpore framework.

Figure 5-10 Comparison between MindSpore and TensorFlow

5.1.2.5 New Collaboration Mode
The new collaboration mode is designed to address the challenges of the deployment state:
- Application scenarios on devices, edges, and the cloud have varied requirements, objectives, and constraints. For example, mobile phones need lightweight models, while the cloud may require higher precision.
- Different hardware offers different precision and speed, as shown in Figure 5-11.
- The diversity of hardware architectures leads to differences across all-scenario deployments and to performance uncertainty, and the separation of training and inference leads to model isolation.

In the new mode, all-scenario on-demand collaboration can be implemented to achieve better resource efficiency and privacy protection while ensuring security and reliability. Applications can be developed once and deployed across devices. Models can be large or small and can be deployed flexibly, providing a consistent development experience. The three key technologies of the new collaboration mode in MindSpore are as follows:
- A unified model IR adapts to upper-layer differences across language scenarios and is compatible with user-defined data structures, providing a consistent deployment experience.
- The underlying hardware of the framework is also developed by Huawei, so graph optimization based on software-hardware co-design can hide scenario differences.
- Device-cloud collaboration through federated meta-learning breaks the boundary between device and cloud and updates the multi-device collaborative model in real time.

The combined effect of these three key technologies is that, within a unified architecture, models deliver consistent deployment performance in all scenarios, and the precision of personalized models is significantly improved, as shown in Figure 5-12.

Figure 5-11 Deployment challenge

Figure 5-12 On-demand collaboration and consistent development

The vision and value of MindSpore is to provide an AI computing platform that features efficient development, excellent performance, and flexible deployment. It aims to help the industry lower the barrier to AI development, unleash the computing power of Ascend AI processors, and promote inclusive AI, as shown in Figure 5-13.

Figure 5-13 MindSpore vision and value

5.1.3 MindSpore Advantages
5.1.3.1 Easy Development
- Automatic differentiation: unified programming of networks and operators, native expression of functions and algorithms, and automatic generation of backward network operators.
- Automatic parallelism: automatic model segmentation achieves optimal model-parallel efficiency.
- Automatic optimization: the same code works for both dynamic and static graphs.

5.1.3.2 Efficient Execution
- On-Device execution leverages the full computing power of Ascend.
- Pipeline optimization maximizes the linear scalability of parallel execution.
- Deep graph optimization adapts to the computing power and precision of the AI Core.

5.1.3.3 Flexible Deployment
- Device-edge-cloud collaborative computing provides better privacy protection.
- A unified device-edge-cloud architecture enables one-time development and on-demand deployment.

MindSpore is an open-source framework on par with others in the industry, with priority support for Huawei-developed chips and cloud services. Upward, it can interconnect with third-party frameworks and ecosystems through Graph IR (training front-end interconnection and inference model interconnection), and developers can extend it. Downward, it can interconnect with third-party chips, helping developers expand MindSpore application scenarios and the AI ecosystem.

5.2 MindSpore Development and Application
5.2.1 Environment Setup
5.2.1.1 Overall Installation Requirements
MindSpore requires Ubuntu 16.04 (or later) and Python 3.7.5 (or later), and it can be installed in CPU, GPU, and Ascend environments. It can be installed directly from an installation package, compiled and installed from source code, or installed with Docker. The following example uses the CPU environment to describe the installation procedure. Table 5-2 lists the system requirements and software dependencies of the MindSpore CPU version.

Table 5-2 MindSpore requirements and software dependencies

Version: MindSpore Master
Operating system: Ubuntu 16.04 (or later) x86_64
Executable file installation dependencies:
- Python 3.7.5
- For details about other dependency items, see requirements.txt.
Source code compilation and installation dependencies:
- Compilation dependencies: Python 3.7.5, wheel >= 0.32.0, GCC 7.3.0, CMake >= 3.14.1, patch >= 2.5, Autoconf >= 2.64, Libtool >= 2.4.6, Automake >= 1.15.1
- Installation dependencies: same as the executable file installation dependencies

5.2.1.2 Direct Installation Using the Pip Installation Package
pip install mindspore-cpu

5.2.1.3 Installation Using Source Code Compilation
1. Download the source code from the code repository: git clone https://gitee.com/mindspore/mindspore.git
2. Run the following command in the root directory of the source code to compile MindSpore: bash build.sh -e cpu -z -j4
   - Before running the preceding command, ensure that the directories containing the cmake and patch executables have been added to the PATH environment variable.
   - The build.sh script runs git clone to obtain the code of third-party dependencies, so ensure that Git's network settings are correct.
   - If the build machine has enough capacity, add -j{Number of threads} to increase the number of build threads, for example, bash build.sh -e cpu -z -j12.
3. Run the following commands to install MindSpore: chmod +x build/package/mindspore-{version}-cp37-cp37m-linux_{arch}.whl, then pip install build/package/mindspore-{version}-cp37-cp37m-linux_{arch}.whl
4. Run the following command. If no loading error such as "No module named 'mindspore'" is displayed, the installation is successful: python -c 'import mindspore'

5.2.1.4 Docker Installation
docker pull mindspore/mindspore-cpu:0.1.0-alpha

5.2.2 MindSpore Components and Concepts
5.2.2.1 Components
In MindSpore, data is also stored in tensors. Common tensor operations include asnumpy(), size(), dim(), dtype(), set_dtype(), tensor_add(other: Tensor), tensor_mul(other: Tensor), shape(), and __str__ (conversion into strings).

These tensor operations are largely self-explanatory. For example, asnumpy() converts the tensor into a NumPy array, and tensor_add() adds another tensor to this tensor. Table 5-3 describes other components of MindSpore.

Table 5-3 MindSpore components and description

Component | Description
model_zoo | Definitions of common network models
communication | Data loading module, which defines the dataloader and dataset and processes data such as images and texts
dataset | Dataset processing module, which can read and preprocess data
common | Defines tensor, parameter, dtype, and initializer
context | Defines the context class and sets model running parameters, such as switching between graph and PyNative modes
akg | Automatic differentiation and custom operator library
nn | Defines MindSpore cells (neural network units), loss functions, and optimizers
ops | Defines basic operators and registers backward operators
train | Training model and summary function modules
utils | Utilities, mainly for parameter validation within the framework

5.2.2.2 Programming Concept: Operation

Common operations in MindSpore:

array: array-related operators, such as ExpandDims, Squeeze, Concat, OnesLike, Select, StridedSlice, ScatterNd, …

math: math-related operators, such as AddN, Cos, Sub, Sin, Mul, LogicalAnd, MatMul, LogicalNot, RealDiv, Less, ReduceMean, Greater, …

nn: network operators, such as Conv2D, MaxPool, Flatten, AvgPool, Softmax, TopK, ReLU, SoftmaxCrossEntropy, Sigmoid, SmoothL1Loss, Pooling, SGD, BatchNorm, SigmoidCrossEntropy, …

control: control operators, such as ControlDepend

5.2.2.3 Programming Concept: Cell

1. A cell is the basic module for computation, and cell objects can be executed directly.
① __init__ initializes and verifies components such as parameters, cells, and primitives.
② construct defines the execution process. In graph mode, a graph is compiled for execution, and construct is subject to specific syntax restrictions.


③ bprop (optional) defines the backward (gradient) computation of a custom module. If it is not defined, automatic differentiation is used to compute the gradient of the construct part.

2. The cells predefined in MindSpore mainly include common loss functions (SoftmaxCrossEntropyWithLogits and MSELoss), common optimizers (Momentum, SGD, and Adam), and common network wrappers, such as TrainOneStepCell (computes network gradients and updates parameters) and WithGradCell (computes gradients).
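The roles of __init__, construct, and bprop can be shown with a minimal pure-Python sketch. MiniCell is a hypothetical stand-in for nn.Cell; the real class also handles graph compilation, parameters, and automatic differentiation. The sketch shows only that calling a cell object runs construct. The bprop signature (forward inputs, forward output out, output gradient dout, returning a tuple of input gradients) reflects our understanding of MindSpore's custom-bprop convention. Here, bprop is called manually rather than by an autodiff engine.

```python
class MiniCell:
    """Hypothetical, heavily simplified stand-in for nn.Cell (not MindSpore code)."""

    def construct(self, *inputs):
        raise NotImplementedError("subclasses define the computation here")

    def __call__(self, *inputs):
        # Calling the object runs construct(), which is what lets a cell be executed directly.
        return self.construct(*inputs)


class ScaleShift(MiniCell):
    def __init__(self, scale, shift):
        super().__init__()
        # __init__ creates and checks the components the cell needs.
        if not isinstance(scale, (int, float)):
            raise TypeError("scale must be a number")
        self.scale = scale
        self.shift = shift

    def construct(self, x):
        # construct defines the forward execution process.
        return x * self.scale + self.shift

    def bprop(self, x, out, dout):
        # Optional hand-written backward pass: d(out)/dx = scale.
        # Returns one gradient per forward input, as a tuple.
        return (dout * self.scale,)


cell = ScaleShift(2.0, 1.0)
y = cell(3.0)                    # 3.0 * 2.0 + 1.0 = 7.0
grads = cell.bprop(3.0, y, 1.0)  # (2.0,)
```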

5.2.2.4 Programming Concept: MindSpore IR

1. MindSpore IR (MindIR) is a compact, efficient, and flexible graph-based functional IR that can represent functional semantics such as free variables, higher-order functions, and recursion. It serves as the program representation during automatic differentiation (AD) and compilation optimization.

2. Each graph represents a function definition and consists of ParameterNode, ValueNode, and ComplexNode (CNode) nodes.

3. Edges represent the def-use relationships between nodes.
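The following toy model shows how the three node kinds and def-use edges fit together for the function f(x) = x * 2 + 1. It is a pure-Python illustration that borrows only the node names above. It does not reflect MindIR's actual data structures, which also support graphs as values, free variables, and recursion. The operator names "add" and "mul" and the node labels are hypothetical.

```python
import operator


class ParameterNode:
    """A graph input (formal parameter)."""
    def __init__(self, name):
        self.name = name


class ValueNode:
    """A constant value embedded in the graph."""
    def __init__(self, value):
        self.value = value
        self.name = repr(value)


class CNode:
    """An application of an operation to input nodes."""
    OPS = {"add": operator.add, "mul": operator.mul}

    def __init__(self, name, op, inputs):
        self.name = name
        self.op = op
        self.inputs = list(inputs)


def def_use_edges(output):
    """Return (definition, use) name pairs for every edge reachable from output."""
    edges, seen, stack = [], set(), [output]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if isinstance(node, CNode):
            for inp in node.inputs:
                edges.append((inp.name, node.name))  # inp is defined, node uses it
                stack.append(inp)
    return edges


def evaluate(node, args):
    """Interpret the graph: look up parameters, return constants, apply CNode ops."""
    if isinstance(node, ParameterNode):
        return args[node.name]
    if isinstance(node, ValueNode):
        return node.value
    return CNode.OPS[node.op](*(evaluate(i, args) for i in node.inputs))


# f(x) = x * 2 + 1
x = ParameterNode("x")
mul = CNode("mul_1", "mul", [x, ValueNode(2)])
out = CNode("add_1", "add", [mul, ValueNode(1)])
print(evaluate(out, {"x": 3}))  # 7
```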

5.2.3 Constraints on Network Construction Using Python Source Code

MindSpore can compile user source code written in Python syntax into computational graphs, and can convert common functions or instances inherited from nn.Cell into computational graphs. Currently, MindSpore cannot convert arbitrary Python source code into computational graphs, so source code compilation is subject to constraints, including syntax constraints and network definition constraints. These constraints may change as MindSpore evolves.

5.2.3.1 Syntax Constraints

1. Supported Python data types
① Number: int, float, and bool are supported. Complex numbers are not supported.
② String
③ List: Currently, only the append method is supported. Updating a list generates a new list.
④ Tuple
⑤ Dictionary: Only String keys are supported.

2. MindSpore extended data types
Tensor: Tensor variables must be defined instances.

3. Function parameters
① Default parameter values: int, float, bool, None, str, tuple, list, and dict are currently supported. Tensor is not supported.
② Variable arguments: Functions with variable arguments currently cannot be used for backpropagation.


③ Keyword (key-value pair) arguments: Functions with keyword arguments currently cannot be used for backpropagation.
④ Variable keyword arguments: Functions with variable keyword arguments currently cannot be used for backpropagation either.

4. Statement types, as shown in Table 5-4.

Table 5-4 MindSpore and Python statement comparison

| Statement | Compared with Python |
|---|---|
| for | Nested for loops are partially supported. Iteration sequences must be tuples or lists. |
| while | Nested while loops are partially supported. |
| if | Same as that in Python. The input of the if condition must be a constant. |
| def | Same as that in Python. |
| Assignment statement | Chained subscript accesses of lists and dictionaries cannot be used as left values. |

5. Operators, as shown in Table 5-5.

Table 5-5 Supported types of MindSpore operators

| Operator | Supported Type |
|---|---|
| + | Scalar, Tensor, and tuple |
| - | Scalar and Tensor |
| * | Scalar and Tensor |
| / | Scalar and Tensor |
| [] | The operation object can be a list, tuple, or Tensor. Chained subscript accesses can be used as right values but not as left values. The index cannot be a Tensor. For details about the access constraints for the Tuple and Tensor types, see the description of slicing operations. |

6. Unsupported syntax

Currently, the following syntax is not supported in network constructors: break, continue, pass, raise, yield, async for, with, async with, assert, import, and await.


5.2.3.2 Network Definition Constraints

1. Instance types on the entire network
① Common Python functions with the @ms_function decorator
② Cell subclasses inherited from nn.Cell

2. Network input types
① The training data input parameters of the entire network must be of the Tensor type.
② The generated ANF graph cannot contain the following constant nodes: string constants, constants with nested tuples, and constants with nested lists.

3. Network graph optimization
During graph optimization at the ME frontend, the dataclass, dictionary, list, and key-value pair types are converted to tuple types, and the corresponding operations are converted to tuple operations.

4. Network construction components, as shown in Table 5-6.

Table 5-6 Constraints on network construction components

| Category | Content |
|---|---|
| Cell instance | mindspore/nn/* and customized Cell |
| Member function of a Cell instance | Member functions of other classes can be called in the construct function of a Cell. |
| Function | Custom Python functions and the system functions listed in the preceding content |
| Dataclass instance | Class decorated with @dataclass |
| Primitive operator | mindspore/ops/operations/* |
| Composite operator | mindspore/ops/composite/* |
| Operator generated by constexpr | Values generated by @constexpr can be used to compute operators. |

5.2.3.3 Other Constraints

The input parameters of the construct function of the entire network, and the parameters of functions decorated with ms_function, are generalized during graph compilation and cannot be passed to an operator as constant input. For example, the following input is incorrect:

import numpy as np
from mindspore import Tensor
from mindspore.nn import Cell
from mindspore.ops import operations as P


class ExpandDimsTest(Cell):
    def __init__(self):
        super(ExpandDimsTest, self).__init__()
        self.expandDims = P.ExpandDims()

    def construct(self, input_x, input_axis):
        return self.expandDims(input_x, input_axis)


expand_dim = ExpandDimsTest()
input_x = Tensor(np.random.randn(2, 2, 2, 2).astype(np.float32))
expand_dim(input_x, 0)

In this example, ExpandDimsTest is a single-operator network with two inputs: input_x and input_axis. The second input of the ExpandDims operator must be a constant, because input_axis is needed to deduce the output dimension of the operator during graph compilation. However, input_axis, as a network parameter input, is generalized into a variable whose value cannot be determined. As a result, the output dimension of the operator cannot be deduced, and graph compilation fails. Therefore, any input required for deduction in the graph compilation phase must be a constant. In the API documentation, operator parameters that require constant input are marked "const input is needed". The correct approach is to pass the required value directly, or a class member variable, as the constant input of the operator in the construct function, as shown in the following example:

class ExpandDimsTest(Cell):
    def __init__(self, axis):
        super(ExpandDimsTest, self).__init__()
        self.expandDims = P.ExpandDims()
        self.axis = axis

    def construct(self, input_x):
        return self.expandDims(input_x, self.axis)


axis = 0
expand_dim = ExpandDimsTest(axis)
input_x = Tensor(np.random.randn(2, 2, 2, 2).astype(np.float32))
expand_dim(input_x)

5.2.4 Implementing an Image Classification Application

5.2.4.1 Overview

This section uses a practical example to demonstrate the basic functions of MindSpore. The practice takes about 20 to 30 minutes to complete. It covers a simple, basic application process; for more advanced and complex applications, extend this basic process as needed.

You can download the complete executable sample code for experiments from the following link: https://gitee.com/mindspore/docs/blob/master/tutorials/tutorial_code/lenet.py

In this practice, a simple image classification function is implemented. The overall process is as follows:

1. Load the required dataset. The MNIST dataset is used in this example.
2. Define a network. The LeNet network is used in this example.
3. Define the loss function and optimizer.
4. Load the dataset and perform training. After the training is complete, view the result and save the model file.
5. Load the saved model for inference.
6. Validate the model: load the test dataset and the trained model, and verify the accuracy of the results.

5.2.4.2 Preparation

Before you start, check whether MindSpore has been correctly installed. If it has not, install it by referring to 5.2.1 Environment Setup. You should also have basic knowledge of Python coding, probability, and matrices. Now, let's start the MindSpore experience.

Step 1

Download a dataset. The MNIST dataset used in this example consists of 28 x 28 pixel grayscale images of handwritten digits in 10 classes. It has a training set of 60,000 examples and a test set of 10,000 examples.

Download the MNIST dataset at http://yann.lecun.com/exdb/mnist/. This page provides four download links for dataset files: the first two are required for training, and the last two are required for testing. Download the files, decompress them, and store them in the workspace directories ./MNIST_Data/train and ./MNIST_Data/test. The directory structure is as follows:

└─MNIST_Data
    ├─test
    │      t10k-images.idx3-ubyte
    │      t10k-labels.idx1-ubyte
    │
    └─train
            train-images.idx3-ubyte
            train-labels.idx1-ubyte
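The decompressed files use the IDX format described on the MNIST download page. Each file begins with two zero bytes, a type-code byte (0x08 for unsigned bytes), and a byte giving the number of dimensions. These are followed by one big-endian 32-bit size per dimension. The helper below is a hypothetical sanity check, not part of the MindSpore sample. It reads only the header, so you can confirm that the files were decompressed correctly. For the training images, you would expect dimensions (60000, 28, 28).

```python
import struct


def read_idx_header(data):
    """Parse an IDX header.

    Returns (type_code, dims, data_offset). type_code 0x08 means unsigned bytes.
    """
    if len(data) < 4:
        raise ValueError("data too short for an IDX header")
    zero1, zero2, type_code, ndim = struct.unpack(">BBBB", data[:4])
    if (zero1, zero2) != (0, 0):
        raise ValueError("not an IDX file: the first two bytes must be zero")
    end = 4 + 4 * ndim
    if len(data) < end:
        raise ValueError("header truncated")
    dims = struct.unpack(">" + "I" * ndim, data[4:end])  # big-endian uint32 sizes
    return type_code, dims, end


# Usage with a decompressed file (path as in the directory tree above):
# with open("./MNIST_Data/train/train-images.idx3-ubyte", "rb") as f:
#     print(read_idx_header(f.read(16)))
```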

For convenience, the sample script also includes a function that downloads the dataset automatically.

Step 2

Import Python libraries and modules. Before you start, import the required Python libraries. Currently, only the os library is used; other required libraries are introduced in detail where they are used.

import os

Step 3

Configure the running information. Before writing code, you need to know the basic information about the hardware and backend that MindSpore will run on. You can use context.set_context() to configure the information required for running, such as the running mode, backend information, and hardware information. Import the context module and configure the required information:

import argparse
from mindspore import context


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='MindSpore LeNet Example')
    parser.add_argument('--device_target', type=str, default="Ascend", choices=['Ascend', 'GPU', 'CPU'],
                        help='device where the code will be implemented (default: Ascend)')
    args = parser.parse_args()
    context.set_context(mode=context.GRAPH_MODE, device_target=args.device_target,
                        enable_mem_reuse=False)
    ...

The sample is configured to run in graph mode. Configure the hardware information based on your environment. For example, if the code runs on the Ascend AI Processor, set --device_target to Ascend; if it runs on a CPU or GPU, set --device_target accordingly. For details about the parameters, see the API description of context.set_context().

----End

5.2.4.3 Data Preprocessing

Datasets are important for training: a good dataset can effectively improve training accuracy and efficiency. Generally, before loading a dataset, you need to perform some operations on it.

Define the dataset and data operations. Define the create_dataset() function to create a dataset. In this function, define the data augmentation and processing operations to be performed:

1. Define the dataset.
2. Define the parameters required for data augmentation and processing.
3. Generate the corresponding data augmentation operations according to the parameters.
4. Use the map() function to apply the data operations to the dataset.
5. Process the generated dataset.

import mindspore.dataset as ds
import mindspore.dataset.transforms.c_transforms as C
import mindspore.dataset.transforms.vision.c_transforms as CV
from mindspore.dataset.transforms.vision import Inter
from mindspore.common import dtype as mstype


def create_dataset(data_path, batch_size=32, repeat_size=1,
                   num_parallel_workers=1):
    """
    create dataset for train or test
    Args:
        data_path: Data path
        batch_size: The number of data records in each group
        repeat_size: The number of times the dataset is repeated
        num_parallel_workers: The number of parallel workers
    """
    # define dataset
    mnist_ds = ds.MnistDataset(data_path)

    # define operation parameters
    resize_height, resize_width = 32, 32
    rescale = 1.0 / 255.0
    shift = 0.0
    rescale_nml = 1 / 0.3081
    shift_nml = -1 * 0.1307 / 0.3081

    # define map operations
    resize_op = CV.Resize((resize_height, resize_width), interpolation=Inter.LINEAR)  # resize images to (32, 32)
    rescale_nml_op = CV.Rescale(rescale_nml, shift_nml)  # normalize images
    rescale_op = CV.Rescale(rescale, shift)  # rescale images
    hwc2chw_op = CV.HWC2CHW()  # change shape from (height, width, channel) to (channel, height, width) to fit network
    type_cast_op = C.TypeCast(mstype.int32)  # change data type of label to int32 to fit network

    # apply map operations on images
    mnist_ds = mnist_ds.map(input_columns="label", operations=type_cast_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=resize_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=rescale_nml_op, num_parallel_workers=num_parallel_workers)
    mnist_ds = mnist_ds.map(input_columns="image", operations=hwc2chw_op, num_parallel_workers=num_parallel_workers)

    # apply DatasetOps
    buffer_size = 10000
    mnist_ds = mnist_ds.shuffle(buffer_size=buffer_size)  # 10000 as in LeNet train script
    mnist_ds = mnist_ds.batch(batch_size, drop_remainder=True)
    mnist_ds = mnist_ds.repeat(repeat_size)

    return mnist_ds

where
batch_size: the number of data records in each group (batch). By default, each group contains 32 data records.
repeat_size: the number of times the dataset is repeated.

Generally, perform the shuffle and batch operations first and then the repeat operation, so that no data is duplicated within an epoch. MindSpore supports multiple data processing and augmentation operations, which are usually used together. For details, see section "Data Processing and Data Enhancement".
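The two Rescale operations in create_dataset() together perform ordinary standardization. Rescale maps each pixel value x to x * rescale + shift. The first call brings pixels from [0, 255] into [0, 1]. The second applies scale 1/0.3081 and shift -0.1307/0.3081, which is the same as (x - 0.1307) / 0.3081. The constants 0.1307 and 0.3081 are commonly quoted as the mean and standard deviation of MNIST pixels scaled to [0, 1]. The pure-Python sketch below checks the equivalence without MindSpore.

```python
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081  # constants used in create_dataset()


def rescale(x, scale, shift):
    """Affine map x * scale + shift, the formula a Rescale operation applies per pixel."""
    return x * scale + shift


def preprocess_pixel(p):
    x = rescale(p, 1.0 / 255.0, 0.0)                                 # rescale_op: [0, 255] -> [0, 1]
    return rescale(x, 1 / MNIST_STD, -1 * MNIST_MEAN / MNIST_STD)   # rescale_nml_op


def standardize(p):
    """The same result written as the usual (x - mean) / std formula."""
    return (p / 255.0 - MNIST_MEAN) / MNIST_STD
```

Applying the two affine maps in this order matches the map() order in create_dataset(), where rescale_op runs before rescale_nml_op.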

5.2.4.4 Defining the Network

The LeNet network is relatively simple. In addition to the input layer, it has seven layers: two convolutional layers, two down-sampling (pooling) layers, and three fully connected layers. Each layer contains a different number of trainable parameters, as shown in Figure 5-14:


Figure 5-14 LeNet-5 structure

You need to initialize the fully connected layers and convolutional layers. TruncatedNormal is the parameter initialization method used here. MindSpore supports multiple parameter initialization methods, such as TruncatedNormal, Normal, and Uniform. For details, see the description of the mindspore.common.initializer module in the MindSpore API. The following is the sample code for initialization:

import mindspore.nn as nn
from mindspore.common.initializer import TruncatedNormal


def weight_variable():
    """
    weight initial
    """
    return TruncatedNormal(0.02)


def conv(in_channels, out_channels, kernel_size, stride=1, padding=0):
    """
    conv layer weight initial
    """
    weight = weight_variable()
    return nn.Conv2d(in_channels, out_channels,
                     kernel_size=kernel_size, stride=stride, padding=padding,
                     weight_init=weight, has_bias=False, pad_mode="valid")


def fc_with_initialize(input_channels, out_channels):
    """
    fc layer weight initial
    """
    weight = weight_variable()
    bias = weight_variable()
    return nn.Dense(input_channels, out_channels, weight, bias)
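TruncatedNormal(0.02) draws initial weights from a normal distribution with standard deviation 0.02 and discards values that fall too far from the mean. The rejection-sampling sketch below uses only the Python standard library to illustrate the idea. The two-sigma cutoff is an assumption made for this sketch; check the mindspore.common.initializer documentation for the exact bounds MindSpore uses.

```python
import random


def truncated_normal(n, sigma=0.02, bound=2.0, seed=0):
    """Draw n samples from N(0, sigma^2), rejecting values beyond +/- bound * sigma.

    The two-sigma bound is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    limit = bound * sigma
    samples = []
    while len(samples) < n:
        x = rng.gauss(0.0, sigma)
        if -limit <= x <= limit:  # rejection step: keep only values inside the bounds
            samples.append(x)
    return samples


weights = truncated_normal(1000)
print(min(weights) >= -0.04, max(weights) <= 0.04)  # True True
```

Small, bounded initial weights keep early activations in a reasonable range. Truncation additionally prevents rare large draws from dominating a layer at the start of training.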

To define a neural network with MindSpore, inherit mindspore.nn.cell.Cell. Cell is the base class of all neural networks, such as Conv2d. Define each layer of the network in the __init__() method in advance, and then define the construct() method to complete the forward construction of the network. According to the structure of LeNet, define the network layers as follows:

class LeNet5(nn.Cell):
    """
    Lenet network structure
    """
    # define the operators required
    def __init__(self):
        super(LeNet5, self).__init__()
        self.batch_size = 32
        self.conv1 = conv(1, 6, 5)
        self.conv2 = conv(6, 16, 5)
        self.fc1 = fc_with_initialize(16 * 5 * 5, 120)
        self.fc2 = fc_with_initialize(120, 84)
        self.fc3 = fc_with_initialize(84, 10)
        self.relu = nn.ReLU()
        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()

    # use the preceding operators to construct the network
    def construct(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.max_pool2d(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        return x
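The input size of fc1, 16 * 5 * 5, follows from the 32 x 32 images produced by create_dataset() and the "valid" (no padding) convolutions and 2 x 2 pooling. The pure-Python sketch below traces that arithmetic and totals the trainable parameters per layer. It assumes no bias in the convolutions (has_bias=False, as in conv()) and a bias in each Dense layer (the nn.Dense default).

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a 'valid' convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_feature_size(input_size=32):
    s = out_size(input_size, 5)        # conv1 (5x5): 32 -> 28
    s = out_size(s, 2, stride=2)       # max pool:    28 -> 14
    s = out_size(s, 5)                 # conv2 (5x5): 14 -> 10
    s = out_size(s, 2, stride=2)       # max pool:    10 -> 5
    return s, 16 * s * s               # flatten 16 channels of s x s


def lenet5_param_count():
    conv1 = 6 * 1 * 5 * 5              # no bias (has_bias=False)
    conv2 = 16 * 6 * 5 * 5
    fc1 = 16 * 5 * 5 * 120 + 120       # Dense layers include a bias term
    fc2 = 120 * 84 + 84
    fc3 = 84 * 10 + 10
    return conv1 + conv2 + fc1 + fc2 + fc3


print(lenet5_feature_size())  # (5, 400)
print(lenet5_param_count())   # 61684
```

Most of the parameters sit in fc1 (48,120 of 61,684), which is typical for small convolutional networks that end in fully connected layers.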

5.2.4.5 Defining the Loss Function and Optimizer

Basic concepts

Before defining them, this section briefly describes the concepts of the loss function and the optimizer.

Loss function: Also called the objective function, it measures the difference between a predicted value and the actual value. Deep learning reduces the value of the loss function through continuous iteration, and a well-defined loss function can effectively improve model performance.

Optimizer: It minimizes the loss function, improving the model during training. After the loss function is defined, the gradient of the loss with respect to the weights can be computed. The gradient tells the optimizer in which direction to adjust the weights, improving model performance.

Define the loss function.

Loss functions supported by MindSpore include SoftmaxCrossEntropyWithLogits, L1Loss, MSELoss, and NLLLoss. This example uses SoftmaxCrossEntropyWithLogits:

from mindspore.nn.loss import SoftmaxCrossEntropyWithLogits


Call the defined loss function in the __main__ function:

if __name__ == "__main__":
    ...
    # define the loss function
    net_loss = SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True, reduction='mean')
    ...
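With sparse=True, the labels are integer class indices rather than one-hot vectors. With reduction='mean', the per-sample losses are averaged over the batch. The pure-Python sketch below computes the same quantity, the mean of -log(softmax(logits)[label]), as an illustration of what the loss measures. It is not MindSpore's implementation.

```python
import math


def sparse_softmax_cross_entropy(logits, labels):
    """Mean over the batch of -log(softmax(logits)[label]).

    logits: list of per-sample score lists; labels: list of integer class indices.
    """
    if len(logits) != len(labels) or not logits:
        raise ValueError("need one label per sample and at least one sample")
    total = 0.0
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[label]
    return total / len(labels)  # reduction='mean'


print(sparse_softmax_cross_entropy([[0.0, 0.0]], [0]))  # 0.6931471805599453, i.e. log(2)
```

A confident, correct prediction gives a loss near 0. A confident, wrong prediction gives a large loss. This is the gap the optimizer tries to close.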

Define the optimizer.

Optimizers supported by MindSpore include Adam, AdamWeightDecay, StepLRPolicy, and Momentum. This example uses the popular Momentum optimizer:

if __name__ == "__main__":
    ...
    # learning rate setting
    lr = 0.01
    momentum = 0.9
    # create the network
    network = LeNet5()
    # define the optimizer
    net_opt = nn.Momentum(network.trainable_params(), lr, momentum)
    ...
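The sketch below illustrates what the Momentum optimizer does with lr = 0.01 and momentum = 0.9. It uses the common non-Nesterov ("heavy ball") update: an accumulator keeps a decaying sum of past gradients, and each parameter moves against that accumulator. We believe this matches MindSpore's default behavior, but treat it as an illustration rather than the library's exact implementation.

```python
def momentum_step(params, grads, accum, lr=0.01, momentum=0.9):
    """One (non-Nesterov) momentum update over lists of scalar parameters.

    accum <- momentum * accum + grad
    param <- param - lr * accum
    """
    new_accum = [momentum * a + g for a, g in zip(accum, grads)]
    new_params = [p - lr * a for p, a in zip(params, new_accum)]
    return new_params, new_accum


params, accum = [1.0], [0.0]
for _ in range(2):  # two steps with a constant gradient of 2.0
    params, accum = momentum_step(params, [2.0], accum)
print(params, accum)
```

With a constant gradient, the step size grows from 0.02 to 0.038 because the accumulator builds up. This is how momentum speeds up progress along consistent gradient directions.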

5.2.4.6 Running Rules and Viewing Results

Run the following command to execute the lenet.py script:

python lenet.py --device_target=CPU

where
lenet.py: the script file that you write according to the tutorial.
--device_target=CPU: specifies the hardware platform. The value can be CPU, GPU, or Ascend; set it according to the hardware you actually run on.

Loss values are printed during training, as shown in the following example. Although the loss values fluctuate, they generally decrease while the accuracy increases. Because of randomness, the loss values you see may differ. The following is an example of the loss output during training:

epoch: 1 step: 262, loss is 1.9212162
epoch: 1 step: 263, loss is 1.8498616
epoch: 1 step: 264, loss is 1.7990671
epoch: 1 step: 265, loss is 1.9492403
epoch: 1 step: 266, loss is 2.0305142
epoch: 1 step: 267, loss is 2.0657792
epoch: 1 step: 268, loss is 1.9582214
epoch: 1 step: 269, loss is 0.9459006
epoch: 1 step: 270, loss is 0.8167224
epoch: 1 step: 271, loss is 0.7432692
...

The following is an example of model files saved after training:


checkpoint_lenet-1_1875.ckpt

where checkpoint_lenet-1_1875.ckpt is the saved model parameter file. The file name format is checkpoint_{network name}-{epoch No.}_{step No.}.ckpt.
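The step number in the file name is related to the dataset setup. With 60,000 training images, batch_size=32, and drop_remainder=True, one epoch has 60000 // 32 = 1875 steps. This suggests that checkpoint_lenet-1_1875.ckpt was saved at the end of epoch 1. The helpers below are hypothetical and not part of MindSpore. They build and parse names in the format given above, which can help when picking the latest checkpoint from a directory.

```python
import re

CKPT_RE = re.compile(r"^(?P<prefix>.+)-(?P<epoch>\d+)_(?P<step>\d+)\.ckpt$")


def checkpoint_name(prefix, epoch, step):
    """Build a name following checkpoint_{network name}-{epoch No.}_{step No.}.ckpt."""
    return f"{prefix}-{epoch}_{step}.ckpt"


def parse_checkpoint_name(name):
    """Split a checkpoint file name into (prefix, epoch, step)."""
    match = CKPT_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {name}")
    return match["prefix"], int(match["epoch"]), int(match["step"])


steps_per_epoch = 60000 // 32  # 60,000 training images, batch size 32, drop_remainder=True
print(steps_per_epoch)                                        # 1875
print(parse_checkpoint_name("checkpoint_lenet-1_1875.ckpt"))  # ('checkpoint_lenet', 1, 1875)
```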

5.2.4.7 Model Verification

After the model file is obtained, run the model on the test dataset to verify its generalization capability. Load the saved model parameters into the network, and then use the model.eval() interface to run inference on the test dataset:

from mindspore.train.serialization import load_checkpoint, load_param_into_net
...


def test_net(args, network, model, mnist_path):
    """define the evaluation method"""
    print("============== Starting Testing ==============")
    # load the saved model for evaluation
    param_dict = load_checkpoint("checkpoint_lenet-1_1875.ckpt")
    # load parameters into the network
    load_param_into_net(network, param_dict)
    # load the testing dataset
    ds_eval = create_dataset(os.path.join(mnist_path, "test"))
    acc = model.eval(ds_eval, dataset_sink_mode=False)
    print("=========== Accuracy:{}=========".format(acc))


if __name__ == "__main__":
    ...
    test_net(args, network, model, mnist_path)

where
load_checkpoint(): loads a checkpoint model parameter file and returns a parameter dictionary.
checkpoint_lenet-1_1875.ckpt: the name of the saved checkpoint model file.
load_param_into_net(): loads the parameters into the network.

Run your script with the following command:

python lenet.py --device_target=CPU

where
lenet.py: the script file that you write according to the tutorial.
--device_target=CPU: specifies the hardware platform. The value can be CPU, GPU, or Ascend; set it according to the hardware you actually run on.

Command output similar to the following is displayed:

============== Starting Testing ==============
========== Accuracy:{'Accuracy':0.9742588141025641} ===========


The model accuracy is shown in the output. In this example, the accuracy reaches 97.4%, indicating good model quality.
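Accuracy is the number of correct predictions divided by the number of evaluated samples. One detail is easy to miss: create_dataset() uses drop_remainder=True. An evaluation pass over the 10,000 test images therefore covers only 312 full batches of 32, that is, 9,984 images. The reported value 0.9742588141025641 is consistent with 9,727 correct predictions out of those 9,984. This is our own arithmetic inference, not something the log prints. The helpers below are hypothetical.

```python
def evaluated_samples(num_images, batch_size, drop_remainder=True):
    """Number of samples seen in one pass when an incomplete last batch is dropped."""
    full_batches = num_images // batch_size
    if drop_remainder:
        return full_batches * batch_size
    return num_images


def accuracy(num_correct, num_evaluated):
    """Fraction of evaluated samples that were predicted correctly."""
    return num_correct / num_evaluated


n = evaluated_samples(10000, 32)  # 312 full batches of 32
print(n)                          # 9984
print(accuracy(9727, n))
```

If every test image must be scored, build the evaluation dataset without dropping the last partial batch.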

5.3 Summary

This chapter describes MindSpore, the deep learning framework developed by Huawei. It first introduces the three technological innovations in the MindSpore design concept (a new programming paradigm, a new execution mode, and a new collaboration mode), along with MindSpore's advantages: easy development, efficient execution, and flexible deployment. Finally, it introduces MindSpore development and application, using an image classification example to illustrate the development procedure.

5.4 Quiz

1. MindSpore is a Huawei-developed AI computing framework that implements on-demand device-edge-cloud collaboration across all scenarios. It provides unified APIs and end-to-end capabilities for AI model development, execution, and deployment in all scenarios. What are the main features of the MindSpore architecture?

2. AI developers in the industry face challenges such as a high development threshold, high operating costs, and difficult deployment. What are the three technological innovations proposed by MindSpore to help developers develop and deploy AI applications more easily and efficiently?

3. Challenges to model execution under strong chip computing power include the memory wall problem, high interaction overhead, and difficult data supply. When some operations run on the host and others on the device, the interaction overhead can far exceed the execution overhead, leading to low accelerator utilization. What is MindSpore's solution?

4. Use MindSpore to recognize MNIST handwritten digits.


Huawei AI Academy Training Materials

AI Computing Platform Atlas

Huawei Technologies Co., Ltd.

06 AI Computing Platform Atlas (Textbook)

201

Copyright © Huawei Technologies Co., Ltd. 2020. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders.

Notice The purchased products, services, and features are stipulated by the contract made between Huawei and the customer. All or part of the products, services, and features described in this document may not be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information, and recommendations in this document are provided "AS IS" without warranties, guarantees, or representations of any kind, either express or implied. The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute a warranty of any kind, express, or implied.

Huawei Technologies Co., Ltd. Address:

Huawei Industrial Base Bantian, Longgang, Shenzhen 518129

Website:

https://e.huawei.com


Huawei Atlas Computing Platform

Page 1

Contents

6 AI Computing Platform Atlas (p. 2)
6.1 Hardware Architecture of Ascend AI Processors (p. 2)
6.1.1 Logical Architecture of Ascend AI Processors (p. 2)
6.1.2 Da Vinci Architecture (p. 2)
6.2 Software Architecture of Ascend AI Processors (p. 6)
6.2.1 Logical Architecture of the Ascend AI Processor Software (p. 6)
6.2.2 Neural Network Software Flow of Ascend AI Processors (p. 9)
6.2.3 Functional Modules of the Ascend AI Processor Software Stack (p. 11)
6.2.4 Data Flowchart of the Ascend AI Processor (p. 31)
6.3 Atlas AI Computing Platform (p. 32)
6.3.1 Overview of the Atlas AI Computing Platform (p. 32)
6.3.2 Atlas Accelerates AI Inference (p. 32)
6.3.3 Atlas Accelerates AI Training (p. 47)
6.3.4 Device-Edge-Cloud Collaboration Enables the Ultimate Development and User Experience (p. 50)
6.4 Industry Applications of Atlas (p. 51)
6.4.1 Electric Power: One-Stop ICT Solutions for Smart Grids (p. 51)
6.4.2 Smart Finance: Comprehensive Digital Transformation (p. 51)
6.4.3 Smart Manufacturing: Digital Integration of Machines and Thoughts (p. 52)
6.4.4 Smart Transportation: Convenient Travel and Smooth Logistics (p. 53)
6.4.5 Supercomputing: Building a National AI Platform (p. 54)
6.5 Summary (p. 54)
6.6 Quiz (p. 54)


6 AI Computing Platform Atlas

This chapter describes the hardware and software architectures of Huawei Ascend AI Processors and introduces the full-stack, all-scenario AI solutions based on the Huawei Atlas AI computing platform.

6.1 Hardware Architecture of Ascend AI Processors

6.1.1 Logical Architecture of Ascend AI Processors

The logical architecture of the Ascend AI Processor consists of four modules: the control CPU, the AI computing engine (including the AI Core and AI CPU), multi-layer system-on-chip (SoC) caches or buffers, and the digital vision pre-processing (DVPP) module. Figure 6-1 shows the logical architecture of Ascend AI Processors.

Figure 6-1 Logical architecture of Ascend AI Processors

6.1.2 Da Vinci Architecture

6.1.2.1 Da Vinci Architecture Overview

The Da Vinci architecture, developed specifically to boost AI computing power, is the core of the Ascend AI computing engine and AI processor. It consists of three parts: the computing unit, the storage system, and the control unit.

1. Computing unit: consists of the cube unit, vector unit, and scalar unit.


2.

Storage system: It consists of the on-chip storage unit of the AI Core and the corresponding data channels.

3.

Control unit: It provides instruction control for the entire computing process. It serves as the command center of the AI Core and is responsible for the running of the entire AI Core.

Figure 6-2 shows the overall Da Vinci architecture.

Figure 6-2 Da Vinci architecture

6.1.2.2 Da Vinci Architecture (AI Core) — Computing Unit

Three types of basic computing resources are available in the Da Vinci architecture: cube, vector, and scalar units, which correspond to the cube, vector, and scalar computing modes respectively. Figure 6-3 shows the computing unit in the Da Vinci architecture.

Figure 6-3 Computing unit in the Da Vinci architecture

Cube unit: Together with the accumulator, the cube unit performs matrix operations. In a single operation it multiplies a 16x16 matrix by a 16x16 matrix for FP16 inputs (16 × 16 × 16 = 4096 multiply-accumulate operations), or a 16x32 matrix by a 32x16 matrix for INT8 inputs (16 × 32 × 16 = 8192 multiply-accumulate operations).

Vector unit: performs computations between vectors and scalars or between vectors. It covers basic computing types as well as many customized ones, for data types including FP16, FP32, INT32, and INT8.

Scalar unit: Equivalent to a micro CPU, the scalar unit controls the running of the entire AI Core. It implements loop control and branch judgment for the entire program, computes data addresses and related parameters for the cube and vector units, and performs basic arithmetic operations.
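The operation counts above follow from the matrix shapes: an (m x k) by (k x n) product has m × n outputs, each a sum of k products. The plain-Python sketch below checks that arithmetic and shows how a larger matrix product could be broken into fixed-size blocks, which is the kind of tiling a fixed-shape matrix unit requires. The function names and the default tile size of 16 are illustrative assumptions; this is not how Ascend operators are actually programmed.

```python
def cube_macs(m: int, k: int, n: int) -> int:
    """Multiply-accumulate (MAC) count of an (m x k) @ (k x n) product:
    m * n output elements, each a sum of k products."""
    return m * k * n


def tiled_matmul(a, b, tile=16):
    """Multiply a (M x K) by b (K x N) in tile x tile blocks, mimicking how a
    fixed-size matrix unit consumes one block per operation. The Python loops
    stand in for the scalar unit's loop control and address computation."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]  # c doubles as the accumulator
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # One block operation: accumulate a[i0.., k0..] @ b[k0.., j0..] into c.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


print(cube_macs(16, 16, 16))  # 4096 (FP16 case)
print(cube_macs(16, 32, 16))  # 8192 (INT8 case)
```

Because the accumulator keeps partial sums across the k-blocks, the tiled result equals the ordinary matrix product for any tile size.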

6.1.2.3 Da Vinci Architecture (AI Core) — Storage System

The storage system of the AI Core consists of the storage unit and the corresponding data channels, as shown in Figure 6-4.

Figure 6-4 Storage system in the Da Vinci architecture

1. Storage unit: consists of the storage control unit, buffers, and registers.

1) Storage control unit: Through the bus interface, it can directly access caches at levels below the AI Core, and it can also access memory directly through DDR or HBM. A memory migration unit serves as the transmission controller of the AI Core's internal data channels, managing reads and writes of internal data between the different buffers of the AI Core. It also performs a series of format conversion operations, such as padding, Img2Col, transposing, and decompression.

2) Input buffer: temporarily stores frequently used data so that the data does not have to be read into the AI Core through the bus interface each time. This reduces the frequency of data access on the bus and the risk of bus congestion, thereby reducing power consumption and improving performance.

3) Output buffer: stores the intermediate results of computing at each layer of the neural network so that the data can be easily obtained for next-layer computing. Reading data through the bus involves low bandwidth and long latency, whereas using the output buffer greatly improves computing efficiency.

4) Registers: the various registers in the AI Core are mainly used by the scalar unit.

2. Data channel: the path along which data flows in the AI Core during execution of computing tasks. Data channels in the Da Vinci architecture are characterized by multiple inputs and a single output. Because the neural network computing process involves various data types and a large quantity of input data, concurrent inputs improve data inflow efficiency. In contrast, processing multiple types of input data generates only one output feature matrix, so a single-output data channel reduces the use of chip hardware resources.
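Img2Col, one of the format conversions listed for the storage control unit, is what lets a convolution run on a matrix unit: each sliding window of the input is unrolled into a row, so the convolution becomes a matrix product. The following is a minimal single-channel sketch assuming stride 1 and no padding; the function names are hypothetical and it says nothing about the hardware's actual data layout.

```python
def img2col(image, kh, kw):
    """Unroll every kh x kw window of a single-channel 2-D image
    (stride 1, no padding) into one row of a matrix."""
    h, w = len(image), len(image[0])
    return [
        [image[r + i][c + j] for i in range(kh) for j in range(kw)]
        for r in range(h - kh + 1)
        for c in range(w - kw + 1)
    ]


def conv2d_via_img2col(image, kernel):
    """Compute a 2-D convolution (cross-correlation, as in deep learning
    frameworks) as a matrix-vector product over the img2col rows."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    rows = img2col(image, kh, kw)
    values = [sum(x * k for x, k in zip(row, flat_kernel)) for row in rows]
    out_w = len(image[0]) - kw + 1
    # Reshape the flat results back into the output feature map.
    return [values[i:i + out_w] for i in range(0, len(values), out_w)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(img2col(image, 2, 2))  # [[1, 2, 4, 5], [2, 3, 5, 6], [4, 5, 7, 8], [5, 6, 8, 9]]
print(conv2d_via_img2col(image, [[1, 0], [0, 1]]))  # [[6, 8], [12, 14]]
```

With many kernels, the flattened kernels form a second matrix and the whole layer reduces to one matrix multiplication, which is the operation the cube unit is built for.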

6.1.2.4 Da Vinci Architecture (AI Core) — Control Unit

The control unit of the AI Core consists of the System Control, Scalar PSQ, Instr. Dispatch, Cube Queue, Vector Queue, MTE Queue, and Event Sync modules. Figure 6-5 shows the control unit in the Da Vinci architecture.

Figure 6-5 Control unit in the Da Vinci architecture

1. System control module: controls the execution of a task block (the minimum task computing granularity of the AI Core). After the task block is executed, the system control module handles the interrupt and reports the status. If an error occurs during execution, the error status is reported to the task scheduler.

2. Instruction cache: prefetches subsequent instructions during instruction execution and reads multiple instructions into the cache at a time, improving instruction execution efficiency.

3. Scalar instruction processing queue: after decoding, instructions are imported into the scalar queue, which implements address decoding and operation control. The instructions include matrix computing instructions, vector computing instructions, and storage conversion instructions.

4. Instruction dispatch module: reads the configured instruction addresses and decoded parameters from the scalar instruction queue and sends them to the corresponding instruction execution queue according to instruction type. Scalar instructions remain in the scalar instruction processing queue for subsequent execution.

5. Instruction execution queue: consists of a matrix operation queue, a vector operation queue, and a storage conversion queue. Instructions are placed in the corresponding operation queue and executed in queue order.

6. Event synchronization module: monitors the execution status of each instruction pipeline in real time and analyzes the dependencies between pipelines to resolve data dependence and synchronization problems between them.
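The dispatch-and-synchronize flow described in this list can be imitated in a few lines: decoded instructions are routed to per-type queues, each queue executes in order, and an instruction at the head of a queue waits until the instructions it depends on have finished. This is a toy sequential sketch; the queue labels ("scalar", "cube", "vector", "mte") and the instruction names are hypothetical, and real hardware runs the pipelines in parallel rather than in a round-robin loop.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    kind: str         # "scalar", "cube", "vector", or "mte" (illustrative labels)
    deps: tuple = ()  # names of instructions whose results this one needs


def dispatch(program):
    """Route decoded instructions into per-type queues, preserving program
    order within each queue. Scalar instructions stay in the scalar queue."""
    queues = {kind: deque() for kind in ("scalar", "cube", "vector", "mte")}
    for ins in program:
        queues[ins.kind].append(ins)
    return queues


def execute(queues):
    """Issue instructions queue by queue. Each queue is strictly in order, and
    the head of a queue may issue only once all of its dependencies are done:
    a toy version of what event synchronization enforces across pipelines."""
    done, order = set(), []
    while any(queues.values()):
        progressed = False
        for q in queues.values():
            if q and all(d in done for d in q[0].deps):
                ins = q.popleft()
                done.add(ins.name)
                order.append(ins.name)
                progressed = True
        if not progressed:
            raise RuntimeError("unsatisfiable dependency")
    return order


program = [
    Instr("load_a", "mte"),
    Instr("load_b", "mte"),
    Instr("matmul", "cube", ("load_a", "load_b")),
    Instr("relu", "vector", ("matmul",)),
    Instr("store", "mte", ("relu",)),
    Instr("loop_ctr", "scalar"),
]
print(execute(dispatch(program)))
# ['loop_ctr', 'load_a', 'load_b', 'matmul', 'relu', 'store']
```

Note that the scalar instruction runs first even though it appears last in the program: queues are independent, and only explicit dependencies impose ordering across them.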

6.2 Software Architecture of Ascend AI Processors

6.2.1 Logical Architecture of the Ascend AI Processor Software

6.2.1.1 Overview of the Logical Architecture of Ascend AI Processor Software

The software stack of the Ascend AI Processors consists of four layers and an auxiliary toolchain. The four layers are the application enabling layer (L3), execution framework layer (L2), chip enabling layer (L1), and computing resource layer (L0). The toolchain provides auxiliary capabilities such as program development, compilation and debugging, application process orchestration, log management, and profiling. The main components of the software stack depend on one another and together carry data flows, computing flows, and control flows. Figure 6-6 shows the logical architecture of the Ascend AI Processor software.

Figure 6-6 Logical architecture of the Ascend AI Processor software

6.2.1.2 Application Enabling Layer (L3)

The L3 application enabling layer is an application-level encapsulation layer that provides processing algorithms for specific application fields. L3 provides computing and processing engines for these fields; it can directly use the framework scheduling capability of L2 to generate the corresponding neural networks and implement specific engine functions. Engines at this layer include the computer vision engine, the language and text engine, and the generic service execution engine.

1. The computer vision engine encapsulates video and image processing algorithms for applications in the computer vision field.

2. The language and text engine encapsulates basic processing algorithms for voice and text data to provide language and text processing functions for specific application scenarios.

3. The generic service execution engine provides generic neural network inference capability.

6.2.1.3 Execution Framework Layer (L2)

The L2 execution framework layer encapsulates the framework calling capability and the offline model generation capability. After an application algorithm is developed and encapsulated into an engine at L3, L2 calls the appropriate deep learning framework, such as Caffe or TensorFlow, based on the features of the algorithm to obtain a neural network with the corresponding function, and then generates an offline model through the framework manager (Framework). The L2 execution framework layer contains the framework manager and the process orchestrator (Matrix).

1. Framework manager: Made up of the offline model generator (OMG), the offline model executor (OME), and the offline model inference APIs, the framework manager supports model generation, loading, unloading, inference, computing, and execution.

Online framework: uses a mainstream open-source deep learning framework (such as Caffe or TensorFlow) and performs accelerated computing on the Ascend AI Processor through offline model conversion and loading.

Offline framework: provides offline generation and execution of the neural network, so that the offline model keeps the same capabilities (mainly inference) without depending on a deep learning framework such as Caffe or TensorFlow.

1) OMG: converts model files generated in the Caffe or TensorFlow framework into offline model files that can be executed independently on the Ascend AI Processor.

2) OME: loads and unloads offline models, converts successfully loaded model files into instruction sequences that can be executed on the Ascend AI Processor, and completes program compilation before execution.

2. Process orchestrator: provides developers with a development platform for deep learning computing, including computing resources, a running framework, and related tools. It enables developers to efficiently build AI applications that run on specified hardware devices and is responsible for model generation, loading, and operation scheduling. After L2 converts the original neural network model into an offline model that can be executed on Ascend AI Processors, the OME transfers the offline model to L1 for task allocation.
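The division of labor between OMG and OME amounts to a small lifecycle: a model is generated from a supported framework, loaded, used for inference, and unloaded, and inference is only legal while the model is loaded. The sketch below models that lifecycle as a state machine. Everything here (`generate`, `Executor`, `State`, representing a model as a list of layer functions) is a hypothetical illustration of the concept, not the real OMG/OME interface.

```python
from enum import Enum, auto


class State(Enum):
    GENERATED = auto()  # produced by the generator (the OMG's role)
    LOADED = auto()     # resident and ready to run (the OME's role)
    UNLOADED = auto()   # resources released


class OfflineModel:
    """Illustrative stand-in for an offline model file: an ordered list of
    layer functions that no longer needs the framework that produced it."""
    def __init__(self, layers):
        self.layers = list(layers)
        self.state = State.GENERATED


def generate(framework, layers):
    """OMG-like step: accept a model from a supported source framework."""
    if framework not in ("caffe", "tensorflow"):
        raise ValueError(f"unsupported framework: {framework}")
    return OfflineModel(layers)


class Executor:
    """OME-like steps: load, run inference, unload."""
    def __init__(self):
        self.model = None

    def load(self, model):
        if model.state is State.LOADED:
            raise RuntimeError("model is already loaded")
        model.state = State.LOADED
        self.model = model

    def infer(self, x):
        if self.model is None or self.model.state is not State.LOADED:
            raise RuntimeError("no model is loaded")
        for layer in self.model.layers:
            x = layer(x)
        return x

    def unload(self):
        if self.model is not None:
            self.model.state = State.UNLOADED
            self.model = None


model = generate("tensorflow", [lambda v: v * 2, lambda v: v + 1])
ome = Executor()
ome.load(model)
print(ome.infer(3))  # 7
ome.unload()
```

The point of the design is that once generated, the model runs with no reference to Caffe or TensorFlow, which mirrors the "offline framework" idea in the text.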

6.2.1.4 Chip Enabling Layer (L1)

The L1 chip enabling layer bridges offline models and Ascend AI Processors. After receiving an offline model generated by L2, L1 accelerates offline model computing using acceleration libraries for various computing tasks. As the layer nearest to the bottom-layer computing resources, L1 is responsible for outputting operator-layer tasks to the hardware. It mainly includes the DVPP, tensor boost engine (TBE), Runtime, driver, and Task Scheduler (TS) modules.

L1 uses the TBE of the processor as its core. TBE supports accelerated computing of online and offline models through its standard operator acceleration library and custom operator capabilities. The standard operator acceleration library provides high-performance optimized operators. Operators interact with Runtime during execution. Runtime also communicates with L2 and provides standard operator acceleration library APIs for calling, enabling network models to use optimized, executable, and acceleration-capable operators for optimal performance. If the standard operator acceleration library at L1 does not contain the operators required by L2, you can customize them using TBE.

TS, located below TBE, generates kernels based on operators, processes the kernels, and distributes them to the AI CPU or AI Core according to the task type. The kernels are activated by the driver and executed on hardware. TS itself runs on a dedicated CPU core.

DVPP module: acts as a multifunctional package for image and video processing. It provides the upper layer with various image and video preprocessing capabilities using dedicated hardware at the bottom layer.

6.2.1.5 Computing Resource Layer (L0)

The L0 computing resource layer provides computing resources and executes specific computing tasks; it is the hardware computing basis of Ascend AI Processors. After the L1 chip enabling layer distributes the task corresponding to an operator, execution of the task starts at the L0 computing resource layer. This layer consists of the operating system, AI CPU, AI Core, and DVPP-dedicated hardware modules.

The AI Core is the computing core of the Ascend AI Processor and executes the matrix-related computing tasks of the neural network. The AI CPU is responsible for general computations of control operators, scalars, and vectors. If input data needs to be preprocessed, the DVPP-dedicated hardware module is activated to preprocess the input image and video data and, if needed, convert it into the format required by the AI Core. In short, the AI Core provides large-scale computing power, the AI CPU provides complex computing and execution control functions, and the DVPP hardware preprocesses input data. The operating system coordinates these three roles to form a complete hardware system, ensuring the successful execution of deep neural network computing on the Ascend AI Processor.

6.2.1.6 Toolchain

The toolchain is a tool platform that facilitates development based on the Ascend AI Processor. It supports the development and debugging of custom operators as well as network porting, tuning, and analysis. In addition, a set of desktop programming services is provided through a programming GUI, which significantly simplifies the development of applications based on deep neural networks. The toolchain provides diverse tools for project management and compilation, process orchestration, offline model conversion, operator comparison, log management, profiling, and operator customization. Together, these tools offer multi-layer, multi-function services for efficient development and execution of applications on this platform.

6.2.2 Neural Network Software Flow of Ascend AI Processors

The neural network software flow of Ascend AI Processors is the bridge between deep learning frameworks and Ascend AI Processors. It provides a fast path for converting a neural network from the original model to an intermediate computing graph, and then to an offline model that can be executed independently. The flow is used to generate, load, and execute offline neural network application models. It integrates functional modules such as the process orchestrator (Matrix), DVPP, TBE, framework manager (Framework), Runtime, and Task Scheduler (TS) into a complete functional cluster. Figure 6-7 shows the neural network software flow of Ascend AI Processors.

Figure 6-7 Neural network software flow of Ascend AI Processors

1. Process orchestrator: implements the neural network on Ascend AI Processors, coordinates the whole process of running the neural network, and controls the loading and execution of offline models.

2. DVPP module: processes and modifies input data so that it meets the format requirements of computing.

3. TBE: functions as a neural network operator factory that provides powerful computing operators for neural network models.

4. Framework manager: builds an original neural network model into a form supported by Ascend AI Processors and integrates the new model with Ascend AI Processors to ensure efficient running of the neural network.

5. Runtime: provides various resource management paths for task delivery and allocation of the neural network.

6. Task scheduler: as the task driver for hardware execution, provides specific target tasks to Ascend AI Processors. Runtime and the task scheduler together form a "dam" system that regulates how neural network tasks flow to hardware resources, distributing different types of execution tasks in real time.

The neural network software flow provides Ascend AI Processors with a complete execution process that integrates software and hardware, facilitating the development of related AI applications. The following sections describe several functional modules related to the neural network.

6.2.3 Functional Modules of the Ascend AI Processor Software Stack

6.2.3.1 TBE

In a neural network, operators make up the functional networks for different applications. TBE, as a neural network operator factory, provides powerful computing operators for neural networks running on Ascend AI Processors, and various neural network models are built from TBE-compiled operators. TBE provides operator encapsulation and calling capabilities. It offers a refined standard operator library for neural networks; operators in the library can be used directly to implement high-performance neural network computing. TBE also supports operator fusion, which opens more possibilities for neural network optimization.

TBE provides the capability of developing custom operators based on TVM: developers can write the corresponding neural network operators in the TBE language through the custom operator programming interface. TBE consists of the Domain-Specific Language (DSL) module, Schedule module, Intermediate Representation (IR) module, Pass module, and CodeGen module. Figure 6-8 shows the structure of TBE.

TBE operator development includes computation logic writing and scheduling development. The DSL module provides an interface for writing the operator computation logic and scheduling description. The computation logic describes the operator's computing operations and steps, while the schedule describes data tiling and data flow planning. Because an operator processes data of a fixed shape each time, data shape tiling must be performed in advance for operators executed on the different computing units of Ascend AI Processors. For example, operators executed on the cube unit, the vector unit, and the AI CPU have different requirements for input data shapes.
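The separation between computation logic ("what to compute") and schedule ("how to tile and move the data") can be illustrated without any Ascend tooling. In the sketch below, a vector-add operator is defined once, and two different tiling schedules run it to the same result. This is not TBE DSL syntax; all function names are hypothetical, and the sketch only mirrors the compute/schedule split that TBE inherits from TVM.

```python
def vadd_compute(a, b, i):
    """Computation logic ("what"): output element i is a[i] + b[i]."""
    return a[i] + b[i]


def tile_schedule(n, tile):
    """Schedule ("how"): split the index range 0..n-1 into fixed-size tiles.
    The last tile may be shorter; hardware-oriented code might pad it instead."""
    return [range(start, min(start + tile, n)) for start in range(0, n, tile)]


def run_operator(compute, schedule, a, b):
    out = [None] * len(a)
    for block in schedule:
        # On real hardware, each block would be moved into an on-chip buffer,
        # processed by a computing unit, and written back.
        for i in block:
            out[i] = compute(a, b, i)
    return out


a = [1, 2, 3, 4, 5]
b = [10, 20, 30, 40, 50]
# Different schedules, same computation, same result.
print(run_operator(vadd_compute, tile_schedule(len(a), 2), a, b))  # [11, 22, 33, 44, 55]
print(run_operator(vadd_compute, tile_schedule(len(a), 4), a, b))  # [11, 22, 33, 44, 55]
```

Keeping the two concerns apart is what allows the same operator definition to be re-tiled for the cube unit, the vector unit, or the AI CPU without rewriting its arithmetic.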

Figure 6-8 TBE structure

After defining the basic implementation of an operator, you need to call the Tiling submodule to tile the operator data based on the scheduling description and specify the data transfer process to ensure optimal hardware execution. After data shape tiling, the Fusion submodule performs operator fusion and optimization. Once the operator is built, the IR module generates an IR of the operator in a TVM-like IR format. The Pass module then optimizes the IR in aspects including double buffering, pipeline synchronization, memory allocation management, instruction mapping, and tiling for the cube unit. After the operator passes through the Pass module, the CodeGen module generates a temporary C-style code file, which is either used by the compiler to generate the operator implementation file or directly loaded and executed by the OME.

In summary, a custom operator is developed by passing through the internal modules of TBE: the DSL module provides the operator computation logic and scheduling description as the operator prototype; the Schedule module performs data tiling and operator fusion; the IR module produces the IR of the generated operator; the Pass module performs compilation optimization, such as memory allocation, based on the IR; and finally the CodeGen module generates C-style code that the compiler can compile directly. In the course of operator definition, TBE both defines the operator and optimizes it in many aspects, thereby boosting operator execution performance. Figure 6-9 shows the three application scenarios of TBE.
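Double buffering, one of the IR optimizations listed for the Pass module, overlaps data transfer with computation: while one buffer is being computed on, the next tile is loaded into the other. The arithmetic below compares the total time for n tiles with one buffer and with two, assuming one load engine, one compute engine, and constant per-tile costs in arbitrary time units. Those costs and the function names are illustrative assumptions, not measured Ascend figures.

```python
def single_buffer_time(n_tiles, t_load, t_compute):
    """One buffer: a tile must be loaded, then computed, before the next load
    can reuse the buffer, so loads and computes are fully serialized."""
    return n_tiles * (t_load + t_compute)


def double_buffer_time(n_tiles, t_load, t_compute):
    """Two buffers (ping-pong), one load engine and one compute engine:
    while tile i is computed from one buffer, tile i+1 is loaded into the
    other. After the first load, each step costs the slower of the two
    stages, and the last compute has nothing left to overlap."""
    if n_tiles == 0:
        return 0
    return t_load + (n_tiles - 1) * max(t_load, t_compute) + t_compute


for t_load, t_compute in [(3, 5), (5, 3)]:
    print(single_buffer_time(4, t_load, t_compute),
          double_buffer_time(4, t_load, t_compute))
# 32 23
# 32 23
```

With many tiles, the per-tile cost approaches max(load, compute) instead of load + compute, so the gain is largest when transfer and computation take similar time.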

Figure 6-9 Three application scenarios of TBE

1. Generally, a neural network model implemented with standard operators in a deep learning framework has already been trained on a GPU or another neural network chip. When such a model continues to run on the Ascend AI Processor, the goal is to maximize the performance of the Ascend AI Processor without changing the original code. For this purpose, TBE provides a complete set of TBE operator acceleration libraries whose operators map one-to-one, in terms of function, to common standard operators in neural networks. In addition, the software stack provides a programming interface for calling these operators. This accelerates upper-layer deep learning frameworks and applications without requiring adaptation code to be developed for the bottom layer of the Ascend AI chip.

2. If a new operator is introduced to build the neural network model, a custom operator needs to be developed in the TBE language. This approach is similar to using CUDA C++ on a GPU: multifunctional operators can be implemented and various network models can be written flexibly. The written operators are submitted to the compiler, and the compiled operators are executed on the AI Core or AI CPU to make full use of the chip.

3. In suitable scenarios, the operator fusion capability provided by TBE improves operator performance. Neural network operators can be fused at multiple levels based on buffers of different levels, and on-chip resource utilization improves significantly when the Ascend AI chip executes the fused operators.

In conclusion, in addition to operator development, TBE provides standard operator calling and operator fusion and optimization capabilities, so that the Ascend AI Processor can meet the diversified functional requirements of actual neural network applications. As a result, neural networks are more convenient and flexible to build on the Ascend AI Processor, fusion and optimization capabilities are strengthened, and neural network running performance improves.
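The benefit of operator fusion can be seen by counting data movement. In the sketch below, y = relu(x * w + b) is computed once as three separate operators, each reading its input from and writing its output to external memory, and once as a single fused operator whose intermediates never leave on-chip buffers. The traffic model (counting tensor elements read and written, treating the scalars w and b as already on chip) is a simplifying assumption for illustration only.

```python
def relu(v):
    return v if v > 0 else 0.0


def unfused(x, w, b):
    """Three separate operators. Each reads its input tensor from external
    memory and writes its result back, so intermediates make round trips."""
    traffic = 0
    t1 = [v * w for v in x]       # operator 1: scale
    traffic += len(x) + len(t1)   # read x, write t1
    t2 = [v + b for v in t1]      # operator 2: bias add
    traffic += len(t1) + len(t2)  # read t1, write t2
    y = [relu(v) for v in t2]     # operator 3: activation
    traffic += len(t2) + len(y)   # read t2, write y
    return y, traffic


def fused(x, w, b):
    """One fused operator: intermediates stay in on-chip buffers, so only
    the input is read and the final output written."""
    y = [relu(v * w + b) for v in x]
    return y, len(x) + len(y)


x = [-2.0, -0.5, 0.0, 1.5]
print(unfused(x, 2.0, 1.0))  # ([0.0, 0.0, 1.0, 4.0], 24)
print(fused(x, 2.0, 1.0))    # ([0.0, 0.0, 1.0, 4.0], 8)
```

The results are identical, but the fused version moves a third of the data, which is why fusion raises on-chip resource utilization.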

6.2.3.2 Matrix 

Overview

The Ascend AI Processor divides the network into execution layers and treats the execution operations of a specific function as a basic execution unit, called a computing engine. Each computing engine performs basic operations on data, for example, classifying images, preprocessing input images, or identifying output image data. An engine can be customized to implement a specific function. With Matrix, a neural network application generally includes four engines: the data engine, preprocessing engine, model inference engine, and postprocessing engine, as shown in Figure 6-10.

Figure 6-10 Workflow of the computing engines of a deep neural network application

1) The data engine prepares the datasets required by neural networks (for example, the MNIST dataset) and processes the data (for example, image filtering) to serve as the data source of the downstream engine.

2) Input media data generally needs to be preprocessed to meet the computing requirements of the Ascend AI Processor. The preprocessing engine preprocesses the media data, encodes and decodes images and videos, and converts their format. All functional modules of digital vision preprocessing need to be invoked by the process orchestrator.

3) A model inference engine is required when neural network inference is performed on a data flow. This engine implements forward computation of a neural network using the loaded model and the input data flow.

4) After the model inference engine outputs the result, the postprocessing engine postprocesses it, for example, by adding a box or label for image recognition.

Figure 6-10 shows a typical computing engine flowchart, in which each data processing node is an engine. A data flow is processed as it passes through each engine along an orchestrated path, and the required result is finally output; the final output of the whole flowchart is the result of the corresponding neural network computation. Adjacent engine nodes are connected according to the configuration file of the engine flowchart, and the data of a specific network model flows through the nodes according to these connections. After configuring node attributes, you can feed data to the start node of the engine flow to start the engine running process.
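An engine flowchart of this kind is essentially a set of processing nodes plus a table of connections: data is fed to the start node, and each node's output is passed to the node it is connected to. The sketch below assumes a simple linear chain of the four engines and uses an in-memory dict in place of the configuration file, whose real format is not shown in this chapter. The engine bodies (filtering empty samples, scaling pixels, a toy brightness "model") are hypothetical placeholders.

```python
# Hypothetical stand-in for the engine configuration file: each node lists
# the node its output is sent to (None marks the end of the flow).
FLOW_CONFIG = {
    "data": "preprocess",
    "preprocess": "inference",
    "inference": "postprocess",
    "postprocess": None,
}


def data_engine(samples):
    # Data engine: filter the raw dataset (here, drop empty samples).
    return [s for s in samples if s]


def preprocess_engine(samples):
    # Preprocessing engine: scale 8-bit pixel values into [0, 1].
    return [[p / 255 for p in s] for s in samples]


def inference_engine(samples):
    # Model inference engine: a toy "model" scoring mean brightness.
    return [sum(s) / len(s) for s in samples]


def postprocess_engine(scores):
    # Postprocessing engine: attach a label to each score.
    return ["bright" if s >= 0.5 else "dark" for s in scores]


ENGINES = {
    "data": data_engine,
    "preprocess": preprocess_engine,
    "inference": inference_engine,
    "postprocess": postprocess_engine,
}


def run_flow(config, engines, start, data):
    """Feed data to the start node and pass each node's output along the
    configured connections until the last node is reached."""
    node = start
    while node is not None:
        data = engines[node](data)
        node = config[node]
    return data


print(run_flow(FLOW_CONFIG, ENGINES, "data", [[255, 255], [], [0, 10]]))
# ['bright', 'dark']
```

Because the wiring lives in the configuration rather than in the engines, an engine can be replaced or customized without touching its neighbors.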

Matrix runs above the chip enabling layer (L1) and below the application enabling layer (L3). It provides unified, standard intermediate APIs across operating systems (such as Linux and Android) and is responsible for creating and destroying the entire engine and reclaiming computing resources. Matrix creates an engine according to the engine configuration file and provides input data before execution. If the input data does not meet the processing requirements (for example, unsupported video data), the DVPP module can be called through the corresponding API to preprocess the data. If the input data meets the processing requirements, inference and computation are performed by directly calling the offline model executor (OME) through an API. During execution, Matrix provides multi-node scheduling and multi-process management. It runs the computing process on the device side, supervises that process, and collects statistics on execution information. After model execution is complete, Matrix returns the application's output results to the host.

Application scenarios

The Ascend AI Processor can be used to build hardware platforms with different dedicated features for different services. Depending on how the hardware collaborates with the host, the common application scenarios are accelerator cards (Accelerator) and developer boards (Atlas 200 DK). The process orchestrator is applied differently in these two typical scenarios.

1. Application scenario of the accelerator card

The PCIe accelerator card based on the Ascend AI Processor is used in data centers and edge servers, as shown in Figure 6-11.

Figure 6-11 PCIe accelerator card

The PCIe accelerator card supports multiple data precision formats and provides higher performance than other similar accelerator cards, providing more powerful computing capability for neural networks. In this scenario, the accelerator card must be connected to a host, which can be a server or personal computer (PC) that supports the PCIe card. The host calls the neural network computing capability of the accelerator card to perform the related computations.

In the accelerator card scenario, the process orchestrator implements its functions through three subprocesses: the process orchestration agent subprocess (Matrix Agent), the process orchestration daemon subprocess (Matrix Daemon), and the process orchestration service subprocess (Matrix Service).

Matrix Agent usually runs on the host side. It controls and manages the data engine and postprocessing engine, exchanges data with the host-side application, controls the application, and communicates with the handling process on the device side.

Matrix Daemon runs on the device side. It creates processes based on the configuration file, starts and manages engine orchestration on the device side, and releases the computing process and reclaims resources after computing is complete.

Matrix Service runs on the device side. It starts and controls the preprocessing engine and model inference engine on the device side. By controlling the preprocessing engine, Matrix calls the DVPP APIs to preprocess video and image data. Matrix Service can also call the model manager APIs of the OME to load offline models and perform inference.

Figure 6-12 shows the inference process of the offline neural network model by using the process orchestrator.

Figure 6-12 Inference process of the offline neural network model by using the process orchestrator

The offline neural network model performs inference through the process orchestrator in the following three steps:

1) Create an engine: Matrix uses engines with different functions to orchestrate the execution process of a neural network. First, the application calls Matrix Agent on the host side, which orchestrates the engine flow of the neural network according to the pre-compiled configuration file, creates an execution process for the neural network, and defines the task of each engine. Then, the engine orchestration unit uploads the offline model file and the configuration file of the neural network to Matrix Daemon on the device side, and Matrix Service on the device side initializes the engine. Matrix Service controls the model inference engine to call the initialization API of the model manager to load the offline neural network model. At this point, the engine has been created.

2) Execute an engine: Once an engine is created, the neural network functions are computed and implemented. After the offline model is loaded, Matrix Agent on the host side is notified to input application data, and the application sends the data directly to the data engine for processing. If the input data is media data that does not meet the computing requirements of the Ascend AI Processor, the preprocessing engine starts immediately and calls the APIs of the digital vision preprocessing module to preprocess the media data, for example, by encoding, decoding, or scaling. After preprocessing is complete, the data is returned to the preprocessing engine, which sends it to the model inference engine. The model inference engine calls the processing APIs of the model manager to combine the data with the loaded offline model and perform inference. After obtaining the output, the model inference engine calls the data sending API of the engine orchestration unit to return the inference result to the postprocessing engine. After postprocessing the data, the postprocessing engine returns it to the application through the engine orchestration unit. At this point, the engine has been executed.

3) Destroy an engine: After all computing tasks are completed, the system releases the resources occupied by the engine. After all engine data is processed and returned, the application notifies Matrix Agent to release the computing hardware resources of the data engine and postprocessing engine. Matrix Agent in turn instructs Matrix Service to release the resources of the preprocessing engine and model inference engine. After all resources are released, the engine is destroyed, and Matrix Agent notifies the application that the next neural network execution can be performed.
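The three steps map naturally onto a resource-managing object whose creation, use, and teardown are tied together, so that destruction always happens even if execution fails. The sketch below expresses the create / execute / destroy lifecycle as a Python context manager and records which role acts at each step. The class, its log messages, and the toy preprocessing and model are hypothetical; only the role names (Matrix Agent, Matrix Daemon, Matrix Service) come from the text, and none of this is a real Matrix API.

```python
class EngineFlow:
    """Toy model of the create / execute / destroy lifecycle. Role names
    follow the text (Matrix Agent on the host, Matrix Daemon and Matrix
    Service on the device); the methods are illustrative, not a real API."""

    def __init__(self, config_file, model):
        self.config_file = config_file
        self.model = model
        self.log = []
        self.alive = False

    def __enter__(self):
        # Step 1, create: orchestrate the flow on the host, hand model and
        # config to the device, initialize engines and load the model there.
        self.log += [
            f"agent: orchestrate engine flow from {self.config_file}",
            "daemon: receive offline model and configuration",
            "service: initialize engines and load offline model",
        ]
        self.alive = True
        return self

    def execute(self, batch):
        # Step 2, execute: data -> preprocessing -> inference -> postprocessing.
        if not self.alive:
            raise RuntimeError("engine flow has been destroyed")
        preprocessed = [x / 255 for x in batch]
        results = [self.model(x) for x in preprocessed]
        self.log.append(f"service: inferred {len(batch)} item(s)")
        return [round(r, 3) for r in results]

    def __exit__(self, exc_type, exc, tb):
        # Step 3, destroy: host-side engines first, then device-side engines.
        self.log += [
            "agent: release data and postprocessing engines",
            "service: release preprocessing and model inference engines",
        ]
        self.alive = False
        return False  # never swallow exceptions raised inside the block


with EngineFlow("flow.cfg", lambda v: 2 * v) as flow:
    print(flow.execute([0, 255]))  # [0.0, 2.0]
print(flow.log[-1])
```

Using `with` guarantees that the release steps run when the block exits, which matches the requirement that resources are reclaimed before the next neural network execution starts.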

2. Application scenario of the developer board

The developer board scenario refers to the use of the Atlas 200 Developer Kit (Atlas 200 DK), which is based on the Ascend AI Processor, as shown in Figure 6-13.

Figure 6-13 Atlas 200 DK developer kit

The developer kit exposes the core functions of the Ascend AI Processor through the peripheral interfaces on the board, making it easy for external devices to control and develop with the Ascend AI Processor and to make full use of the chip's neural network processing capability. Therefore, a developer kit built on the Ascend AI Processor can be widely used in different AI fields and is expected to serve as key hardware on the mobile device side. In the developer board scenario, the control function of the host is also implemented on the developer board. Figure 6-14 shows the logical architecture of the developer board.

Figure 6-14 Logical architecture of the developer board

As the functional interface of the Ascend AI Processor, Matrix implements data interaction between the computing engine flowchart and applications. It creates a computing engine flowchart based on the configuration file, orchestrates the process, and performs process control and management. After the computation is complete, Matrix destroys the computing engine flowchart and reclaims resources. During preprocessing, Matrix calls the APIs of the preprocessing engine to implement media preprocessing. During inference, Matrix can also call the APIs of the model manager to load the offline model and run inference. In the developer board scenario, Matrix coordinates the entire engine flow on its own, with no need to interact with other devices.

6.2.3.3 TS

TS (Task Scheduler) and Runtime together form the layer that regulates the flow of work between software and hardware. During execution, TS drives hardware tasks, delivers specific target tasks to the Ascend AI Processor, completes the task scheduling process together with Runtime, and sends output data back to Runtime. TS thus acts as the channel for task transmission, task distribution, and data backhaul.

Overview

TS runs on the task scheduling CPU on the device side. It assigns specific tasks distributed by Runtime to the AI CPU, assigns tasks to the AI Core through the hardware-based block scheduler (BS), and returns the task execution results to Runtime. Generally, TS manages the following tasks: AI Core tasks, AI CPU tasks, memory copy tasks, event recording tasks, event waiting tasks, maintenance tasks, and performance profiling tasks. Memory copy is performed mainly in asynchronous mode. An event recording task records event information; if tasks are waiting for that event, they can continue after the recording is complete, which unblocks their stream. For an event waiting task, if the expected event has already completed, the waiting task completes immediately; otherwise, the waiting task is added to the "to-do list" and all subsequent tasks in its stream are suspended until the expected event occurs. After a task is executed, a maintenance task clears data based on task parameters and reclaims computing resources. During execution, a profiling task collects and analyzes computing performance; the start and pause of profiling are configurable.

Figure 6-15 shows the functional framework of TS. TS is usually located on the device side, and its functions are implemented by the task scheduling CPU. The task scheduling CPU consists of the scheduling interface, scheduling engine, scheduling logic processing module, AI CPU scheduler, block scheduler (BS), system control (SysCtrl) module, Profiling tool, and Log tool.
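To make the event recording and event waiting semantics concrete, the following Python sketch simulates several streams in round-robin fashion. It is an illustrative model only; the task tuples and the `run_streams` function are assumptions, not TS or Runtime APIs. A `wait` task at the head of a stream suspends every later task in that stream until a matching `record` task has run.

```python
from collections import deque


def run_streams(streams):
    """streams: dict of stream id -> list of (kind, arg) tasks, where kind is
    "compute", "record" (event recording) or "wait" (event waiting)."""
    queues = {sid: deque(tasks) for sid, tasks in streams.items()}
    recorded = set()
    trace = []
    while any(queues.values()):
        progressed = False
        for sid, queue in queues.items():
            if not queue:
                continue
            kind, arg = queue[0]
            if kind == "wait" and arg not in recorded:
                continue  # stream suspended until the event is recorded
            queue.popleft()
            if kind == "record":
                recorded.add(arg)  # unblocks streams waiting on this event
            trace.append((sid, kind, arg))
            progressed = True
        if not progressed:
            raise RuntimeError("deadlock: waiting on an event that is never recorded")
    return trace


trace = run_streams({
    "s0": [("compute", "conv1"), ("record", "e1"), ("compute", "conv2")],
    "s1": [("wait", "e1"), ("compute", "fc")],
})
```

In `trace`, stream `s1` runs `fc` only after `s0` has recorded `e1`, while tasks inside each stream keep their order.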


Figure 6-15 Functional framework of TS

The task scheduling CPU communicates with Runtime and the driver through the scheduling interface. The scheduling engine controls task organization, task dependencies, and task scheduling, and manages the execution of the task scheduling CPU. It classifies tasks by type into computing, memory, and control tasks, assigns them to the corresponding scheduling logic processing submodules, and manages the scheduling logic of kernel tasks, memory tasks, and inter-stream event dependencies. The logic processing module consists of three submodules: Kernel Execute, DMA Execute, and Event Execute. Kernel Execute schedules computing tasks, implements the task scheduling logic on the AI CPU and AI Core, and schedules specific kernel functions. DMA Execute implements the scheduling logic of storage tasks, such as memory copy. Event Execute implements the scheduling logic of synchronization control tasks, including the processing of inter-stream event dependencies. After the scheduling logic of each task type is processed, the tasks are sent directly to the required control units for hardware execution.

For AI CPU tasks, the AI CPU scheduler in the task scheduling CPU manages the AI CPU status and schedules tasks in software. For AI Core tasks, the task scheduling CPU assigns each processed task to the AI Core through the independent block scheduler hardware. The AI Core performs the computation, and the BS returns the result to the task scheduling CPU. When the task scheduling CPU completes task scheduling, the system control module initializes the system configurations and chip functions. In addition, the Profiling and Log tools monitor the execution process and keep records of key execution parameters and details. When execution completes or an error is reported, these records can be used for performance profiling or error location to evaluate execution results and efficiency.
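The classification step performed by the scheduling engine can be sketched as a lookup from task type to the submodule that schedules it. The task type names and routing strings below are hypothetical labels chosen for illustration; only the three submodule names (Kernel Execute, DMA Execute, Event Execute) and the AI Core/AI CPU routing come from the text.

```python
# Hypothetical task types mapped to the categories named in the text.
CATEGORY = {
    "ai_core_kernel": "computing",
    "ai_cpu_kernel": "computing",
    "memcpy": "memory",
    "event_record": "control",
    "event_wait": "control",
}
SUBMODULE = {"computing": "Kernel Execute", "memory": "DMA Execute", "control": "Event Execute"}
# Where each task type ends up after its scheduling logic is processed.
TARGET = {
    "ai_core_kernel": "AI Core via block scheduler (hardware)",
    "ai_cpu_kernel": "AI CPU via AI CPU scheduler (software)",
    "memcpy": "DMA copy",
    "event_record": "event logic",
    "event_wait": "event logic",
}


def dispatch(tasks):
    """Group (task_type, task_id) pairs by the submodule that schedules them."""
    plan = {name: [] for name in SUBMODULE.values()}
    for task_type, task_id in tasks:
        submodule = SUBMODULE[CATEGORY[task_type]]
        plan[submodule].append((task_id, TARGET[task_type]))
    return plan


plan = dispatch([("ai_core_kernel", "conv"), ("memcpy", "h2d"),
                 ("event_wait", "e1"), ("ai_cpu_kernel", "topk")])
```

Both kernel types land in Kernel Execute but are routed differently: AI Core work goes through the hardware block scheduler, AI CPU work through the software scheduler.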

Scheduling process


During the execution of an offline neural network model, TS receives specific execution tasks from OME. Dependencies between tasks are resolved before scheduling. The tasks are then distributed to the AI Core and AI CPU according to task type for hardware-based computation and execution. A task consists of multiple execution commands (CMDs). During task scheduling, TS and Runtime interact to schedule CMDs in order. Runtime runs on the host CPU, the CMD queue resides in device memory, and TS delivers the specific task CMDs. Figure 6-16 shows the detailed scheduling process.

Figure 6-16 Runtime and TS workflow

Runtime calls the dvCommandOcuppy interface of the driver to access the CMD queue. The available memory space in the CMD queue is queried based on the CMD tail, and the address of the available space is returned to Runtime. Runtime adds the prepared task CMDs to that space and calls the dvCommandSend interface of the driver to update the tail position and credit information of the CMD queue. On receiving new task CMDs, the queue generates a doorbell interrupt to notify TS that new task CMDs have been added to the CMD queue in the device DDR. TS accesses the device memory, transfers the task CMDs to the TS buffer, and updates the head position of the CMD queue in the device DDR. Finally, TS schedules the buffered CMDs to the specified AI CPU and AI Core for execution.

This software stack structure is basically the same as that of most accelerators. Runtime, the driver, and TS in the Ascend AI Processor cooperate closely to distribute tasks in sequence to the corresponding hardware resources. This scheduling process delivers tasks densely and in order for deep neural network computation, ensuring continuous and efficient task execution.
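The CMD queue handshake resembles a classic single-producer, single-consumer ring buffer. The sketch below models it in Python under stated assumptions: the class and method names are hypothetical, `occupy_and_send` loosely stands in for the dvCommandOcuppy and dvCommandSend driver calls named above, the doorbell interrupt is reduced to a counter, and one slot is kept empty so that equal head and tail positions mean an empty queue.

```python
class CmdQueue:
    """Toy ring buffer modelling the device-side CMD queue.

    Runtime (producer) advances `tail`; TS (consumer) advances `head`.
    One slot is always left empty so that head == tail means "empty".
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0
        self.tail = 0
        self.doorbells = 0  # number of simulated doorbell interrupts

    def free_slots(self):
        # Free space as seen from the tail position (the producer's "credit").
        return (self.head - self.tail - 1) % len(self.slots)

    def occupy_and_send(self, cmds):
        # Producer side: write CMDs after the tail, publish the new tail,
        # then ring the doorbell once for the whole batch.
        if len(cmds) > self.free_slots():
            return False  # not enough space; the caller retries later
        for cmd in cmds:
            self.slots[self.tail] = cmd
            self.tail = (self.tail + 1) % len(self.slots)
        self.doorbells += 1
        return True

    def drain(self):
        # Consumer side (TS): copy CMDs into its own buffer and advance head.
        buffered = []
        while self.head != self.tail:
            buffered.append(self.slots[self.head])
            self.head = (self.head + 1) % len(self.slots)
        return buffered


queue = CmdQueue(capacity=4)
queue.occupy_and_send(["kernel_a", "kernel_b"])
buffered = queue.drain()
```

With capacity 4, at most three CMDs fit at once; a batch that does not fit is rejected rather than partially written, so TS never sees a half-published batch.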


6.2.3.4 Runtime

Figure 6-17 shows the position of Runtime in the software stack. The TBE standard operator library and the offline model executor sit above Runtime. The TBE standard operator library provides the operators that neural networks require on the Ascend AI Processor, and the offline model executor loads and executes offline models. The driver sits below Runtime and interacts with the Ascend AI Processor at the bottom layer.

Figure 6-17 Position of Runtime

Runtime provides various interfaces for external callers, including the storage interface, device interface, execution stream interface, event interface, and execution control interface. The Runtime engine controls these interfaces to implement different functions, as shown in Figure 6-18.


Figure 6-18 Various interfaces provided by Runtime

The storage interface allows you to allocate, free, and copy High Bandwidth Memory (HBM) or double data rate (DDR) memory on the device, including device-to-host, host-to-device, and device-to-device copies. Memory can be copied synchronously or asynchronously. With synchronous copying, other operations can proceed only after the copy completes; with asynchronous copying, other operations can proceed while the copy is in progress.

The device interface allows you to query the number and attributes of lower-layer devices, select a device, and reset devices. After the offline model calls the device interface to select a specific device, all tasks in the model are executed on that device. If a task needs to be distributed to another device during execution, the device interface must be called again to select that device.

The stream interface allows you to create and release streams, define priorities, set callback functions, define event dependencies, and synchronize events. These functions relate to the execution of tasks in streams. Tasks within a single stream must be executed in sequence. If multiple streams need to be synchronized, the event interface is called to create, release, record, and define synchronization events. This ensures that multiple streams execute in a synchronized manner and the final model result is output. In addition to handling dependencies between tasks or streams, the event interface can be used to timestamp and record execution timing while an application runs.

During execution, the execution control interface is also used. Through the execution control interface and Mailbox, the Runtime engine completes tasks such as kernel loading and asynchronous memory copying.
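The difference between synchronous and asynchronous copying can be illustrated with Python's `concurrent.futures`. This is only an analogy: a single worker thread stands in for the device copy engine, byte arrays stand in for host and device memory, and the function names are hypothetical rather than Runtime interfaces.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# A single worker thread stands in for the hardware copy engine.
_copy_engine = ThreadPoolExecutor(max_workers=1)


def memcpy_sync(dst: bytearray, src: bytes) -> None:
    """Return only after the copy has finished."""
    dst[: len(src)] = src


def memcpy_async(dst: bytearray, src: bytes) -> Future:
    """Return immediately; the caller synchronizes later via the Future."""
    return _copy_engine.submit(memcpy_sync, dst, src)


host_buffer = bytes(range(8))
device_buffer = bytearray(8)
pending = memcpy_async(device_buffer, host_buffer)
overlapped_work = sum(range(1000))  # other work proceeds during the copy
pending.result()                    # wait for the copy before using the data
```

The explicit `pending.result()` call plays the role of a synchronization point: reading `device_buffer` before it would be a race.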

6.2.3.5 Framework 

Functional structure

Framework collaborates with TBE to generate an executable offline model for the neural network. Before the offline model is executed, Framework and the Ascend AI Processor cooperate to generate a high-performance offline model that matches the hardware, and invoke Matrix and Runtime to deeply integrate the offline model with the Ascend AI Processor. During execution, Framework works with Matrix, Runtime, TS, and the bottom-layer hardware to integrate the offline model, data, and Da Vinci architecture, optimizing the execution process to obtain the outputs of neural network applications. Framework consists of three parts: the offline model generator (OMG), offline model executor (OME), and model manager (AI Model Manager), as shown in Figure 6-19. Developers use OMG to generate offline models and save them as .om files. Matrix in the software stack then calls the AI Model Manager in Framework to start OME and load the offline model onto the Ascend AI Processor. Finally, the offline model is executed through the entire software stack. In this way, Framework manages the entire process of generating an offline model, loading it onto the Ascend AI Processor, and executing it.


Figure 6-19 Offline model function framework 

Generation of an offline model

Take a convolutional neural network (CNN) as an example. After the network model is built and trained on the original data in a deep learning framework, OMG performs operator scheduling optimization, weight data rearrangement and compression, and memory optimization to generate an optimized offline model. OMG thus produces offline models that execute efficiently on the Ascend AI Processor. Figure 6-20 shows how OMG works: after receiving the original model, OMG performs model parsing, quantization, compilation, and serialization on it.

Figure 6-20 Working principle of OMG

1.

Model parsing

During parsing, OMG can parse original network models from different frameworks, extract their network structure and weight parameters, and redefine the network structure using a unified intermediate representation (IR) graph. The IR graph consists of compute nodes and data nodes. Compute nodes are TBE operators with different functions, while data nodes receive tensor data and provide the various inputs required for computation across the network. The IR graph is composed of a graph and weights, covering all the information of the original model. It bridges different deep learning frameworks and the Ascend AI software stack, enabling neural network models built in external frameworks to be easily converted into offline models that the Ascend AI Processor can execute.

2.

Quantization

Quantization performs low-bit quantization on high-precision data to save network storage space, reduce transmission latency, and improve execution efficiency. Figure 6-21 shows the quantization process.
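A minimal Python sketch of INT8 quantization is given here, assuming simple min/max calibration and round-to-nearest. The function names are hypothetical and the formulas are the common textbook forms, not necessarily the exact algorithm of the automatic quantization tool. `quantize_non_offset` produces only a scale (the mode this subsection prescribes for weights), `quantize_offset` produces a scale and an offset (zero point), and `quantize_bias` converts FP32 bias data to INT32 using the product of the weight and data scales.

```python
def quantize_non_offset(values, bits=8):
    """Symmetric quantization: returns (int values, scale)."""
    qmax = 2 ** (bits - 1) - 1                        # 127 for INT8
    scale = max(abs(v) for v in values) / qmax        # assumes values are not all zero
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantize_offset(values, bits=8):
    """Asymmetric quantization: returns (int values, scale, offset)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1   # -128..127 for INT8
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin)                      # assumes hi > lo
    offset = qmin - round(lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + offset)) for v in values]
    return q, scale, offset


def quantize_bias(bias, weight_scale, data_scale):
    """FP32 bias -> INT32 so it can be added to INT8 x INT8 accumulators."""
    return [round(b / (weight_scale * data_scale)) for b in bias]


w_q, w_scale = quantize_non_offset([0.5, -1.27, 0.0, 1.0])
x_q, x_scale, x_offset = quantize_offset([0.0, 2.55, 1.0])
b_q = quantize_bias([0.5], w_scale, x_scale)
```

With these inputs, `w_q` is `[50, -127, 0, 100]` with a scale of about 0.01, `x_q` is `[-128, 127, -28]` with offset -128, and `b_q` is `[5000]`. A value is approximately recovered as `scale * (q - offset)`.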

Figure 6-21 Quantization process

After parsing is complete, an intermediate graph is generated. If needed, the model can be quantized with an automatic quantization tool based on the structure and weights of the intermediate graph. Within an operator, both weights and offsets (bias terms) can be quantized. During offline model generation, the quantized weights and offsets are stored in the offline model and used to compute input data during inference. A calibration set is used to compute the quantization parameters, which preserves quantization precision. If quantization is not required, the offline model is compiled directly.

Quantization modes include offset quantization and non-offset quantization. In non-offset mode, all data is quantized without an offset, and only the scale is computed and output. In offset mode, all data is quantized with an offset, and both the scale and the offset are computed and output. Weights are always quantized in non-offset mode because they have a high requirement for quantization precision. For example, if INT8 quantization is performed on a weight file according to a quantization algorithm, the INT8 weights and the quantization scale are output. During offset quantization, FP32 offset data may be quantized into INT32 data for output, based on the quantization scales of the weights and data.

You can perform quantization if you have stricter requirements on model size and performance. Low-bit quantization of high-precision data during model generation produces a more lightweight offline model, saving network storage space, reducing transfer latency, and improving computation efficiency. Because model size is dominated by parameters, OMG focuses on quantizing operators with parameters, such as the Convolution, FullConnection, and ConvolutionDepthwise operators.

3.

Compilation

After quantization, the model needs to be built. Building includes operator building and model building. Operator building provides the specific operator implementations, and model building aggregates and connects the operator models to generate the offline model structure.

Operator building

Operator building generates operators, mainly the offline structures specific to each operator. Operator generation has three stages: input tensor description, weight data conversion, and output tensor description. In the input tensor description stage, information such as the input dimensions and memory size of each operator is computed, and the form of the operator's input data is defined in OMG. In the weight data conversion stage, the weight parameters used by operators are processed, including data format conversion (for example, FP32 to FP16), shape conversion (for example, fractal rearrangement), and data compression. In the output tensor description stage, information such as the output dimensions and memory size of the operator is computed. Figure 6-22 shows the operator generation process. In this process, the shape of the output data is analyzed and described using the APIs of the TBE operator acceleration library, which can also perform data format conversion.
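The two weight transformations named above, FP32-to-FP16 conversion and fractal rearrangement, can be illustrated in plain Python. This is a simplified sketch under explicit assumptions: `to_fp16` uses the standard library's IEEE 754 half-precision packing rather than ccTransTensor, and `fractal_tiles` merely splits a 2-D weight matrix into zero-padded square blocks stored block by block. The actual fractal formats used by the AI Core are more specific than this.

```python
import struct


def to_fp16(values):
    """Round each value to the nearest IEEE 754 half-precision number.

    struct raises OverflowError for values outside the FP16 range."""
    packed = struct.pack(f"<{len(values)}e", *values)
    return list(struct.unpack(f"<{len(values)}e", packed))


def fractal_tiles(matrix, tile=16):
    """Split a row-major 2-D list into tile x tile blocks, zero-padding edges.

    Simplified stand-in for fractal rearrangement: blocks are emitted
    row of blocks by row of blocks, each block kept row-major."""
    rows, cols = len(matrix), len(matrix[0])
    padded_rows = -(-rows // tile) * tile   # round up to a multiple of tile
    padded_cols = -(-cols // tile) * tile
    padded = [[matrix[r][c] if r < rows and c < cols else 0.0
               for c in range(padded_cols)] for r in range(padded_rows)]
    blocks = []
    for br in range(0, padded_rows, tile):
        for bc in range(0, padded_cols, tile):
            blocks.append([row[bc:bc + tile] for row in padded[br:br + tile]])
    return blocks


half = to_fp16([0.1, 1.0])
blocks = fractal_tiles([[1, 2, 3], [4, 5, 6], [7, 8, 9]], tile=2)
```

`half[0]` is `0.0999755859375`, the closest FP16 value to 0.1, which shows the precision given up in exchange for halving the weight size. A tile size of 2 is used only to keep the example small.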


Figure 6-22 Operator generation workflow

OMG receives the IR graph generated from the neural network, describes each node in it, and parses the inputs and outputs of each operator one by one. OMG analyzes the input source of the current operator, obtains the type of the directly connected upstream operator, and uses the API of the TBE operator acceleration library to look up the output data description of that source operator in the operator library. The output data information of the source operator is returned to OMG as the input tensor description of the current operator. In other words, the input description of the current operator is obtained by analyzing the output information of its source operator. If a node in the IR graph is a data node rather than an operator, no input tensor description is required.

If an operator, such as a Convolution or FullConnection operator, has weight data, the weights must be described and processed. If the input weights are FP32, OMG calls the ccTransTensor API to convert them to FP16 to meet the format requirements of the AI Core. After the type conversion, OMG calls the ccTransFilter API to perform fractal rearrangement on the weights so that the weight input shape meets the format requirements of the AI Core. With the weights in this fixed format, OMG calls the ccCompressWeight API provided by TBE to compress and optimize them, reducing weight size and making the model more lightweight. The converted weight data that meets the computation requirements is returned to OMG. After the weights are converted, OMG describes the output data of the operator to determine its output tensor form.
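The input-description step amounts to propagating tensor descriptions along the edges of the IR graph. The sketch below is a hypothetical Python model, not OMG's data structures: `Node`, `infer_shapes`, and the NCHW layout are assumptions, and the standard convolution output-size formula stands in for the TBE computing API that OMG calls for complex operators. Data nodes supply shapes directly and need no input description; each operator takes its input description from the output description of its source node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Shape = Tuple[int, int, int, int]  # assumed NCHW layout


@dataclass
class Node:
    name: str
    kind: str                        # "data", "conv", "relu" or "add"
    inputs: List[str] = field(default_factory=list)
    shape: Optional[Shape] = None    # given directly for data nodes
    weight: Optional[Shape] = None   # (out_ch, in_ch, kh, kw) for conv
    stride: int = 1
    pad: int = 0


def conv_out(x: Shape, w: Shape, stride: int, pad: int) -> Shape:
    n, c, h, wd = x
    oc, ic, kh, kw = w
    assert c == ic, "channel mismatch between input and weight"
    return (n, oc, (h + 2 * pad - kh) // stride + 1, (wd + 2 * pad - kw) // stride + 1)


def infer_shapes(nodes: List[Node]) -> dict:
    """Walk nodes in topological order; each operator's input description is
    the output description already computed for its source node."""
    out = {}
    for node in nodes:
        if node.kind == "data":
            out[node.name] = node.shape      # data nodes need no input description
            continue
        ins = [out[src] for src in node.inputs]
        if node.kind == "conv":
            out[node.name] = conv_out(ins[0], node.weight, node.stride, node.pad)
        elif node.kind in ("relu", "add"):
            # Simple operators: the output follows directly from the input.
            assert all(s == ins[0] for s in ins), "elementwise inputs must match"
            out[node.name] = ins[0]
        else:
            raise ValueError(f"unsupported operator: {node.kind}")
    return out


graph = [
    Node("image", "data", shape=(1, 3, 224, 224)),
    Node("conv1", "conv", ["image"], weight=(64, 3, 7, 7), stride=2, pad=3),
    Node("relu1", "relu", ["conv1"]),
    Node("sum1", "add", ["conv1", "relu1"]),
]
shapes = infer_shapes(graph)
```

For this graph, `conv1`, `relu1`, and `sum1` all come out as `(1, 64, 112, 112)`, since (224 + 2*3 - 7) // 2 + 1 = 112.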
For a high-level complex operator, such as a Convolution or Pooling operator, OMG obtains the output tensor information directly by using the computing API provided by the TBE operator acceleration library, together with the operator's input tensor information and weights. For a low-level simple operator, such as an addition operator, the output tensor information can be determined from the input tensor information and is stored in OMG. Following this process, OMG traverses all operators in the network IR graph, repeats operator generation for each one, describes the input and output tensors and weight data of all operators, completes the offline structure representation of every operator, and provides the operator models for model generation.

Model building

After the operators are generated, OMG builds the model to obtain its offline structure. OMG takes the IR graph, performs concurrency scheduling analysis on the operators, and splits the nodes of the IR graph into streams formed by operators and data inputs. A stream can be regarded as an execution sequence of operators. Nodes that do not depend on each other are allocated directly to different streams. If nodes in different streams depend on each other, the rtEvent interface is called to synchronize the streams. If the AI Core has sufficient computing resources, splitting streams enables multi-stream scheduling on the AI Core, improving the computing performance of the network model. However, if the AI Core processes many tasks concurrently, resource contention intensifies and execution performance deteriorates. Therefore, a single stream is used to process the network by default, avoiding congestion caused by concurrent execution of multiple tasks.

In addition, based on the execution relationships among the operator execution sequences, OMG can perform hardware-independent optimizations such as operator fusion and memory reuse. Based on the input and output memory information of operators, OMG performs computing memory reuse and writes the reuse information into the model and operator descriptions to generate an efficient offline model. These optimizations can reallocate computing resources across operators, minimizing runtime memory usage and avoiding frequent memory allocation and release during execution. As a result, multiple operators can be executed with minimum memory usage and data migration, improving performance and reducing hardware resource requirements.

4.

Serialization

The offline model generated by compilation is held in memory and needs to be serialized. During serialization, signature and encryption functions are applied to the model file to further encapsulate the offline model and protect its integrity. After serialization, the offline model can be written from memory to an external file for the remote Ascend AI chip to call and execute.
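Integrity protection of a serialized model can be sketched with a keyed hash. The file layout here (a JSON payload followed by a 32-byte HMAC-SHA256 tag) and the function names are hypothetical; the text does not describe the actual .om format or its signature and encryption scheme, and an HMAC is a shared-key check rather than a public-key signature. Encryption is omitted.

```python
import hashlib
import hmac
import json


def serialize_model(model: dict, key: bytes) -> bytes:
    """Serialize an in-memory model description and append an HMAC-SHA256 tag."""
    payload = json.dumps(model, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).digest()  # 32 bytes
    return payload + tag


def load_model(blob: bytes, key: bytes) -> dict:
    """Verify the tag before deserializing; reject tampered files."""
    payload, tag = blob[:-32], blob[-32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("model file failed integrity check")
    return json.loads(payload.decode("utf-8"))


key = b"demo-key"
model = {"name": "demo", "ops": ["conv1", "relu1"]}
blob = serialize_model(model, key)
restored = load_model(blob, key)
```

Changing even one byte of the payload makes `load_model` raise `ValueError`; `hmac.compare_digest` is used so the comparison does not leak timing information.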

6.2.3.6 DVPP

As the encoding/decoding and image conversion module in the Ascend AI software stack, the digital vision pre-processing (DVPP) module provides auxiliary pre-processing for neural networks. Before neural network computation, DVPP converts video or image data from system memory and the network into a format supported by the Da Vinci architecture of the Ascend processors.

Functional architecture

DVPP contains six submodules: video decoding (VDEC), video encoding (VENC), JPEG decoding (JPEGD), JPEG encoding (JPEGE), PNG decoding (PNGD), and vision preprocessing (VPC).

1)

VDEC decodes H.264/H.265 videos and outputs images for video preprocessing.

2)

VENC encodes output videos. It takes the output data of DVPP or the original input YUV data and encodes it into H.264/H.265 video for playback and display.

3)

JPEGD decodes JPEG images, converts them into YUV format, and preprocesses the inference input data for the neural network.

4)

After JPEG images are processed, JPEGE restores the processed data to JPEG format for post-processing of the neural network's inference output.

5)

When input images are in PNG format, PNGD decodes them and outputs RGB data to the Ascend AI Processor for inference and calculation.

6)

VPC provides other image and video processing functions, such as format conversion (for example, from YUV/RGB to YUV420), resizing, and cropping.
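VPC's conversion to YUV420SP can be illustrated with a small Python function that produces the NV12 layout: a full-resolution Y plane followed by interleaved U and V samples, one pair per 2x2 block. Assumptions: BT.601 full-range conversion coefficients, chroma obtained by averaging each 2x2 block, and UV (NV12) rather than VU (NV21) ordering. The exact coefficients and variant used by VPC are not stated in the text, and the function name is hypothetical.

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 0..255 byte range."""
    return max(0, min(255, round(x)))


def rgb_to_nv12(pixels, width, height):
    """pixels: row-major list of (R, G, B) tuples; width and height must be even.
    Returns YUV420SP (NV12) bytes: width*height luma bytes, then one
    interleaved U,V pair per 2x2 block (width*height/2 chroma bytes)."""
    assert width % 2 == 0 and height % 2 == 0
    y_plane = bytearray(clamp8(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels)
    uv_plane = bytearray()
    for by in range(0, height, 2):
        for bx in range(0, width, 2):
            block = [pixels[(by + dy) * width + bx + dx] for dy in (0, 1) for dx in (0, 1)]
            r = sum(p[0] for p in block) / 4
            g = sum(p[1] for p in block) / 4
            b = sum(p[2] for p in block) / 4
            uv_plane.append(clamp8(-0.168736 * r - 0.331264 * g + 0.5 * b + 128))
            uv_plane.append(clamp8(0.5 * r - 0.418688 * g - 0.081312 * b + 128))
    return bytes(y_plane + uv_plane)


white = rgb_to_nv12([(255, 255, 255)] * 8, width=4, height=2)
red = rgb_to_nv12([(255, 0, 0)] * 4, width=2, height=2)
```

The 4x2 white image occupies 12 bytes instead of the 24 that RGB888 would need: 12 bits per pixel rather than 24, which is the storage and bandwidth advantage of YUV420SP mentioned in the preprocessing steps of this section.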

Figure 6-23 shows the execution process of DVPP, which is implemented jointly by Matrix, DVPP, the DVPP driver, and the DVPP dedicated hardware.

1)

Matrix is located at the top layer of the framework. It schedules the functional modules in DVPP to process and manage data flows.

2)

DVPP is located at the layer below Matrix. It provides Matrix with APIs for calling the video and image processing modules and for configuring parameters of the encoding and decoding modules and the VPC module.

3)

The DVPP driver is located between DVPP and the DVPP dedicated hardware. It manages devices and engines and provides driver capabilities for the engine modules. Based on the tasks assigned by DVPP, the driver allocates the corresponding DVPP hardware engine and reads and writes the registers in the hardware module to complete hardware initialization.

4)

The DVPP dedicated hardware, the physical computing resource of the DVPP module group, is located at the bottom layer. It is a dedicated accelerator independent of the other modules in the Ascend AI Processor and handles encoding, decoding, and preprocessing tasks for images and videos.

Figure 6-23 Execution process of DVPP


Pre-processing mechanism

If the data engine detects that the format of the input data does not meet the processing requirements of the AI Core, DVPP is enabled to preprocess the data. Image preprocessing is used as an example here:

1)

Matrix transfers data from memory to the DVPP buffer.

2)

Based on the specific data format, the preprocessing engine configures parameters and transmits data through the programming APIs provided by DVPP.

3)

After the APIs are invoked, DVPP sends the configuration parameters and raw data to the driver, which calls PNGD or JPEGD to initialize and deliver tasks.

4)

The PNGD or JPEGD module in the DVPP dedicated hardware decodes the images into YUV or RGB data for subsequent processing.

5)

After decoding, Matrix calls VPC through the same mechanism to further convert the images into YUV420SP format, which offers high storage efficiency and low bandwidth usage. As a result, more data can be transmitted over the same bandwidth, meeting the high throughput requirements of the AI Core. DVPP also performs image cropping and resizing. Figure 6-24 shows typical cropping and zero padding operations that change the image size: VPC extracts the required part of the original image and then pads it with zeros to preserve edge feature information during convolutional neural network computation. Zero padding is applied to the top, bottom, left, and right regions, extending the image edges to produce an image that can be used directly for computation.

Figure 6-24 Image preprocessing data flow

6)

After this series of preprocessing steps, the image data is handled in one of the following two ways:

The image data is further preprocessed by AIPP according to model requirements (this step can be skipped if the DVPP output already meets the model requirements). Scheduled by the AI CPU, the processed data is then sent to the AI Core for neural network computation.

Alternatively, the JPEGE module encodes all output image data and saves the encoded data to the DVPP buffer. Matrix reads the data out for subsequent operations and frees DVPP computing resources to reclaim the buffer.

Throughout preprocessing, Matrix calls functions of different modules. As a custom data supply module, DVPP provides sufficient data for the AI Core by quickly converting image data through heterogeneous, dedicated processing, meeting the high throughput and bandwidth requirements of neural network computation.
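The cropping and zero padding shown in Figure 6-24 can be sketched on a single-channel image stored as nested Python lists. The function names and the list-of-rows representation are illustrative assumptions; VPC performs these operations in dedicated hardware.

```python
def crop(image, top, left, height, width):
    """Extract a height x width region from a 2-D list (one channel)."""
    return [row[left:left + width] for row in image[top:top + height]]


def zero_pad(image, top, bottom, left, right):
    """Surround a 2-D list with zeros on each side."""
    width = len(image[0]) + left + right
    padded = [[0] * width for _ in range(top)]
    padded += [[0] * left + list(row) + [0] * right for row in image]
    padded += [[0] * width for _ in range(bottom)]
    return padded


image = [[10 * r + c for c in range(4)] for r in range(4)]
patch = crop(image, top=1, left=1, height=2, width=2)
padded = zero_pad(patch, top=1, bottom=1, left=1, right=1)
```

`patch` is `[[11, 12], [21, 22]]`, and `padded` surrounds it with a one-pixel border of zeros, so a 3x3 convolution applied to it still produces an output for each original edge pixel.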

6.2.4 Data Flowchart of the Ascend AI Processor

This section uses a facial recognition inference application as an example to describe the data flowchart of the Ascend AI Processor (Ascend 310). The camera collects and processes data, inference is performed on the data, and the facial recognition result is output, as shown in Figure 6-25.

The camera collects and processes data:

1)

Compressed video streams are transmitted from the camera to the DDR memory through PCIe.

2)

DVPP reads the compressed video streams into the cache.

3)

After preprocessing, DVPP writes decompressed frames into the DDR memory.

Figure 6-25 Data flowchart of Ascend 310 

Data inference:

1)

TS sends an instruction to the DMA engine to pre-load AI resources from the DDR to the on-chip buffer.

2)

TS configures the AI Core to execute tasks.

3)

The AI Core reads the feature map and weight, and writes the result to the DDR or on-chip buffer.

Facial recognition result output:

1)

After processing, the AI Core sends a signal to TS, which checks the result. If another task needs to be allocated, step ④ in Figure 6-25 is performed.

2)

When the last AI task is completed, TS reports the result to the host.


6.3 Atlas AI Computing Platform

6.3.1 Overview of the Atlas AI Computing Platform

Powered by Ascend series AI processors, Huawei's Atlas AI computing platform offers AI solutions for all scenarios across devices, edge, and cloud, covering modules, boards, edge stations, servers, and clusters. This section describes the main products of Huawei's Atlas AI computing platform in the categories of inference and training. Inference products include the Atlas 200 AI accelerator module, Atlas 200 DK, Atlas 300 inference card, Atlas 500 AI edge station, and Atlas 800 inference server, which all integrate the Ascend 310 processor. Training products include the Atlas 300 AI training card, Atlas 800 training server, and Atlas 900 AI cluster, which all use the Ascend 910 processor. Figure 6-26 shows the Atlas AI computing platform portfolio.

Figure 6-26 Atlas AI computing platform portfolio

6.3.2 Atlas Accelerates AI Inference

6.3.2.1 Atlas 200 AI Accelerator Module: High Performance and Low Power Consumption

Packaged in a form factor half the size of a credit card, the Atlas 200 AI accelerator module consumes as little as 9.5 W while supporting 16-channel real-time HD video analytics. This high-performance, low-power product can be deployed on devices such as cameras, drones, and robots. Integrating the HiSilicon Ascend 310 AI processor, Atlas 200 is ideal for analysis and inferential computing on data such as images and videos, and can be widely used in intelligent surveillance, robots, drones, and video servers. Figure 6-27 shows the system architecture of Atlas 200.


Figure 6-27 Atlas 200 system architecture

Atlas 200 has the following features:

1.

Powered by the high-performance Huawei Ascend 310 AI processor, Atlas 200 provides 16 TOPS (INT8) or 8 TOPS (FP16) of multiply-add computing capability.

2.

Atlas 200 supports various interfaces, such as the PCIe 3.0 x4, RGMII, USB 2.0/USB 3.0, I2C, SPI, and UART.

3.

Atlas 200 supports up to 16-channel 1080p 30 FPS video access.

4.

Atlas 200 supports multiple specifications of H.264 and H.265 video encoding and decoding, meeting various video processing requirements.

6.3.2.2 Atlas 200 DK: Strong Computing Power and Ease of Use

The Atlas 200 Developer Kit (Atlas 200 DK) is a developer board that integrates the Atlas 200 AI accelerator module. It helps AI application developers quickly become familiar with the development environment, and its external ports give developers easy access to the processing capability of the Ascend 310 processor. Atlas 200 DK consists of the Atlas 200 AI accelerator module, an image/audio interface chip (Hi3559C), and a LAN switch. Figure 6-28 shows the system architecture of Atlas 200 DK. Atlas 200 DK has the following performance features:

1.

Provides up to 16 TOPS of computing power on INT8 data.

2.

Supports 2-channel camera inputs, 2-channel ISP, and HDR10.

3.

Supports 1000 Mbit/s Ethernet for high-speed network connections.

4.

Provides a universal 40-pin expansion connector (reserved), facilitating product prototype design.

5.

Supports 5 V to 28 V DC power inputs.


Figure 6-28 Atlas 200 DK system architecture

Table 6-1 lists the product specifications of Atlas 200 DK.

Table 6-1 Product specifications of Atlas 200 DK

Item: Specifications
AI processor: 2 x Da Vinci AI Cores; processor: 8-core ARM Cortex-A55, max. 1.6 GHz
Computing power: Multiplication and addition computing performance of 8 TFLOPS FP16 and 16 TOPS INT8
Memory: LPDDR4X, 128-bit; capacity: 4/8 GB; interface rate: 3200 Mbit/s
Storage: 1 x micro SD card, which supports SD 3.0 and provides a maximum rate of SDR50 and a maximum capacity of 2 TB
Network port: One GE RJ-45 port
USB port: 1 x USB 3.0 Type-C port, which can be used only to connect a slave device and is compatible with USB 2.0
Other interfaces: 1 x 40-pin I/O connector; 2 x 22-pin MIPI connectors; 2 x onboard microphones
Power supply: 5 V to 28 V DC; a 12 V 3 A adapter is configured by default
Dimensions (H x W x D): 137.8 mm x 93.0 mm x 32.9 mm
Power consumption: 20 W
Weight: 234 g
Operating temperature: 0ºC to 35ºC (32ºF to 95ºF)
Storage temperature: 0ºC to 85ºC (32ºF to 185ºF)
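The power rows of Table 6-1 can be sanity-checked with simple arithmetic. This sketch assumes the 20 W power consumption is drawn entirely from the default 12 V 3 A adapter and ignores conversion losses; the helper names are hypothetical.

```python
"""Power-budget check for the Atlas 200 DK figures in Table 6-1."""

ADAPTER_VOLTS = 12.0          # default adapter output voltage
ADAPTER_AMPS = 3.0            # default adapter current rating
BOARD_POWER_W = 20.0          # quoted power consumption
INPUT_RANGE_V = (5.0, 28.0)   # supported DC input range


def input_voltage_ok(volts: float, lo: float = INPUT_RANGE_V[0], hi: float = INPUT_RANGE_V[1]) -> bool:
    """True if a supply voltage falls inside the supported DC input range."""
    return lo <= volts <= hi


def supply_current_a(power_w: float, volts: float) -> float:
    """Current drawn from the supply, ignoring conversion losses."""
    return power_w / volts


def adapter_headroom_w(volts: float, amps: float, load_w: float) -> float:
    """Adapter capacity left over after the board's load."""
    return volts * amps - load_w


if __name__ == "__main__":
    print(input_voltage_ok(ADAPTER_VOLTS))
    print(f"{supply_current_a(BOARD_POWER_W, ADAPTER_VOLTS):.2f} A")
    print(f"{adapter_headroom_w(ADAPTER_VOLTS, ADAPTER_AMPS, BOARD_POWER_W):.1f} W headroom")
```

Under these assumptions the board draws about 1.67 A at 12 V, leaving 16 W of the adapter's 36 W rating unused.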

Advantages of Atlas 200 DK:
For developers, a laptop is enough to set up a development environment. The local, independent environment is cost-effective and provides multiple functions and interfaces to meet basic requirements.
For researchers, a collaboration mode of local development and cloud training can be adopted. HUAWEI CLOUD and Atlas 200 DK use the same protocol stack for cloud training and local deployment, so models trained in the cloud can be deployed locally without modification.
For entrepreneurs, code-level demos are provided, and only about 10% of the code needs to be modified to implement the algorithm function based on the reference architecture. Entrepreneurs can also interact with the developer community and migrate their commercial products seamlessly.

6.3.2.3 Atlas 300: Industry's Highest-Density, 64-Channel Video Inference Accelerator Card

Huawei Atlas 300 AI accelerator cards come in two models: 3000 and 3010. The two models differ in architecture (such as x86 and ARM). This section describes only the Huawei Atlas 300 AI accelerator card (model 3000). The Atlas 300 AI accelerator card (model 3000) is developed based on the HiSilicon Ascend 310 AI processor. It is a half-height half-length (HHHL) PCIe card that integrates four Ascend 310 AI processors, and it works with main devices (such as Huawei TaiShan servers) to implement fast and efficient inference, such as image classification and object detection. Figure 6-29 shows the system architecture of the Huawei Atlas 300 AI accelerator card (model 3000).


Figure 6-29 System architecture of the Atlas 300 AI accelerator card (model 3000)

The Atlas 300 AI accelerator card (model 3000) can be used in scenarios such as video analysis, OCR, voice recognition, precision marketing, and medical image analysis. Its typical application scenario is the facial recognition system, which uses face detection, face-based quality evaluation, and high-speed face comparison algorithms to implement functions such as real-time face capture and modeling, real-time alarms based on blacklist comparison, and facial image retrieval.

Figure 6-30 shows the facial recognition system architecture. The main devices include the HD webcam or face capture webcam on the device side, the media stream storage server (optional), the intelligent facial analysis server, the facial comparison search server, the central management server, and the client management software. The Atlas 300 AI accelerator card (model 3000) is deployed in the intelligent facial analysis server to perform video decoding and pre-processing, face detection, face alignment (correction), and facial feature extraction for inference.
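The blacklist comparison step can be illustrated with a common technique: each face is represented by a feature vector produced by the feature-extraction model, and a captured face is matched against stored vectors by cosine similarity. The sketch below is a generic illustration, not the algorithm used in Huawei's system; the three-dimensional vectors, the identifiers, and the 0.6 threshold are hypothetical (real face feature vectors typically have hundreds of dimensions).

```python
"""Generic blacklist matching of face feature vectors by cosine similarity."""

import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_blacklist(probe: Vector, blacklist: Dict[str, Vector],
                    threshold: float = 0.6) -> Optional[Tuple[str, float]]:
    """Return (id, score) of the best match if it reaches the threshold, else None."""
    best_id, best_score = None, -1.0
    for person_id, reference in blacklist.items():
        score = cosine_similarity(probe, reference)
        if score > best_score:
            best_id, best_score = person_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score  # would trigger a real-time alarm
    return None


if __name__ == "__main__":
    blacklist = {"suspect-001": [0.9, 0.1, 0.0], "suspect-002": [0.0, 1.0, 0.0]}
    print(match_blacklist([0.8, 0.2, 0.0], blacklist))
```

A linear scan is enough for a small list; large galleries are usually searched with an index instead, which is the role of a dedicated comparison search server.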

Figure 6-30 Facial recognition system architecture

Table 6-2 lists the product specifications of the Atlas 300 AI accelerator card (model 3000).


Table 6-2 Product specifications of the Atlas 300 AI accelerator card (model 3000)

Model: Atlas 300 AI Accelerator Card (Model 3000)
Form factor: Half-height half-length PCIe standard card
Memory: LPDDR4 x 32 GB, 3200 Mbit/s
Computing power: 64 TOPS INT8
Encoding/Decoding capability:
   • H.264 hardware decoding, 64-channel 1080p 30 FPS (2-channel 3840 x 2160 60 FPS)
   • H.265 hardware decoding, 64-channel 1080p 30 FPS (2-channel 3840 x 2160 60 FPS)
   • H.264 hardware encoding, 4-channel 1080p 30 FPS
   • H.265 hardware encoding, 4-channel 1080p 30 FPS
   • JPEG decoding capability of 4 x 1080p 256 FPS and encoding capability of 4 x 1080p 64 FPS
   • PNG decoding capability of 4 x 1080p 48 FPS
PCIe port: Compatible with PCIe 3.0/2.0/1.0 x16 lanes, and with x8/x4/x2/x1
Power consumption: 67 W
Dimensions: 169.5 mm x 68.9 mm
Weight: 319 g
Operating temperature: 0ºC to 55ºC (32ºF to 131ºF)

The Atlas 300 AI accelerator card (model 3000) uses a single-slot PCIe 3.0 x16 half-height half-length (HHHL) standard interface, has a maximum power consumption of 67 W, supports power consumption management and out-of-band management, and supports H.264 and H.265 video compression and decompression.
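The card-level figures in Table 6-2 line up with the per-processor figures given for Atlas 200. This sketch assumes each of the four Ascend 310 processors contributes the same 16 TOPS INT8 quoted for Atlas 200 and that decoding work splits evenly across processors; both are simplifying assumptions, and the names are illustrative.

```python
"""Cross-check of the Atlas 300 (model 3000) figures against Ascend 310."""

ASCEND310_INT8_TOPS = 16.0   # per-processor figure quoted for Atlas 200 (Ascend 310)
PROCESSORS_PER_CARD = 4      # Ascend 310 processors on one card
CARD_INT8_TOPS = 64.0        # Table 6-2
CARD_1080P_CHANNELS = 64     # H.264/H.265 1080p 30 FPS decoding channels
CARD_MAX_POWER_W = 67.0      # maximum power consumption


def card_tops(per_processor_tops: float, processors: int) -> float:
    """Aggregate peak compute of a multi-processor card."""
    return per_processor_tops * processors


def channels_per_processor(channels: int, processors: int) -> int:
    """Decoding channels per processor, assuming an even split."""
    return channels // processors


if __name__ == "__main__":
    print(card_tops(ASCEND310_INT8_TOPS, PROCESSORS_PER_CARD) == CARD_INT8_TOPS)  # True
    print(channels_per_processor(CARD_1080P_CHANNELS, PROCESSORS_PER_CARD))      # 16
    print(f"{CARD_INT8_TOPS / CARD_MAX_POWER_W:.2f} TOPS/W at maximum power")    # 0.96 TOPS/W
```

Each processor handles 16 channels, the same load quoted for a single Atlas 200. The TOPS/W value uses the card's maximum power, so it is conservative and not directly comparable with the Atlas 200 figure computed at its lowest power.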

6.3.2.4 Atlas 500 AI Edge Station

The Atlas 500 AI edge station has two models: 3000 and 3010. The two models differ in CPU architecture. This section describes the general functions of the two models. The Atlas 500 AI edge station is a lightweight edge device designed for a wide range of edge applications. It features powerful computing performance, large-capacity storage, flexible configuration, small size, a wide operating temperature range, strong environmental adaptability, and easy maintenance and management.

Designed for real-time data processing at the edge, a single Atlas 500 AI edge station provides 16 TOPS of INT8 processing capability with ultra-low power consumption. It integrates Wi-Fi and LTE wireless data interfaces to support flexible network access and data transmission schemes. It is also the industry's first edge computing product to apply Thermo-Electric Cooling (TEC) technology, which enables it to operate stably at extreme temperatures and in harsh deployment environments. Figure 6-31 shows the logical architecture of the Atlas 500 AI edge station.

Figure 6-31 Logical architecture of the Atlas 500 AI edge station

The Atlas 500 AI edge station features ease of use in edge scenarios and 16-channel video analysis and storage capability.

Ease of use in edge scenarios:
1) Real time: Data is processed locally and responses are returned in real time.
2) Low bandwidth: Only necessary data is transmitted to the cloud.
3) Privacy protection: Customers can determine which data is transmitted to the cloud and which is stored locally. All information transmitted to the cloud can be encrypted.
4) Standard container engines and fast deployment of third-party algorithms and applications are supported.

16-channel video analysis and storage capability:
1) 16-channel video analysis (up to 16-channel 1080p video decoding and 16 TOPS of computing power on INT8 data)
2) 12 TB storage capacity, supporting storage of 16-channel 1080p 4 Mbit/s video for 7 days or 8-channel 1080p 4 Mbit/s video for 30 days.
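The two retention figures can be reproduced from the bit rate. The sketch assumes a constant 4 Mbit/s per channel, decimal terabytes (10^12 bytes), and no file-system or redundancy overhead; real recordings with variable bit rates will differ.

```python
"""Storage needed for continuous video recording at a constant bit rate."""

SECONDS_PER_DAY = 24 * 60 * 60
CAPACITY_TB = 12.0  # Atlas 500 storage capacity quoted above


def storage_tb(channels: int, bitrate_mbps: float, days: float) -> float:
    """Decimal terabytes needed to record `channels` streams for `days`."""
    total_bits = channels * bitrate_mbps * 1e6 * SECONDS_PER_DAY * days
    return total_bits / 8 / 1e12


if __name__ == "__main__":
    for channels, days in [(16, 7), (8, 30)]:
        need = storage_tb(channels, 4, days)
        print(f"{channels} channels x {days} days: {need:.2f} TB, fits: {need <= CAPACITY_TB}")
```

The two cases need about 4.84 TB and 10.37 TB. Each fits in 12 TB on its own, but together they would need about 15.2 TB, so they are alternative configurations rather than simultaneous ones.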

The Atlas 500 AI edge station applies to …, analysis, and data storage application scenarios, including safe city, smart security supervision, smart transportation, smart manufacturing, smart retail, and smart care. It can be deployed in various edge and central equipment rooms, meeting application requirements in complex environments such as public security departments, communities, campuses, shopping malls, and supermarkets, as shown in Figure 6-32. In these application scenarios, the typical architecture is as follows:
Device: IP cameras or other front-end devices are connected wirelessly or over wired links.
Edge: The edge extracts, stores, and uploads valuable information.
Cloud: Data centers implement model and application push, management, and development.
Figure 6-33 shows this typical architecture. Table 6-3 lists the product specifications of the Atlas 500 AI edge station.

Figure 6-32 Application scenarios of the Atlas 500 AI edge station

Figure 6-33 Typical architecture of the Atlas 500 AI edge station
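The device-edge-cloud split, together with the "low bandwidth" and "privacy protection" properties listed earlier, can be sketched as an edge node that keeps every result locally and queues only selected, compact metadata for the cloud. Everything here is hypothetical: the class names, the 0.8 confidence threshold, and the upload format are assumptions, not the Atlas 500 software interface, and encryption and container deployment are omitted.

```python
"""Hypothetical device-edge-cloud flow: keep results locally, upload only what matters."""

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Detection:
    camera_id: str      # front-end device that produced the frame
    label: str          # class predicted by the edge model
    confidence: float   # model confidence in [0, 1]
    frame_bytes: int    # size of the raw frame the detection was made on


@dataclass
class EdgeNode:
    upload_threshold: float = 0.8  # hypothetical customer-chosen cut-off
    local_store: List[Detection] = field(default_factory=list)
    upload_queue: List[Dict[str, object]] = field(default_factory=list)

    def process(self, detections: List[Detection]) -> None:
        for det in detections:
            # Every result stays on the edge station's local storage.
            self.local_store.append(det)
            # Only confident results go to the cloud, as metadata rather than frames.
            if det.confidence >= self.upload_threshold:
                self.upload_queue.append(
                    {"camera": det.camera_id, "label": det.label, "confidence": det.confidence}
                )


if __name__ == "__main__":
    node = EdgeNode()
    node.process([
        Detection("cam-01", "person", 0.95, 250_000),
        Detection("cam-01", "vehicle", 0.40, 250_000),
        Detection("cam-02", "person", 0.88, 250_000),
    ])
    print(len(node.local_store), len(node.upload_queue))
```

Here all three results are stored locally while only two small metadata records are queued for upload; the raw frames never leave the edge.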


Table 6-3 Product specifications of the Atlas 500 AI edge station

Parameter: Specifications
Model: Atlas 500
AI processor: 1 built-in Atlas 200 AI accelerator module, providing 16 TOPS INT8 computing power; 16-channel HD video decoding
Network: 2 x 100/1000 Mbit/s adaptive Ethernet ports
RF wireless module: Either a 3G/4G or a Wi-Fi module; dual antennas
Display: 1 HDMI port
Audio: 1 audio input port and 1 audio output port (3.5 mm stereo ports)
Power supply: 12 V DC, with an external power adapter
Temperature: -40ºC to +70ºC (-40ºF to +158ºF), subject to configuration

6.3.2.5 Atlas 800 Inference Server 

Atlas 800 AI server (model 3000)

The Atlas 800 AI server (model 3000) is a data center server based on Huawei Kunpeng 920 processors. It supports up to eight Atlas 300 AI accelerator cards (model 3000) to provide powerful real-time inference capabilities, making it ideal for AI inference scenarios. It features high-performance computing, large-capacity storage, low power consumption, easy management, and easy deployment, supercharging fields such as the Internet, distributed storage, cloud computing, big data, and enterprise services. The Atlas 800 AI server (model 3000) has the following features:

1. It uses the server-oriented, 64-bit, high-performance multi-core Kunpeng 920 processors developed by Huawei, which integrate DDR4, PCIe 4.0, GE, 10GE, and 25GE ports and provide system-on-chip (SoC) functions.
   • A maximum of eight Atlas 300 AI accelerator cards (model 3000), providing powerful real-time inference capabilities.
   • A maximum of 64 cores and 3.0 GHz frequency, allowing for flexible configurations of core quantity and frequency.
   • Compatible with the ARMv8-A architecture and supports ARMv8.1 and ARMv8.2 extensions.
   • Uses Huawei 64-bit TaiShan cores.
   • 64 KB L1 instruction cache, 64 KB L1 data cache, and 512 KB L2 data cache in each core.
   • 45.5 MB to 46 MB of L3 cache.
   • Supports superscalar, variable-length, and out-of-order pipelines.
   • One-bit and two-bit error checking and correction (ECC).
   • Uses the high-speed Hydra interface, with a channel rate of up to 30 Gbit/s, for inter-chip communication.
   • A maximum of eight DDR controllers.
   • Supports up to eight physical Ethernet ports.
   • Three PCIe controllers, which support PCIe 4.0 (16 Gbit/s) and are backward compatible.
   • IMU maintenance engine that collects CPU status information.
2. A single server supports up to two processors and 128 cores, maximizing the concurrent execution of multithreaded applications.
3. It supports up to thirty-two 2933 MHz DDR4 ECC RDIMMs, which provide a maximum memory capacity of 4096 GB.
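The server-level maxima follow from the per-component limits. This sketch combines the list above with the 128 GB maximum DIMM size from Table 6-4 and the 64 TOPS per card from Table 6-2. The 512 TOPS aggregate is derived arithmetic rather than a quoted figure, and it assumes all eight cards run at peak.

```python
"""Aggregate capacity of an Atlas 800 AI server (model 3000), from the quoted limits."""

MAX_PROCESSORS = 2
MAX_CORES_PER_PROCESSOR = 64
MAX_DIMMS = 32
MAX_DIMM_GB = 128        # largest single DIMM listed in Table 6-4
MAX_AI_CARDS = 8
CARD_INT8_TOPS = 64      # Atlas 300 (model 3000), Table 6-2


def totals() -> dict:
    """Upper bounds for a fully populated server."""
    return {
        "cores": MAX_PROCESSORS * MAX_CORES_PER_PROCESSOR,
        "memory_gb": MAX_DIMMS * MAX_DIMM_GB,
        "peak_int8_tops": MAX_AI_CARDS * CARD_INT8_TOPS,
    }


if __name__ == "__main__":
    print(totals())  # {'cores': 128, 'memory_gb': 4096, 'peak_int8_tops': 512}
```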

Figure 6-34 shows the logical architecture of the Atlas 800 AI server (model 3000). Its features are as follows:
1. The server uses two Huawei Kunpeng 920 processors, and each processor supports 16 DDR4 DIMMs.
2. The two CPUs are interconnected through two Hydra buses, which provide a maximum transmission rate of 30 Gbit/s.
3. The Ethernet flexible cards can be cards with four GE or 25GE ports, and are connected to the CPUs through high-speed SerDes interfaces.
4. The screw-in RAID controller card connects to CPU 1 through PCIe buses and to the drive backplane through SAS signal cables. A variety of drive backplanes are available to support flexible drive configurations.
5. The iBMC uses the Huawei Hi1710 management chip and provides a VGA port, a management network port, and a debugging serial port.


Figure 6-34 Logical architecture of the Atlas 800 AI server (model 3000)

The Atlas 800 AI server (model 3000) is an efficient inference platform based on Kunpeng processors. Table 6-4 describes its product specifications.

Table 6-4 Product specifications of the Atlas 800 AI server (model 3000)

Model: Atlas 800 AI Server (Model 3000)
Form factor: 2U rack server
Processor: Two Kunpeng 920 processors with 64, 48, or 32 cores at a frequency of 2.6 GHz; two Hydra links, each supporting a maximum speed of 30 Gbit/s; L3 cache capacity of 45.5 MB to 46 MB; CPU thermal design power (TDP) of 138 W to 195 W
AI accelerator card: Up to 8 Atlas 300 AI accelerator cards
DIMM slot: Up to 32 DDR4 slots supporting RDIMMs; maximum memory speed of 2933 MT/s; memory protection functions: ECC, SEC/DED, SDDC, and patrol scrubbing; single DIMM capacity of 16 GB, 32 GB, 64 GB, or 128 GB
Local storage:
   • 25 x 2.5-inch drive configuration
   • 12 x 3.5-inch drive configuration
   • 8 x 2.5-inch SAS/SATA drives and 12 x 2.5-inch NVMe SSDs
RAID controller card: Supports RAID 0, 1, 5, 6, 10, 50, and 60, and a supercapacitor for power failure protection
FlexIO card: A board supports a maximum of two FlexIO cards. A single FlexIO card provides the following network ports:
   • Four GE electrical ports supporting PXE
   • Four 25GE or 10GE optical ports supporting PXE
PCIe expansion: Supports a maximum of nine PCIe 4.0 slots, one of which is dedicated to a screw-in RAID controller card, while the other eight are for PCIe cards. The PCIe 4.0 slots are as follows:
   • I/O modules 1 and 2 provide the following PCIe slots:
     - Two standard full-height full-length (FHFL) PCIe 4.0 x16 slots (width: PCIe 4.0 x8) and one standard full-height half-length (FHHL) PCIe 4.0 x16 slot (width: PCIe 4.0 x8)
     - One standard FHFL PCIe 4.0 x16 slot and one standard FHHL PCIe 4.0 x16 slot (signal: PCIe 4.0 x8)
   • I/O module 3 provides the following PCIe slots:
     - Two standard half-height half-length PCIe 4.0 x16 slots (width: PCIe 4.0 x8)
     - One standard half-height half-length PCIe 4.0 x16 slot
   The PCIe slots support Huawei PCIe SSD cards to bolster I/O performance for applications such as searching, caching, and download services. The PCIe slots also support Huawei-developed Atlas 300 AI accelerator cards for fast and efficient processing and inference, and for image identification and processing.
Power supply: 2 x 1500 W or 2000 W hot-swappable AC PSUs, supporting 1+1 redundancy
Power input: 100 V AC to 240 V AC, or 240 V DC
Fan module: 4 hot-swappable fan modules, supporting N+1 redundancy
Temperature: 5ºC to 40ºC
Dimensions (H x W x D): 86.1 mm x 447 mm x 790 mm

Atlas 800 AI server (model 3010)

The Atlas 800 AI server (model 3010) is an inference platform based on Intel processors. It supports a maximum of seven Atlas 300 or NVIDIA T4 AI accelerator cards and up to 448 channels of real-time HD video analytics, making it ideal for AI inference scenarios. It combines low power consumption with high scalability and reliability, and is easy to deploy and manage. Figure 6-35 shows the logical architecture of the Atlas 800 AI server (model 3010).
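The 448-channel figure is consistent with the per-card decoding capability in Table 6-2. This sketch assumes all seven slots hold Atlas 300 (model 3000) cards, each handling 64 channels of 1080p 30 FPS video; NVIDIA T4 configurations are not covered, and the function name is illustrative.

```python
"""How the 448-channel figure relates to seven Atlas 300 (model 3000) cards."""

MAX_CARDS = 7
CHANNELS_PER_CARD = 64   # 1080p 30 FPS decoding per Atlas 300 (model 3000), Table 6-2


def server_channels(cards: int, channels_per_card: int) -> int:
    """Total real-time channels if every card runs at its quoted capacity."""
    return cards * channels_per_card


if __name__ == "__main__":
    print(server_channels(MAX_CARDS, CHANNELS_PER_CARD))  # 448
```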


Figure 6-35 Logical architecture of the Atlas 800 AI server (model 3010)

The Atlas 800 AI server (model 3010) has the following features:
1. The server supports one or two Intel® Xeon® Scalable processors.
2. It supports 24 DIMMs.
3. The CPUs interconnect with each other through two UltraPath Interconnect (UPI) buses at up to 10.4 GT/s.
4. The CPUs connect to three PCIe riser cards through PCIe buses, and the riser cards provide various PCIe slots.
5. The screw-in RAID controller card on the mainboard connects to CPU 1 through PCIe buses and to the drive backplane through SAS signal cables. A variety of drive backplanes are provided to support different local storage configurations.
6. The LBG-2 Platform Controller Hub (PCH) supports:
   • Two 10GE optical LOM ports (on the PCH) or two 10GE electrical LOM ports (on the X557 PHY)
   • Two GE electrical LOM ports
7. The server uses the Hi1710 management chip and supports a video graphics array (VGA) port, a management network port, and a debug serial port.

The Atlas 800 AI server (model 3010) is a flexible AI inference platform powered by Intel processors. Table 6-5 lists its product specifications.


Table 6-5 Product specifications of the Atlas 800 AI server (model 3010)

Model: Atlas 800 AI Server (Model 3010)
Form factor: 2U rack server
Processor: 1 or 2 Intel® Xeon® Skylake or Cascade Lake Scalable processors, 205 W TDP
AI accelerator card: Maximum of seven Atlas 300 or NVIDIA T4 AI accelerator cards
Memory: 24 DDR4 DIMM slots, up to 2933 MT/s
Local storage: Supports the following disk configurations:
   • 8 x 2.5-inch drive configuration
   • 12 x 3.5-inch drive configuration
   • 20 x 2.5-inch drive configuration
   • 24 x 2.5-inch drive configuration
   • 25 x 2.5-inch drive configuration
   • Flash storage: 2 x M.2 SSDs
RAID controller card: Supports RAID 0, 1, 10, 1E, 5, 50, 6, or 60 and a supercapacitor for protecting cache data from power failures, and provides RAID-level migration, disk roaming, self-diagnosis, and web-based remote configuration
Network:
   • LOM: 2 x 10GE + 2 x GE ports
   • Flexible NIC: 2 x GE, 4 x GE, 2 x 10GE, or 1/2 x 56G FDR IB ports
PCIe expansion: Up to 10 PCIe 3.0 slots, including 1 for a RAID controller card and 1 for a flexible NIC
Fan module: 4 hot-swappable fan modules, supporting N+1 redundancy
Power supply: 2 hot-swappable PSUs with 1+1 redundancy. Supported options include:
   • 550 W AC Platinum PSUs, 900 W AC Platinum/Titanium PSUs, 1500 W AC Platinum PSUs
   • 1500 W 380 V HVDC PSUs, 1200 W -48 V to -60 V DC PSUs
Operating temperature: 5ºC to 45ºC
Dimensions (H x W x D):
   • Chassis with 3.5-inch hard drives: 86.1 mm x 447 mm x 748 mm (3.39 in. x 17.60 in. x 29.45 in.)
   • Chassis with 2.5-inch hard drives: 86.1 mm x 447 mm x 708 mm (3.39 in. x 17.60 in. x 27.87 in.)

6.3.3 Atlas Accelerates AI Training

6.3.3.1 Atlas 300T AI Training Card: the Most Powerful AI Training Card

The Huawei Atlas 300T AI training card (model 9000) is developed based on the latest HiSilicon Ascend 910 AI processor. A single card provides up to 256 TFLOPS FP16 of AI computing power for data center training scenarios. It is the most powerful AI accelerator card in the industry and can be widely used in general-purpose servers in data centers, providing customers with AI solutions that offer optimal performance, high energy efficiency, and low TCO. The Huawei Atlas 300 accelerator card (model 9000), powered by the Ascend 910 AI processor, has the following features:
• PCIe 4.0 x16 full-height 3/4-length standard interface (dual-slot)
• Maximum power consumption: 350 W
• Power consumption and out-of-band management
• H.264 and H.265 video compression and decompression
• Huawei MindSpore and TensorFlow training frameworks
• x86-based Linux OS
• Arm-based Linux OS

Table 6-6 lists the product specifications of the Atlas 300 accelerator card (model 9000).

Table 6-6 Product specifications of the Atlas 300 accelerator card (model 9000)

Model: Atlas 300 AI Accelerator Card (Model 9000)
Form factor: Full-height 3/4-length PCIe card
Memory: 32 GB HBM + 16 GB built-in memory
Computing power: 256 TFLOPS FP16
PCIe port: PCIe 4.0 x16


The computing power of a single Atlas 300 AI accelerator card (model 9000) is doubled, and the gradient synchronization latency is reduced by 70%. Figure 6-36 compares the training speed of a mainstream training card running the TensorFlow framework with that of Huawei Ascend 910 running the MindSpore framework. The tests train ResNet-50 V1.5 on the ImageNet 2012 dataset, with each configuration using its optimal batch size. The results show that training is much faster when Huawei Ascend 910 and the MindSpore framework are used.

Figure 6-36 Speed comparison between Huawei Ascend 910+MindSpore and other modes

6.3.3.2 Atlas 800 AI Training Server: Industry's Most Powerful Server for AI Training

The Atlas 800 AI training server (model 9000) is mainly used in AI training scenarios. With superb performance, it builds a highly efficient, low-power AI computing platform for training. It supports multiple Atlas 300 AI accelerator cards or onboard accelerator modules and is used in scenarios such as video analysis and deep learning training. Based on the Ascend 910 processor, the Atlas 800 AI server (model 9000) improves computing density by 2.5 times, hardware decoding capability by 25 times, and energy efficiency ratio by 1.8 times. It has the highest computing density, up to 2 PFLOPS FP16 in a 4U space. It supports flexible configurations and adapts to multiple workloads, supporting SAS/SATA/NVMe/M.2 SSDs, and it provides a variety of network ports, including LOMs and FlexIO cards. Table 6-7 lists the product specifications of the Atlas 800 AI server (model 9000).


Table 6-7 Product specifications of the Atlas 800 AI server (model 9000)

Model: Atlas 800 AI Server (Model 9000)
Form factor: 4U rack server
Processor: 4 Kunpeng 920 processors
Computing power: 2 PFLOPS FP16
Encoding/Decoding capability: 32 built-in hardware decoders; parallel processing with training
Heat dissipation: Air cooling and liquid cooling
Power consumption: 2 PFLOPS/5.6 kW
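The figures in Tables 6-6 and 6-7 give two energy-efficiency views, one per card and one per server, plus a rack-density figure. This sketch uses only the quoted peaks. The server value is naturally lower because its 5.6 kW also covers CPUs, memory, storage, and cooling, so the two numbers measure different things.

```python
"""Efficiency and density figures derived from Tables 6-6 and 6-7."""

SERVER_PFLOPS_FP16 = 2.0     # Atlas 800 AI server (model 9000)
SERVER_POWER_KW = 5.6
SERVER_RACK_UNITS = 4
CARD_TFLOPS_FP16 = 256.0     # Atlas 300 accelerator card (model 9000)
CARD_MAX_POWER_W = 350.0


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Peak FP16 throughput per watt."""
    return tflops / watts


if __name__ == "__main__":
    server = tflops_per_watt(SERVER_PFLOPS_FP16 * 1000, SERVER_POWER_KW * 1000)
    card = tflops_per_watt(CARD_TFLOPS_FP16, CARD_MAX_POWER_W)
    density = SERVER_PFLOPS_FP16 / SERVER_RACK_UNITS
    print(f"server: {server:.3f} TFLOPS/W, card: {card:.3f} TFLOPS/W, {density} PFLOPS per U")
```

It prints about 0.357 TFLOPS/W for the server, 0.731 TFLOPS/W for the card alone, and 0.5 PFLOPS per rack unit.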

6.3.3.3 Atlas 900 AI Cluster: the World's Fastest Cluster for AI Training

Representing the pinnacle of computing power, the Atlas 900 AI cluster consists of thousands of Ascend 910 AI processors. Using the Huawei cluster communication library and job scheduling platform, it integrates HCCS, PCIe 4.0, and 100G RoCE high-speed interfaces, fully unlocking the performance of Ascend 910. It delivers 256 to 1024 PFLOPS FP16, equivalent to the performance of 500,000 PCs, allowing users to easily train models on datasets for various needs. Test results show that Atlas 900 can complete ResNet-50 model training within 60 seconds, 15% faster than the second-ranking product, as shown in Figure 6-37. This means faster AI model training with images and speech, more efficient astronomical and oil exploration and weather forecasting, and faster time-to-market for autonomous driving.
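The "thousands of Ascend 910 AI processors" claim can be checked against the quoted computing-power range. This sketch assumes one Ascend 910 delivers the 256 TFLOPS FP16 quoted for the Atlas 300 card (model 9000) and ignores any cluster overhead; the result is a rough order of magnitude, not a configuration count.

```python
"""Rough processor count implied by the Atlas 900 computing-power range."""

ASCEND910_TFLOPS_FP16 = 256.0   # per-processor peak, assumed equal to one Atlas 300 (model 9000) card


def implied_processors(cluster_pflops: float,
                       per_processor_tflops: float = ASCEND910_TFLOPS_FP16) -> float:
    """Number of processors needed to reach `cluster_pflops` at peak."""
    return cluster_pflops * 1000 / per_processor_tflops


if __name__ == "__main__":
    for pflops in (256, 1024):
        print(f"{pflops} PFLOPS ~ {implied_processors(pflops):.0f} Ascend 910 processors,"
              f" {pflops / 1000:.3f} EFLOPS")
```

The range corresponds to roughly 1,000 to 4,000 processors, and the upper end is about 1 EFLOPS (10^18 FLOPS), which is what "E-level" computing power refers to.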


Figure 6-37 Speed comparison between the Atlas 900 AI cluster and other modes

The Atlas 900 AI cluster has the following key features:
• Industry-leading computing power: 256 to 1024 PFLOPS FP16, interconnecting thousands of Ascend 910 AI processors for the industry's fastest ResNet-50@ImageNet training performance.
• Optimal cluster network: Integrates HCCS, PCIe 4.0, and 100G RoCE high-speed interfaces, and vertically integrates the communication library, topology, and low-latency network, achieving a scaling linearity of over 80%.
• Ultimate heat dissipation: Supports a hybrid cooling system capable of dissipating 50 kW of heat per cabinet, with over 95% liquid cooling and a PUE below 1.1, saving 79% of equipment room space.

Huawei deploys Atlas 900 on the cloud and launches the HUAWEI CLOUD EI cluster services, making the computing power of Atlas 900 readily accessible to customers in different industries. These services are available to universities and scientific research institutes around the world at an affordable price, and they can apply to use them immediately.
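The two cluster metrics above have simple definitions. Scaling linearity is commonly computed as measured cluster throughput divided by N times the throughput of one device, and PUE (power usage effectiveness) is total facility power divided by IT equipment power, so 1.0 is the ideal. The sketch below implements both; the input numbers are hypothetical, chosen only to illustrate the formulas, and are not Atlas 900 measurements.

```python
"""Two cluster metrics: scaling linearity and power usage effectiveness (PUE)."""


def scaling_linearity(cluster_throughput: float, single_throughput: float, devices: int) -> float:
    """Measured cluster throughput as a fraction of perfect linear scaling."""
    return cluster_throughput / (devices * single_throughput)


def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Total facility power divided by the power used by IT equipment."""
    return total_facility_kw / it_equipment_kw


if __name__ == "__main__":
    # Hypothetical inputs: 1,024 devices, 1,000 images/s each, 850,000 images/s together.
    print(f"linearity: {scaling_linearity(850_000, 1_000, 1_024):.2f}")  # linearity: 0.83
    # Hypothetical inputs: 1,000 kW of IT load inside a 1,080 kW facility.
    print(f"PUE: {pue(1_080, 1_000):.2f}")                                # PUE: 1.08
```

A linearity above 0.8 means that adding devices still delivers more than 80% of their ideal contribution, and a PUE below 1.1 means less than 10% of the facility's power goes to cooling and other overhead.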

6.3.4 Device-Edge-Cloud Collaboration Enables the Ultimate Development and User Experience

Compared with common solutions in the industry, the Huawei Atlas AI computing platform has three advantages: unified development, unified O&M, and secure upgrade.
Unified development: In the industry, different development architectures are generally used on the edge side and the center side, so models cannot flow freely and require secondary development. Huawei Atlas instead uses a unified development architecture based on the Da Vinci architecture and CANN, which supports the device, edge, and cloud sides with one-time development.
Unified O&M: The industry generally offers no O&M management tools; only APIs are open, so customers need to develop their own tools. In contrast, Huawei Atlas FusionDirector can manage up to 50,000 nodes, enabling unified management of devices at the data center and edge sides, as well as remote model push and device upgrade.
Secure upgrade: Industry products generally have no encryption and decryption engine, and models are not encrypted. Huawei Atlas encrypts transmission channels and models to ensure security.
Atlas enables device-edge-cloud collaboration, continuous training at the center, and remote model update, as shown in Figure 6-38.


Figure 6-38 Atlas device-edge-cloud collaboration

6.4 Industry Applications of Atlas

This section describes industry application scenarios of the Atlas AI computing platform, such as electric power, finance, manufacturing, transportation, and supercomputing.

6.4.1 Electric Power: One-Stop ICT Solutions for Smart Grids

Modern society is increasingly dependent on electric power, and traditional extensive, inefficient energy utilization methods can no longer meet current requirements. People need a more efficient and rational energy supply. The biggest challenge for the electric power industry is how to achieve reliable, economical, efficient, and green grids. With leading ICT technologies, Huawei works with partners to launch full-process intelligent service solutions covering power generation, transmission, transformation, distribution, and consumption. Smart grids integrate traditional power systems with ICT technologies, including cloud computing, big data, the Internet of Things (IoT), and mobility, to achieve comprehensive sensing, interconnection, and business intelligence. For example, the industry's first intelligent unattended inspection replaces traditional manual inspection, improving operation efficiency five-fold and reducing system cost by 30%, as shown in Figure 6-39.

Figure 6-39 Intelligent unattended inspection

6.4.2 Smart Finance: Comprehensive Digital Transformation

FinTech and digital financial services have penetrated the everyday lives of China's citizens and become an indispensable part of daily life, used not only for payments but also for investment, deposits, and loans. China stands out as the most digitally ready market for financial services. One of the solutions that the Huawei Atlas AI computing platform provides for the financial industry is smart bank branches. This solution uses advanced access solutions, security protection, and appliance technologies to help build next-generation smart bank branches.


The Huawei Atlas AI computing platform uses AI to transform finance, helping bank branches achieve intelligent transformation. Precise identification of VIP customers improves the conversion rate of potential customers by 60%. Intelligent authentication based on facial recognition reduces service processing time by 70%. Analysis of customer queuing duration reduces customer complaints by 50%, as shown in Figure 6-40.

Figure 6-40 Smart Finance: Intelligent Transformation of Bank Branches

6.4.3 Smart Manufacturing: Digital Integration of Machines and Thoughts

In the Industry 4.0 era, the in-depth convergence of IT and the manufacturing industry has led to an industrial revolution. Large-scale customization, global collaborative design, and smart factories and the Internet of Vehicles based on cyber-physical systems (CPS) are reshaping the industry value chain, giving rise to new production methods, industry structures, and business models, and catalyzing economic growth. Based on cloud computing, big data, and IoT technologies, Huawei works with global partners to help manufacturing customers reshape the value chain of the manufacturing industry, innovate business models, and create new value. The Huawei Atlas AI computing platform helps production lines upgrade intelligently: machine vision technology replaces traditional manual inspection. The unstable results, low production efficiency, discontinuous processes, and high labor costs of manual inspection give way to zero missed detections, high production efficiency, cloud-edge collaboration, and labor savings, as shown in Figure 6-41.


Figure 6-41 Cloud-Edge collaboration, intelligent quality inspection

6.4.4 Smart Transportation: Convenient Travel and Smooth Logistics

With the acceleration of globalization and urbanization, people have an increasing demand for transportation. This requires modern transportation systems that are green, safe, efficient, and smooth. Upholding the concept of "convenient transportation and smooth logistics", Huawei is dedicated to providing innovative transportation solutions such as digital railway, digital urban rail, and smart airport solutions. Based on cloud computing, big data, IoT, agile networks, BYOD, eLTE, GSM-R, and other new ICT technologies, these solutions raise the ICT development level of the transportation industry and help industry customers optimize transportation services, achieving more convenient journeys, more efficient logistics, smoother urban traffic, and stronger transportation guarantees. The Huawei Atlas AI computing platform helps upgrade the national highway network and implement vehicle-road collaboration, improving traffic efficiency five-fold, as shown in Figure 6-42.

Figure 6-42 Vehicle-Road collaboration, improving traffic efficiency


6.4.5 Supercomputing: Building a National AI Platform

Phase II of CloudBrain at Peng Cheng Laboratory (PCL) is built on Atlas 900, the world's fastest training cluster. It offers the strongest computing power (E-level, that is, exascale, AI computing power), an optimal cluster network (HCCL communication supports 100 TB/s non-blocking parameter-plane networking), and ultimate energy efficiency (AI cluster PUE < 1.1). Atlas helps PCL build CloudBrain phase II as an innovative basic platform for national missions, as shown in Figure 6-43.

Figure 6-43 Peng Cheng Laboratory (PCL)

6.5 Summary

This chapter describes the Huawei Ascend AI Processor and the Atlas AI computing solution, including the hardware and software structure of the Ascend AI Processor, the inference and training products of the Atlas AI computing platform, and Atlas industry application scenarios.

6.6 Quiz

1. What are the differences between CPUs and GPUs as two types of processors for AI computing?

2. The Da Vinci architecture was developed to improve AI computing capabilities. It is the Ascend AI computing engine and the core of Ascend AI Processors. What are the three components of the Da Vinci architecture?

3. What are the three types of basic computing resources contained in the computing unit of the Da Vinci architecture?

4. The software stack of Ascend AI Processors consists of four layers and an auxiliary toolchain. What are the four layers? What capabilities does the toolchain provide?

5. The neural network software flow of Ascend AI Processors is a bridge between the deep learning framework and Ascend AI Processors. It provides a shortcut for converting a neural network from the original model to an intermediate computing graph, and then to an offline model that is executed independently. The neural network software flow of Ascend AI Processors is used to generate, load, and execute an offline neural network application model. Which function modules does the neural network software flow include?

6. Ascend AI Processors include Ascend 310 and Ascend 910, both of which are based on the Da Vinci architecture. However, they differ in precision, power consumption, and manufacturing process, which leads to differences in their application fields. What are the differences in their application fields?

7. Products of the Atlas AI computing platform can be applied to model inference and training. Which products are used for inference, and which for training?

8. Give examples of the application scenarios of the Atlas AI computing platform.


Huawei AI Academy Training Materials

AI Development Platform for Smart Devices

Huawei Technologies Co., Ltd.

07 AI Development Platform for Smart Devices (Textbook)

258


Huawei Technologies Co., Ltd. Address:

Huawei Industrial Base Bantian, Longgang, Shenzhen 518129

Website:

http://e.huawei.com


Huawei Open AI Platform for Smart Devices


Contents
7 AI Development Platform for Smart Devices
7.1 HUAWEI HiAI Platform
7.1.1 Introduction
7.1.2 Architecture
7.1.3 HUAWEI HiAI Foundation
7.1.4 HUAWEI HiAI Engine
7.1.5 HUAWEI HiAI Service
7.2 Developing Applications Based on HUAWEI HiAI Platform
7.3 HUAWEI HiAI: Some Solutions
7.3.1 HUAWEI HiAI Helps Deaf and Mute People
7.3.2 HUAWEI HiAI Improves the Visual Experience of Yuanbei Driving Test
7.3.3 HUAWEI HiAI Enables Ctrip
7.3.4 HUAWEI HiAI Enables WPS to Detect and Calibrate Documents
7.4 Summary
7.5 Quiz

7 AI Development Platform for Smart Devices

HUAWEI HiAI is an open AI capability platform for smart devices. It adopts a "chip-device-cloud" architecture that opens up chip, application, and service capabilities to build a fully intelligent ecosystem. By fully leveraging Huawei's powerful AI processing capabilities, it helps developers deliver a better smart application experience to users.

7.1 HUAWEI HiAI Platform

7.1.1 Introduction

Consumers today are exposed to a large number of AI applications, such as voice assistants, AI photography, and image beautification, but the application scenarios of these applications are limited. With the evolution from device-side AI to distributed AI, sharing resources and computing power among multiple devices will greatly expand the application scenarios of device-side AI, enabling developers to deliver more smart innovations and bringing a superb experience to consumers. Against this background, Huawei launched HiAI 3.0. The HiAI platform has evolved from the single-device scenario of version 1.0, through the multi-device scenario of version 2.0, to the distributed scenario of version 3.0, as shown in Figure 7-1.


Figure 7-1 HUAWEI HiAI evolution process

HUAWEI HiAI 3.0 was officially released at the Software Green Alliance Developer Conference on November 19, 2019, marking the leap from device-side AI to distributed AI. HUAWEI HiAI 3.0 aims to bring an ultimate smart life experience across all scenarios. It provides one-time service access and multi-device adaptation, so users can enjoy services such as the voice assistant and HiBoard on devices including mobile phones, tablets, smart screens, and smart speakers. Two examples follow: personal training guidance and driving.

Case 1: personal training guidance. HUAWEI HiAI 3.0 provides distributed computer vision (CV) and automatic speech recognition (ASR) capabilities, which let people exercise at home with an effect similar to personal training guidance in a gym. Distributed CV can identify the key points of the human body in 3D. Users can capture motion postures from multiple angles in real time with cameras at different locations and correct their postures using multiple screens. With the open ASR capability, users can control the workout pace by talking to a smart speaker, further assisting personal training at home.

Case 2: driving. Because HUAWEI HiAI 3.0 is combined with distributed technologies, users can connect a smartphone to their car, use the in-car camera to detect driving behavior, and use the computing power of the smartphone's AI chip to be reminded of dangerous behaviors such as fatigue driving. All of this runs within the in-vehicle network environment, and the lower latency of local data computing helps drivers better protect themselves.

So far, Huawei has more than 4,000 HiAI partners, more than 96 million daily active users, and more than 600 billion monthly calls.

7.1.2 Architecture

The HUAWEI HiAI platform builds a three-layer ecosystem of cloud, device, and chip. On the cloud (Service) side, it supports various mainstream frontend frameworks. On the device (Engine) side, it provides various upper-layer service APIs that run efficiently on mobile devices. On the chip (Foundation) side, heterogeneous resources can be flexibly scheduled, meeting developers' need to accelerate neural network model computing and operator computing. In addition, HUAWEI HiAI offers a systematic toolchain, comprehensive documentation, a variety of APIs, and easy-to-use source code, enabling quick application development. Figure 7-2 shows the architecture of the HUAWEI HiAI mobile computing platform.


Figure 7-2 Architecture of the HiAI mobile computing platform

HiAI is an AI computing platform designed for mobile devices. Compared with device-side AI and cloud-side AI, HiAI has three core advantages: higher security, lower cost, and lower latency. HiAI builds a three-layer AI ecosystem that opens service capabilities, application capabilities, and chip capabilities. This three-layer open platform uses the features of chips, devices, and clouds to deliver an extraordinary experience to both users and developers. Figure 7-3 shows the features of each layer:

• Cloud: created once and reused multiple times.
• Device: distributed and all-scenario.
• Chip: stronger computing power, more operators and frameworks, and smaller models.

Figure 7-3 HiAI three-layer AI ecosystem

HiAI can bring the following benefits to applications: real-time performance, readiness for use, stability, security, and cost effectiveness. HUAWEI HiAI 3.0 features distributed AI enablement across all scenarios. HiAI has a three-layer architecture: cloud, device, and chip.

The sub-module corresponding to the cloud is HiAI Service, which opens service capabilities. HiAI Service pushes services to users based on their needs, so that services actively find users. It enables developers to create a service once and reuse it multiple times.

The sub-module corresponding to the device is HiAI Engine, which provides APIs that open AI application capabilities. HiAI Engine makes it easy to integrate multiple AI capabilities into applications, making them more intelligent and more powerful: its APIs invoke various algorithms of the HiAI platform and integrate them into applications. For example, APIs in HiAI Engine can be invoked directly to implement image recognition, text recognition, speech recognition, and natural language understanding. HiAI Engine supports distributed, all-scenario use.

The chip layer is mainly based on Huawei's Kirin chips and opens up their capabilities. HiAI Foundation, the sub-module corresponding to the chip, provides operators for quickly converting and migrating existing models, and achieves optimal performance through heterogeneous scheduling and neural-network processing unit (NPU) acceleration. The chip layer provides more operators, stronger computing power, and more frameworks to streamline models. To migrate locally developed AI applications to devices, you can use HiAI Foundation to convert their models to suit the devices. The following sections introduce the three sub-modules in detail.

7.1.3 HUAWEI HiAI Foundation

HiAI Foundation APIs constitute an AI computing library for a mobile computing platform, enabling developers to efficiently build AI applications that run on mobile devices. Their features are as follows: by leveraging the high performance and high precision of Kirin chips, they deliver better device-side AI performance through more powerful computing. They support the largest number of operators in the industry (over 300) and more frameworks, greatly improving flexibility and compatibility. The Honghu, Kirin, and AI camera chips bring AI capabilities to more devices.

HiAI Foundation APIs are released as a unified binary file. They accelerate neural network computing by using the HiAI heterogeneous computing platform. Currently, these APIs can run only on a Kirin system on a chip (SoC). With HiAI Foundation APIs, developers can focus on developing new AI applications without having to tune computing performance. HiAI Foundation APIs are integrated with the Kirin SoC, providing a running environment and debugging tools for mobile devices. Developers can run neural network models on mobile devices and invoke the HiAI Foundation APIs to accelerate computing. You can use the default system images of mobile devices for integration, development, and validation without installing the HiAI Foundation APIs.

HiAI Foundation APIs provide the following two major functions for AI application developers:

• Commonly used AI APIs that run efficiently on mobile devices.
• An acceleration API that is independent of the processor hardware. With this API, application vendors and developers can accelerate model and operator computation using the HiAI heterogeneous acceleration system.

HiAI Foundation APIs support the following basic functions:

• AI model management APIs, including model compilation, loading, running, and destruction interfaces.
• Basic operator calculation APIs, including convolution, pooling, and full-connection interfaces.
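For readers new to these operators, the following NumPy sketch shows what convolution, pooling, and full connection compute. It is a plain reference implementation for illustration only: the function names `conv2d`, `max_pool2d`, and `fully_connected` are hypothetical and are not the HiAI Foundation API, whose names and signatures are not given here, and the NPU executes these operators as compiled instruction sequences rather than Python loops.

```python
import numpy as np

def conv2d(x, w, b):
    """Naive 2D convolution (cross-correlation), stride 1, no padding.

    x: input feature map, shape (C_in, H, W)
    w: kernels, shape (C_out, C_in, KH, KW)
    b: bias, shape (C_out,)
    """
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for o in range(c_out):
        for i in range(h - kh + 1):
            for j in range(wd - kw + 1):
                # Multiply-accumulate over every input channel in the window.
                out[o, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w[o]) + b[o]
    return out

def max_pool2d(x, k=2):
    """Non-overlapping k x k max pooling; leftover rows/columns are dropped."""
    c, h, wd = x.shape
    h2, w2 = h // k, wd // k
    windows = x[:, :h2 * k, :w2 * k].reshape(c, h2, k, w2, k)
    return windows.max(axis=(2, 4))

def fully_connected(x, w, b):
    """Full connection: flatten the input and apply an affine map, w: (out, in)."""
    return w @ x.reshape(-1) + b

# Tiny example: one 4 x 4 single-channel input and one 3 x 3 kernel of ones.
x = np.arange(16, dtype=float).reshape(1, 4, 4)
y = conv2d(x, np.ones((1, 1, 3, 3)), np.zeros(1))       # shape (1, 2, 2)
p = max_pool2d(y)                                        # shape (1, 1, 1)
fc = fully_connected(y, np.ones((3, 4)), np.zeros(3))   # shape (3,)
print(y[0], p.item(), fc)
```

Each output of the convolution here is the sum of a 3 x 3 window, so the 4 x 4 input shrinks to 2 x 2 before pooling reduces it to a single value.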


The HiAI Foundation supports dedicated AI instruction sets for neural network model calculation and can efficiently execute more neural network operators in parallel with the minimum number of clock cycles. It can compile a variety of neural network operators, such as convolution, pooling, activation, and full-connection operators, into dedicated AI instruction sequences for the NPU in offline mode, rearranging the data and weights along the way. The instructions and data are then combined to generate an offline execution model. Furthermore, during offline compilation, operators from adjacent layers (convolution, ReLU, and pooling) can be fused, reducing the read/write bandwidth to double data rate (DDR) memory and thus improving performance. HiAI Foundation can also rearrange the data of a neural network model (batch, channel, height, and width) into the most efficient layout; for the channel data of feature maps in particular, this greatly improves the efficiency of channel-related calculation during convolution. HiAI Foundation supports sparse model acceleration: the NPU can skip multiply-add operations whose coefficient is zero, which greatly improves calculation efficiency and reduces bandwidth while maintaining calculation precision. As shown in Figure 7-4, compilation tools convert a trained neural network model into an offline model that can be executed efficiently on the HiAI Foundation and output it as a binary file.
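The arithmetic behind sparse model acceleration can be shown with a pure-Python sketch: when a weight coefficient is zero, its multiply-add contributes nothing, so it can be skipped without changing the result. The helpers `dense_dot` and `sparse_dot` are hypothetical and only illustrate the counting; how the NPU detects and skips zeros in hardware is not described in the source.

```python
def dense_dot(weights, activations):
    """Multiply-add every pair, counting multiply-accumulate (MAC) operations."""
    acc, macs = 0.0, 0
    for w, a in zip(weights, activations):
        acc += w * a
        macs += 1
    return acc, macs

def sparse_dot(weights, activations):
    """Same result, but skip pairs whose weight coefficient is exactly zero."""
    acc, macs = 0.0, 0
    for w, a in zip(weights, activations):
        if w == 0:
            continue  # zero coefficient adds nothing, so no MAC is issued
        acc += w * a
        macs += 1
    return acc, macs

weights = [0.5, 0.0, 0.0, -1.0, 0.0, 2.0, 0.0, 0.0]
activations = [1.0, 3.0, 4.0, 2.0, 7.0, 0.5, 9.0, 1.0]
print(dense_dot(weights, activations))   # (-0.5, 8)
print(sparse_dot(weights, activations))  # (-0.5, 3)
```

With five of the eight weights equal to zero, the sparse version issues 3 MACs instead of 8 and produces the identical sum, which is why precision is unaffected.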

Figure 7-4 Neural network model compiled into an offline model

Standard neural network models (such as Caffe models) are compiled and converted into offline models. Compilation optimizes the network configuration and, after optimization, generates target files (that is, offline models). Offline models are serialized and stored on disk, so the neural network can directly use the optimized target files for faster computing. As shown in Figure 7-5, during offline model calculation, the HiAI Foundation loads the offline model from a file and copies user input data (such as images) to the HiAI NPU for calculation. For each inference, user data needs to be transferred from DDR memory to the NPU only once.
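The compile-once, load-and-run-many flow described above can be mimicked in a few lines. In this toy sketch, `compile_offline`, `save_offline`, and `load_offline` are hypothetical helpers, Python's `pickle` stands in for the real offline-model binary format (which is not documented here), and the only optimization shown is the convolution-ReLU-pooling fusion described earlier in this section.

```python
import os
import pickle
import tempfile

FUSABLE = ("conv", "relu", "pool")  # fusion pattern named in the text

def compile_offline(layers):
    """Toy 'compiler': fuse each conv -> relu -> pool run into a single op."""
    ops, i = [], 0
    while i < len(layers):
        if tuple(layers[i:i + 3]) == FUSABLE:
            ops.append("conv_relu_pool")  # one op: intermediates need not go back to DDR
            i += 3
        else:
            ops.append(layers[i])
            i += 1
    return {"format": "toy-offline-model", "ops": ops}

def save_offline(model, path):
    with open(path, "wb") as f:
        pickle.dump(model, f)  # serialized to disk once, after compilation

def load_offline(path):
    with open(path, "rb") as f:
        return pickle.load(f)  # later runs skip compilation entirely

original = ["conv", "relu", "pool", "conv", "relu", "pool", "fc"]
path = os.path.join(tempfile.mkdtemp(), "toy_model.bin")
save_offline(compile_offline(original), path)
model = load_offline(path)
print(model["ops"])  # ['conv_relu_pool', 'conv_relu_pool', 'fc']
```

The seven original layers become three ops, and the compiled file can be reloaded as often as needed without repeating the optimization.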


Figure 7-5 Loading and calculating offline models

HUAWEI HiAI Foundation supports multiple intelligent platform frameworks, including Caffe and TensorFlow. Third parties only need to specify in the API which framework is required for computing; other APIs and parameters need no modification. HiAI Foundation also supports most models and neural network operators and will continue to be optimized.

7.1.4 HUAWEI HiAI Engine

HiAI Engine opens application capabilities and integrates multiple AI capabilities into applications, making applications smarter and more powerful. HiAI Engine 3.0 adds more recognition capabilities, increasing the number of underlying APIs to more than 40. Developers can directly invoke existing APIs and focus on service development: to implement functions such as image recognition and voice processing, they only need to place the integrated APIs in the application. In addition, in HiAI 3.0, APIs such as CV and speech recognition become distributed, helping developers build smart life experiences across all scenarios.

The open application engines of HiAI include the CV engine, the ASR engine, and the natural language understanding (NLU) engine. According to a survey of developers' needs for HiAI capabilities, more than 60% of respondents focus on CV, ASR, and NLU.

• The CV engine simulates the human visual system to perceive the surrounding environment and to judge, identify, and understand its spatial composition. Its capabilities include image super-resolution, facial recognition, and object recognition.
• The ASR engine converts human speech into text so that computers can further parse and understand it. Its capabilities include speech recognition and speech conversion.
• The NLU engine works with the ASR engine to allow computers to understand human speech or text and to communicate or act naturally. Its capabilities include word segmentation, textual entity recognition, sentiment analysis, and machine translation.

Table 7-1 describes the application scenarios and open engines of HUAWEI HiAI Engine. For details about the APIs, see the Appendix.

Table 7-1 HiAI application scenarios and open engines

Scenario | Capabilities | Open engines
Short video and live streaming | Gesture recognition, portrait segmentation, posture recognition, video style, voice control, intelligent depth-of-field control, image scene recognition | CV, ASR
Social media | Photo categorization, image recognition, image super-resolution (SR), sensitive data recognition | CV, NLU
AR | Context awareness, voice control, depth estimation, light estimation | ASR, CV
Photo taking and retouching | Beautification, image enhancement, aesthetics scoring, album generation, photographing by voice, photographing by gesture | CV
Shopping | QR code scanning, direct service delivery and recommendation, ID card recognition, bank card recognition, visual shopping | CV

7.1.5 HUAWEI HiAI Service

HiAI Service APIs let developers access a service once and reuse it on multiple devices, such as mobile phones, tablets, and large screens, implementing distribution efficiently. HiAI Service APIs can recommend AI applications or services to users in a timely manner so that users can quickly obtain the services they need, and applications can accurately divert and connect users. With HiAI Service APIs, each function or piece of content in an application can be split into an independent atomic service for push. HiAI Service APIs support precise distribution across multiple scenarios and entrances: they recommend and display related applications based on user habits, search content, and voice instructions at entrances such as HiBoard, Global Search, HiVoice, HiTouch, and HiVision, making the marketing of applications to users more intelligent and precise. HiAI Service APIs intelligently connect people and services, upgrading the experience from "people searching for services" to "services searching for people".

7.2 Developing Applications Based on HUAWEI HiAI Platform

HiAI also provides an integrated development environment (IDE) tool for quickly integrating HiAI capabilities, helping developers use Huawei EMUI open capabilities quickly, conveniently, and efficiently. The IDE extends Android Studio as a plug-in and supports HiAI Engine and HiAI Foundation functions such as AI model analysis, AI model conversion, service class generation, and the AI model market. Drag-and-drop operations are supported for quick and efficient integration. In addition, it provides free remote real-device services (more than 3,000 real AI devices and 24/7 remote one-click commissioning). The IDE supports Android Studio 2.3.x and later and the following operating systems: Windows 7, Windows 10, and macOS 10.12 or 10.13. If the operating system does not meet these requirements, only the local AI model conversion function is affected.

Choose functions based on the actual scenario. For example, to develop an application with the EMUI AI APIs, use HUAWEI HiAI Engine. To convert a TensorFlow or Caffe model into a HUAWEI HiAI model and integrate it into an application, use HUAWEI HiAI Foundation. An ordinary application can act as a service provider through HiAI Service. HiAI integrates seamlessly with Android Studio as a plug-in, as shown in Figure 7-6.


Figure 7-6 Integration of HiAI IDE and Android Studio


The HiAI platform plug-in provides the HiAI Engine and HiAI Foundation functions. HiAI Engine provides APIs that can be integrated into applications and invoked directly. HiAI Foundation integrates trained models, which can be downloaded and used directly, as shown in Figure 7-7.


Figure 7-7 HiAI functions integrated with Android Studio


When an application has been developed and enters real-device commissioning, Huawei provides a full series of convenient, efficient, and smooth remote debugging services. With one click, developers can access real devices in Huawei's remote terminal lab for real-time remote control and single-step commissioning. Huawei also provides performance and log analysis. Figure 7-8 shows some of the supported Huawei models.

Figure 7-8 Huawei models supported by HiAI


The procedure for integrating the HiAI deep learning development kit (DDK) is as follows: obtain a trained Caffe or TensorFlow model, use the offline model generator (OMG) conversion tool to convert the original open-source framework model into an offline model (OM) suitable for the Da Vinci platform (the OM can include 8-bit quantization), and finally integrate it into the application, including model preprocessing and model inference, as shown in Figure 7-9. The application integration procedure is as follows:

Step 1 Create a project.
① In Android Studio, create a project. Make sure Include C++ support is selected.
② Select C++11 from the C++ Standard drop-down list box. Select Exceptions Support (-fexceptions) and Runtime Type Information Support (-frtti).

Step 2 Compile the Java Native Interface (JNI) layer.
① Implement the JNI and write the Android.mk file.
② Write the Application.mk file and copy the SDK .so files to the resource library.
③ Specify the NDK C++ compilation file in the build.gradle file.

Step 3 Integrate the model.
① Model preprocessing: application-layer model preprocessing and JNI-layer model preprocessing.
② Model inference.

Figure 7-9 HiAI DDK integration process

----End
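The 8-bit quantization mentioned in the DDK procedure above stores weights as 8-bit integers plus a scale factor. The exact scheme used by the OMG tool is not described here, so the following NumPy sketch shows one common variant, symmetric per-tensor int8 quantization, to illustrate why the accuracy loss is bounded; `quantize_int8` and `dequantize` are hypothetical helper names.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale

w = np.array([0.02, -0.5, 1.27, -1.27, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
error = np.abs(dequantize(q, scale) - w).max()
print(q, scale, error)
```

Because each weight is rounded to the nearest multiple of `scale`, the reconstruction error per weight is at most half a quantization step, while storage drops from 32 bits to 8 bits per weight.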

7.3 HUAWEI HiAI: Some Solutions

7.3.1 HUAWEI HiAI Helps Deaf and Mute People

Children with hearing disabilities miss out on many joys because of physical obstacles. They cannot hear greetings from their families and friends, nor can they easily read the words in books. The world is silent and lonely to them. There are about 32 million deaf children around the world. They cannot hear the sounds around them or express their ideas verbally, and communicating with the world is full of challenges. The reality is harsh: 90% of children with hearing disabilities have hearing parents, 78% of whom are unable to communicate with their children. For children who are severely or profoundly deaf, learning to read can be an overwhelming challenge. Languages are learned by listening, speaking, reading, and writing, and listening is a key part of language learning. When encountering an unfamiliar word, a hearing child can learn its meaning from a family member's explanation and master it by saying the word repeatedly. Children with hearing disabilities, however, can learn languages only through sign language, and without professional sign language teachers they cannot communicate with hearing people.

To address this issue, Huawei developed StorySign in partnership with the nonprofit European Union of the Deaf, the publisher Penguin Random House, and the animation studio Aardman. Using HUAWEI HiAI's open image recognition and optical character recognition (OCR) capabilities, an animation is displayed as soon as users hold their smartphones over the words in a physical book. The friendly avatar "Star" then appears and translates the text in the book into sign language, as shown in Figure 7-10.

Figure 7-10 HUAWEI HiAI displays texts with animation effect


7.3.2 HUAWEI HiAI Improves the Visual Experience of Yuanbei Driving Test

Yuanbei Driving Test is a driving test application tailored for beginners. It provides driving test services in text and images, including registering with a driving school, booking a driving test, and taking simulated driving tests. Yuanbei Driving Test is committed to building a convenient and practical one-stop driving test platform. The simulated driving test is one of its main features: it combines pictures, videos, and voice in the built-in installation package to help users quickly become familiar with the test content and rules and pass the driving test. The simulated exam questions contain many pictures to assist users in their exercises. However, some low-quality images are not clear enough on common mobile phones, which hampers users' practice. On most devices, image optimization for simulated driving tests relies on the Internet, so when the network signal is weak or unavailable, image quality can hardly be improved. HUAWEI HiAI applies intelligent noise reduction and can enlarge image resolution by nine times, significantly improving image quality, bringing clearer image details to users, and improving the visual experience. Based on HUAWEI HiAI's device-side learning model, images are optimized and enlarged on the device, so the same images are displayed more clearly on Huawei NPU models. In addition, this no longer depends on network conditions: users can still view high-quality large images when the network is unstable or disconnected, as shown in Figure 7-11.

Figure 7-11 Huawei HiAI improves visual experience of Yuanbei Driving Test
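"Nine times" here refers to the pixel count. Assuming, as the WPS case later in this chapter states for the same capability, that this means three times in both height and width, the pixel count grows by 3 x 3 = 9. The NumPy sketch below only demonstrates this size arithmetic using nearest-neighbour upscaling; HiAI's super-resolution uses a trained model to synthesize new detail, which simple pixel repetition does not do. `upscale_nearest` is a hypothetical helper.

```python
import numpy as np

def upscale_nearest(img, factor=3):
    """Nearest-neighbour upscaling: repeat each pixel `factor` times along height and width."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

low_res = np.arange(12, dtype=np.uint8).reshape(3, 4)  # toy 3 x 4 grayscale image
high_res = upscale_nearest(low_res)
print(low_res.shape, "->", high_res.shape)  # (3, 4) -> (9, 12)
print(high_res.size // low_res.size)        # 9
```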


7.3.3 HUAWEI HiAI Enables Ctrip

The Ctrip mobile client provides users with comprehensive travel services, including hotel reservations, flight and train ticket reservations, travel guides, discounted tickets, and travel insurance. During a journey, users often take many photos, hoping to capture beautiful scenery and preserve happy moments. However, without professional photography knowledge, most ordinary people find it difficult to judge the quality of their own photos, so they cannot tell whether a photo is good or whether the best effect has been achieved. In addition, when photos are not sharp enough and look poor on screen, many users want the image quality improved. By integrating the aesthetic scoring capability of HUAWEI HiAI Engine, the application scores images by combining technical factors such as defocus and jitter with subjective aesthetic factors such as skew, color, and composition. Based on the scores, users can quickly understand image quality and adjust accordingly to capture the most beautiful scenery. In addition, with HUAWEI HiAI, the application can be woken up by voice and compose poems with one tap, as shown in Figure 7-12.

Figure 7-12 HUAWEI HiAI enables Ctrip to make poems with one click


7.3.4 HUAWEI HiAI Enables WPS to Detect and Calibrate Documents

WPS is an office software application that allows users to edit and view common office documents, such as text documents, spreadsheets, and presentations. WPS also provides users with free cloud space and document templates. With the emergence and development of mobile terminals, mobile phones are increasingly used for office work such as editing documents and sending and receiving emails. However, without a keyboard or mouse, users can only tap and drag on the screen with their fingers, which makes working on a mobile phone very inefficient. For example, when attending a class, meeting, or training session and needing to record useful information on PowerPoint slides, people often take out their phones and take a photo. However, the captured images usually have problems and must be exported to a computer and processed before being made into a PowerPoint document, which can be very time-consuming. The problems are as follows:

Interference of other objects: Besides the content on the PowerPoint slides, other objects such as the screen, walls, desks, and chairs may also be captured, so the image must be cropped before use.

Document deformation: If the image is not captured directly in front of the slides, the document in the image may be distorted to varying degrees. A stretched or compressed image affects subsequent use.

Blurred image: Limited by factors such as lighting and distance, the captured image may be blurred, affecting readability and information recognition.

Uneditable content: Many users need to edit the content of the PowerPoint slides when viewing the captured images, but image content cannot be edited directly.

With access to HUAWEI HiAI and the remarkable performance of the Huawei Kirin 970, WPS needs only three seconds to generate a PowerPoint file from multiple images with one touch, solving all of the preceding problems.

Document sensing to automatically identify the useful area: After integrating the document detection and calibration capability of HiAI Engine, WPS can accurately identify the area where the document is located and automatically crop out other objects, including the screen, walls, desks, and chairs, as shown in Figure 7-13.

Figure 7-13 WPS document sensing and automatic identification 

Document calibration to quickly adjust the shooting angle to the center of view: This is an enhanced auxiliary function for rephotographing documents. It automatically corrects the shooting angle so that the document appears as if photographed from directly in front, with a maximum adjustment range of 45 degrees, as shown in Figure 7-14.


Figure 7-14 WPS document calibration 
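Correcting a slide photographed at an angle is commonly done with a perspective transform (homography). The transform maps the four detected document corners to an upright rectangle. This text does not document HiAI's internal method. The sketch below assumes the corners are already known. It solves for the 3 × 3 homography from the four point pairs with plain Gaussian elimination. All function names and coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src, dst):
    """Return the 3x3 perspective matrix H (H[2][2] = 1) mapping 4 src corners to 4 dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h0..h7
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def warp_point(h, point):
    """Map one (x, y) point through H, dividing by the projective coordinate w."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical corners of a slide photographed at an angle (top-left, top-right,
# bottom-right, bottom-left), mapped to an upright 400 x 300 rectangle.
photo_corners = [(12, 30), (410, 8), (430, 300), (5, 280)]
upright = [(0, 0), (400, 0), (400, 300), (0, 300)]
H = homography(photo_corners, upright)
print([tuple(round(c, 3) for c in warp_point(H, p)) for p in photo_corners])
```

To rectify a whole image, one would typically iterate over the output pixels. Each pixel is mapped back into the photo with the inverse homography, which `homography(upright, photo_corners)` yields, and the photo is sampled at the resulting coordinates.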


Document super-resolution to make text on the document clearer: HUAWEI HiAI increases the resolution of images containing text by a factor of nine (three times in both height and width). This makes the images clearer so that text recognition is easier, as shown in Figure 7-15.

Figure 7-15 WPS text super-resolution 
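The factor of nine is plain arithmetic. Scaling the width by 3 and the height by 3 multiplies the pixel count by 3 × 3 = 9. The sketch below demonstrates only that arithmetic, using naive nearest-neighbour enlargement, which copies pixels and adds no detail. The learned super-resolution used by HiAI reconstructs detail and is not reproduced here. `upscale_nearest` is a hypothetical helper.

```python
def upscale_nearest(image, factor=3):
    """Enlarge a 2-D list of pixels by repeating each pixel factor x factor times."""
    out = []
    for row in image:
        wide = [p for p in row for _ in range(factor)]  # repeat along the width
        out.extend(wide[:] for _ in range(factor))      # repeat along the height
    return out


small = [[1, 2], [3, 4]]       # 2 x 2 = 4 pixels
big = upscale_nearest(small)   # 6 x 6 = 36 pixels
print(len(big), len(big[0]))   # 6 6
print((len(big) * len(big[0])) // (len(small) * len(small[0])))  # 9
```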

Access to OCR to edit the image content: By integrating OCR, WPS can automatically recognize and extract the text in images so that users can modify, cut, copy, or delete the text in the PowerPoint file, as shown in Figure 7-16.


Figure 7-16 WPS OCR recognition

For more solutions, visit the official HiAI website: https://developer.huawei.com/consumer/en/hiai

7.4 Summary

This chapter describes the three-layer architecture of the HUAWEI HiAI platform: HUAWEI HiAI Foundation, HUAWEI HiAI Engine, and HUAWEI HiAI Service APIs. These layers implement chip capability openness, application capability openness, and service capability openness, respectively. The chapter also introduces these capabilities and some HiAI solutions. Finally, HUAWEI HiAI has carried out the following initiatives to connect developers, encourage innovation, and build a win-win ecosystem:

Offline connection for in-depth communication: ① salon city station ② HiAI open courses ③ special technical conferences



1 billion investment, stimulating innovations in all scenarios: ① openness and innovation of device capabilities ② all-scenario digital service innovation ③ cloud service ecosystem co-construction



Innovation competitions for continuous development: ① AI Application Innovation Contest ② Future Application Creative Contest ③ AR Application Innovation Contest


Huawei believes that AI can make life better by bringing unprecedented convenience to both the back end and devices. However, this requires real application scenarios in which more enterprises and developers can help substantially improve user experience. Huawei is willing to work with partners, developers, and enterprises to jointly promote the intelligent transformation of industries based on the HiAI 3.0 platform.

7.5 Quiz

1. HUAWEI HiAI 3.0 was officially released at the Software Green Alliance Developer Conference on November 19, 2019. The release marks the leap from device-side AI to distributed AI, which will bring the ultimate smart life experience across all scenarios. What is the three-layer AI ecosystem of HUAWEI HiAI?

2. Which layer can convert a standard neural network model into an offline model?

3. Which layer can easily integrate multiple AI capabilities into applications to make them more intelligent and powerful?

4. HiAI aims to help developers quickly, conveniently, and efficiently use Huawei EMUI open capabilities. Which tool can be integrated with HiAI?

5. What is the procedure for application integration?


Huawei AI Academy Training Materials

Enterprise Smart Application Platform

Huawei Technologies Co., Ltd.

08 Enterprise Smart Application Platform (Textbook)

278


Huawei Technologies Co., Ltd.
Address: Huawei Industrial Base, Bantian, Longgang, Shenzhen 518129
Website: https://e.huawei.com/


HUAWEI CLOUD Enterprise Smart Application Platform

Page 1

Contents

8 Enterprise Smart Application Platform (EI)  2
8.1 EI Products and Services  2
8.1.1 Overview  2
8.1.2 HUAWEI CLOUD EI Intelligent Twins  3
8.1.3 Industrial Intelligent Twins  4
8.1.4 Campus Intelligent Twins  5
8.1.5 Network AI Engine (NAIE)  6
8.1.6 EI Essential Platform: Huawei HiLens  7
8.1.7 Advantages of Huawei HiLens  8
8.1.8 Application Fields of HiLens  9
8.1.9 EI Essential Platform: Graph Engine Service (GES)  10
8.1.10 Other EI Products and Services  12
8.2 ModelArts  19
8.2.1 ModelArts Functions  19
8.2.2 ModelArts Architecture and Application  20
8.2.3 ModelArts Highlights  21
8.2.4 How to Access ModelArts  22
8.2.5 How to Use ModelArts  22
8.3 HUAWEI CLOUD EI Solutions  23
8.3.1 Case: OCR Implements Full-Process Automation for Reimbursement Through Invoices  23
8.3.2 Case: Intelligent Logistics with OCR  24
8.3.3 CBS  25
8.3.4 Case: Intelligent Q&A of Enterprises in a Certain District  27
8.3.5 Case: Gene Knowledge Graph  27
8.3.6 Policy Query Based on Knowledge Graphs  28
8.3.7 Case: Smart Campus  28
8.3.8 Case: Crowd Statistics and Heat Map  29
8.3.9 Case: Vehicle Recognition  30
8.3.10 Case: Intrusion Detection  31
8.3.11 Cognitive Computing Platform of China National Petroleum Corporation — Oil and Gas Layer Identification in Well Logging  32
8.4 Summary  33
8.5 Quiz  33

8 Enterprise Smart Application Platform (EI)

This course describes the HUAWEI CLOUD Enterprise Smart Application Platform (EI), including EI products, services, and solutions, with a focus on the Huawei ModelArts platform.

8.1 EI Products and Services

8.1.1 Overview

The following figure shows the HUAWEI CLOUD EI products and services, including big data, essential platform, Conversational Bot, natural language processing (NLP), automatic speech, video analytics, image recognition, content moderation, ImageSearch, face and human recognition, optical character recognition (OCR), and EI Intelligent Twins.

Figure 8-1 EI Products and Services 

Big data: data ingestion, cloud data migration, cloud stream, MapReduce, data lake insight, and CloudTable



Essential platform: ModelArts, deep learning, machine learning, graph engine, video ingestion services, and HiLens



Conversational Bot: Question-Answering bot, task-oriented conversational bot, speech analytics, and CBS customization




Natural language processing: natural language processing fundamentals, moderation (text), language understanding, language generation, customized natural language processing, and machine translation



Automatic speech: automatic speech recognition, speech synthesis, and text-to-speech



Video analytics: video content recognition, editing, and tagging, and video quality detection



Image recognition: image tagging and celebrity recognition



Content moderation: moderation of texts, images, and videos



ImageSearch: reverse image search, allowing users to search for the same or similar images in a specified image library



Face and human recognition: face recognition and human analysis



OCR: general, card, receipt, domain, and custom OCR



EI Intelligent Twins: Traffic Intelligent Twins, Industrial Intelligent Twins, Campus Intelligent Twins, Vehicle Intelligent Twins, Network AI Engine (NAIE), EIHealth, and GeoGenius

8.1.2 HUAWEI CLOUD EI Intelligent Twins

8.1.2.1 Overview

The EI Intelligent Twins integrates AI technologies into various industry scenarios, fully taps into the value of data, and draws on the advantages of AI to build scenario-based solutions for higher efficiency and better user experience. Figure 8-2 shows the EI Intelligent Twins, which consists of Traffic, Industrial, Campus, and Vehicle Intelligent Twins. In addition, Huawei has launched the Network AI Engine (NAIE), EIHealth, and GeoGenius.

Figure 8-2 EI Intelligent Twins

8.1.2.2 Traffic Intelligent Twins (TrafficGo)

The Traffic Intelligent Twins (TrafficGo) supports a broad array of functions. These include 24/7 all-area traffic condition monitoring, traffic incident detection, real-time coordination and control of regional traffic lights, large-screen display of traffic conditions, and key vehicle management. TrafficGo delivers an efficient, environment-friendly, and safe travel experience, as shown in Figure 8-3.


Figure 8-3 Traffic Intelligent Twins (TrafficGo)

TrafficGo boasts the following advantages:

Integrates large amounts of data from the Internet and the transportation industry for deep data mining.



Implements all-domain and human-vehicle collaboration to maximize the traffic volume and minimize the waiting time of vehicles in each area. Coordinates travel requirements of vehicles and pedestrians for smooth traffic.



Supports real-time traffic light coordination. Huawei is the first vendor in the industry to develop secure communication interface standards between Traffic Intelligent Twins and signal control platforms.



Accurately predicts vehicle trajectories and plans the optimal travel route.
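At its core, the route-planning advantage above rests on a shortest-path search over a road network whose edge weights are predicted travel times. This text does not describe TrafficGo's actual planner. The sketch below uses the classic Dijkstra algorithm with Python's `heapq` on a hypothetical road graph, with made-up travel times in minutes.

```python
import heapq


def fastest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path), or (inf, []) if unreachable.

    graph maps each intersection to a dict {neighbour: travel_minutes}.
    """
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry superseded by a cheaper one
        for nxt, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return float("inf"), []


# Hypothetical predicted travel times in minutes; road B-C is congested.
roads = {
    "A": {"B": 4, "D": 2},
    "B": {"C": 10},
    "D": {"E": 3},
    "E": {"C": 4},
}
print(fastest_route(roads, "A", "C"))  # (9, ['A', 'D', 'E', 'C'])
```

When the predicted times change, for example as congestion builds on B-C, rerunning the search on the updated weights yields the new best route.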

TrafficGo has the following features: 

24/7 traffic incident monitoring in all areas



Cross-area collaboration for intelligent traffic light optimization



Identification of key congestion-prone sites and routes, and impact analysis of traffic congestion



Prediction of crowd density and motion



24/7 access to real-time traffic conditions for informed traffic decision-making



Real-time and on-demand traffic light scheduling



Large-screen display of traffic conditions



Refined management of key vehicles

8.1.3 Industrial Intelligent Twins

The Industrial Intelligent Twins uses big data and AI technologies to provide full-pipeline intelligent services covering design, production, logistics, sales, and service. The cutting-edge Industrial Intelligent Twins helps enterprises tap into data value and build technological advantages. Figure 8-4 shows the Industrial Intelligent Twins.


Figure 8-4 Industrial Intelligent Twins

Industrial Intelligent Twins can transform various industries in the following three aspects:

From manual experience to data-driven intelligence: Data mining and analytics extract experience that improves efficiency and product quality.



From digitalization to intelligence: Intelligent analysis has become a new engine driving the digital transformation of enterprises.



From production to innovation: Enterprises create competitive edges by combining product design and sales data with upstream and downstream data from the industry chain.

Customer benefits of Industrial Intelligent Twins: 

Product quality improvement: Classifies and analyzes a wide range of data, including customer feedback, online comments, competitor information, repair records, and post-sales data, to detect critical issues and improve design for better product quality.



Intelligent O&M: Based on the historical and current status of the system, uses methods such as time series prediction, neural network prediction, and regression analysis to predict whether a fault will occur, when it will occur, and what kind of fault it will be. This helps enterprises improve O&M efficiency, reduce unscheduled downtime, and lower manual O&M costs.



Production material estimation: Accurately analyzes and estimates the materials required for production based on historical data, shortening the warehousing period and improving efficiency. The algorithms are based on the industry's time series models and are optimized and tailored for Huawei's supply chain.
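Time series prediction is mentioned above without detail. The sketch below is a deliberately simple stand-in, not the optimized industry models the text refers to. It fits an ordinary least-squares linear trend to hypothetical monthly material consumption and extrapolates it. Real demand usually also has seasonality and noise that such a line ignores.

```python
def linear_trend_forecast(history, steps=1):
    """Fit y = a + b*t by ordinary least squares and forecast the next `steps` values.

    history must contain at least two observations, indexed t = 0, 1, 2, ...
    """
    n = len(history)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(history) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, history))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return [intercept + slope * (n + k) for k in range(steps)]


usage = [100, 110, 120, 130]            # hypothetical units consumed per month
print(linear_trend_forecast(usage, 2))  # [140.0, 150.0]
```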

8.1.4 Campus Intelligent Twins

The Campus Intelligent Twins manages and monitors industrial, residential, and commercial campuses. It uses AI technologies such as video analytics and data mining to make work and life more convenient and efficient.


Figure 8-5 Campus Intelligent Twins

The Campus Intelligent Twins transforms campus management in the following three aspects:

AI technologies are adopted to assist guards in protecting campuses, reducing manual workload and enhancing campus security.



The facial recognition-based access control allows for automatic card-free access.



Strong capabilities for tracking and analyzing lost items give employees and property owners a sense of security.

Customer benefits of Campus Intelligent Twins: 

Campus access control: The facial detection and recognition technologies are used to identify visitors and quickly return the recognition results, improving the throughput rate of access control and implementing automatic campus management.



Security zone monitoring: Technologies such as intrusion detection, loitering detection, and abandoned item detection are used to monitor controlled areas and keep campus life and production safe.



Smart parking: License plate recognition and trajectory tracking services enable more efficient management of vehicle entry and exit, routes, parking violations, and parking spaces.
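Facial recognition-based access control commonly compares a feature vector (embedding) of the captured face with vectors enrolled in advance. The sketch assumes some face model, out of scope here, has already produced the embeddings. It uses cosine similarity with a hypothetical acceptance threshold of 0.8. The three-dimensional vectors and names are toy values; real embeddings are much longer.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def identify(probe, gallery, threshold=0.8):
    """Return the enrolled name most similar to probe, or None if below threshold."""
    name, score = max(((n, cosine(probe, e)) for n, e in gallery.items()),
                      key=lambda item: item[1])
    return name if score >= threshold else None


enrolled = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.5]}
print(identify([0.88, 0.15, 0.18], enrolled))  # alice
print(identify([0.5, 0.5, -0.7], enrolled))    # None
```

The threshold trades false acceptances against false rejections. Raising it makes the gate stricter but may turn away enrolled people photographed in poor lighting.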

8.1.5 Network AI Engine (NAIE)

The Network AI Engine (NAIE) empowers smart networks to simplify network services, improve network resource utilization, O&M efficiency, energy efficiency, and service experience, and enable autonomous driving networks. Figure 8-6 shows the NAIE.


Figure 8-6 Network AI Engine (NAIE)

The NAIE generates the following business value:

Predicts network traffic and balances network resources based on the prediction results, improving network resource utilization.



Reduces a large number of repeated work orders and predicts faults for preventive maintenance, boosting network O&M efficiency.



Predicts service status in real time and automatically and dynamically adjusts energy consumption based on the service volume, improving energy efficiency.

The NAIE has the following technical advantages: 

Secure data import to the lake: Various types of data, such as network engineering parameters, performance data, and alarm data, are quickly collected and imported into the data lake. The NAIE uses a large number of tools to improve data governance efficiency. It also applies security technologies such as multi-tenant isolation and encrypted storage to protect data in the lake throughout its lifecycle.



Abundant network-related experience: The NAIE enables developers to quickly complete model and application development. It supports a wizard-based model development environment that provides multiple AI model development templates in the network domain. The environment provides developers of different levels with services such as training, model generation, and communication model.



Diversified application services: The NAIE provides application services for multiple network service scenarios, such as wireless access, fixed network access, transmission bearer, core network, data center, and energy, improving the O&M efficiency, energy consumption efficiency, and resource utilization of network services.

8.1.6 EI Essential Platform: Huawei HiLens

Huawei HiLens is a multimodal AI development platform that enables device-cloud synergy. It provides an easy-to-use framework, an out-of-the-box environment, a cloud-based management console, and an AI skill market. Huawei HiLens allows users to easily develop and deploy visual and auditory AI applications online and manage a large number of connected computing devices. It helps users develop multimodal AI applications and deliver them to devices to implement multi-scenario intelligent solutions. Figure 8-7 shows Huawei HiLens. Huawei HiLens has the following features:




Inference based on device-cloud synergy, combining low computing latency with high precision



Data analytics at the device side, reducing the cloud-based storage costs



One-stop skill development, shortening the development period



Extensive skills in the Skill Market, enabling online training and one-click deployment

Figure 8-7 Huawei HiLens device-cloud synergy

8.1.7 Advantages of Huawei HiLens 

Inference based on device-cloud synergy

Device-cloud model synergy mitigates network instability and saves network bandwidth. Devices collaborate with the cloud platform to update models online and quickly improve on-device precision. Devices also analyze collected data locally, which greatly reduces the data traffic sent to the cloud and saves storage costs.
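The text does not specify how HiLens splits inference between device and cloud. One common pattern that matches the stated goal of low latency on the device and higher precision in the cloud is a confidence-based cascade. The device answers locally when its small model is confident and escalates to the cloud otherwise. The sketch shows that pattern only. The function names, the 0.7 threshold, and the stand-in models are hypothetical, not HiLens APIs.

```python
def device_cloud_infer(frame, device_model, cloud_model, min_confidence=0.7):
    """Run the small on-device model first and escalate to the cloud only when unsure.

    Both models are callables returning (label, confidence). Returns (label, where_run).
    """
    label, confidence = device_model(frame)
    if confidence >= min_confidence:
        return label, "device"       # low latency, nothing uploaded
    label, _ = cloud_model(frame)    # higher precision, costs bandwidth
    return label, "cloud"


# Hypothetical stand-in models for illustration only.
def tiny_model(frame):
    return ("person", 0.9) if frame == "clear" else ("person", 0.4)


def large_model(frame):
    return ("cat", 0.95)


print(device_cloud_infer("clear", tiny_model, large_model))   # ('person', 'device')
print(device_cloud_infer("blurry", tiny_model, large_model))  # ('cat', 'cloud')
```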

Unified skill development platform

Huawei HiLens supports hardware-software collaboration for optimization, unified skill development framework, encapsulation of basic components, and common deep learning models. 

Cross-platform design

Mainstream processors are supported, including the Ascend series and HiSilicon 35xx series, covering mainstream surveillance scenarios. Device processors support model conversion and algorithm optimization.

Extensive skills in the Skill Market

The Skill Market offers extensive skills, such as human detection and baby crying detection. Users can select skills as required and quickly deploy them on devices without any development. Algorithms of multiple models in the Skill Market are optimized to address device constraints such as small memory capacity and low precision. Developers can also use the HiLens management console to develop customized skills and add them to the Skill Market.


8.1.8 Application Fields of HiLens

Users of Huawei HiLens fall into three types: common users, AI developers, and camera vendors.

Common users (skill users)

Common users can be family members, shopping mall owners, parking lot management staff, or construction site owners. They want to improve home security, collect statistics on passenger traffic, identify vehicle attributes and license plates, and check whether workers wear safety helmets. Common users can purchase the HiLens Kit, register with the Huawei HiLens console, and purchase or customize suitable skills in the Skill Market, such as license plate recognition and safety helmet detection. They can then install these skills on the HiLens Kit.

AI developers (skill developers)

AI developers are usually technical personnel or college students. These users want to generate income or acquire knowledge by developing AI skills and easily deploying them on devices to view the operating effect of the skills in real time. These users can develop AI skills on the HiLens console. HiLens integrates the HiLens framework at the device side, encapsulates basic components, and provides unified APIs to simplify development. After a skill is developed, developers can deploy it to the HiLens Kit in one-click mode and view the operating effect. Developers can also release it to the Skill Market or share it as a template with other users. 

Camera vendors

Cameras equipped with HiSilicon 35xx series processors may have weak or even no AI capabilities, so camera vendors want to make these products smarter to build competitiveness. Huawei HiLens can be applied to smart surveillance in a wide range of fields, such as homes, campuses, shopping malls, and in-vehicle devices.

Smart home surveillance

Cameras and home devices with Huawei HiSilicon 35xx series processors can improve intelligent video analytics capabilities in smart homes. So can the high-performance HiLens Kit, which uses Huawei Ascend processors based on the Da Vinci architecture. These devices apply to the following scenarios:

Human detection: Detects humans and records when they appear using home surveillance devices, and sends an alarm to the mobile phone when strangers are detected while no family member is at home.

Fall detection for elderly care: Generates an alarm when detecting a person falling down.

Baby crying detection: Intelligently identifies a baby crying and generates an alarm on the mobile phones of specified users.

Vocabulary recognition: Detects customized words, such as "help", and generates an alarm when such a word is detected.

Facial attribute detection: Detects facial attributes in a video, such as gender, age, and smile. This is suitable for gate security protection and video screening.

Family album: Collects the detected video clips of a child and arranges them in chronological order in a family album to record the child's growth.

Smart campus surveillance

The HiLens console delivers AI skills to AI edge stations equipped with Ascend processors, enabling edge devices to process data. This function can be applied to the following scenarios:

Facial recognition-based gate: Implements smart gate access control and attendance registration based on facial recognition.

License plate & model recognition: Recognizes license plates and vehicle models at campus and garage entrances and exits, and authenticates permissions for specified license plates and vehicle models.

Safety helmet detection: Detects workers who are not wearing safety helmets and generates an alarm on specified devices.

Trajectory restoration: Performs collaborative analysis on a face or vehicle recognized by multiple cameras to restore the moving path of a pedestrian or vehicle.

Face search: Recognizes specified faces in the campus surveillance system, such as those of blacklisted personnel.

Abnormal sound detection: Reports an alarm when detecting abnormal sounds, such as glass breaking or explosions.

Intrusion detection: Generates an alarm when a person is detected in a specified surveillance area.
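Once a person detector has produced positions, intrusion detection as described above reduces to checking whether each position falls inside the configured surveillance area. The sketch uses the standard ray-casting point-in-polygon test. The zone coordinates, the person IDs, and the use of each person's foot point as the position are assumptions for illustration.

```python
def inside_zone(point, polygon):
    """Ray-casting test: True if point (x, y) lies inside the polygon (list of vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def intrusion_alarms(detections, zone):
    """Return the IDs of detected persons whose foot point falls inside the restricted zone."""
    return [pid for pid, foot in detections if inside_zone(foot, zone)]


restricted = [(100, 100), (300, 100), (300, 250), (100, 250)]  # hypothetical pixel coordinates
people = [("p1", (150, 200)), ("p2", (50, 200)), ("p3", (299, 120))]
print(intrusion_alarms(people, restricted))  # ['p1', 'p3']
```

The same test works for non-rectangular zones, such as a fenced area drawn along a wall in the camera view.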

Smart shopping mall surveillance

Devices used in shopping malls include HiLens Kit, AI edge stations, and commercial cameras. HiLens Kit can be applied to small shopping malls and supports video analytics on 4 to 5 channels. The cameras are small and can be deployed indoors. The specific application scenarios are as follows:

Foot traffic statistics: Uses videos inside the shopping mall to collect statistics on foot traffic at entrances and exits and analyze how foot traffic changes over different periods.

VIP identification: Accurately identifies VIP customers using facial recognition to help develop precision marketing strategies.

Statistics on new and returning customers: Uses facial recognition to identify people at entrances and exits and counts new and returning customers.

Crowd density heatmap: Analyzes the crowd density heatmap to determine crowd density and the popularity of commodities.
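A crowd density heatmap can be built by counting detected people per grid cell of the camera frame and then colouring each cell by its count. The sketch assumes person detections are already available as (x, y) pixel coordinates. The frame size, the 50-pixel cell size, and the coordinates are hypothetical.

```python
def density_grid(points, width, height, cell):
    """Count detections per cell x cell block of a width x height frame."""
    cols = (width + cell - 1) // cell   # ceiling division so edge pixels get a cell
    rows = (height + cell - 1) // cell
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:  # ignore detections outside the frame
            grid[int(y // cell)][int(x // cell)] += 1
    return grid


heads = [(10, 10), (20, 15), (30, 5), (150, 90), (155, 95)]
for row in density_grid(heads, width=200, height=100, cell=50):
    print(row)
# [3, 0, 0, 0]
# [0, 0, 0, 2]
```

Accumulating the grids over time, for example per hour, gives the period-by-period view used for foot traffic analysis.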

Smart in-vehicle device

The smart in-vehicle device, based on the Android system, intelligently analyzes conditions inside and outside a vehicle in real time. It is applicable to driving behavior detection and the monitoring of shuttle buses, tour buses, and dangerous cargo carriers. The specific application scenarios are as follows:

Facial recognition: Authenticates a driver's permission by checking whether the driver's face matches that of the vehicle owner stored in the facial image library.

Fatigued driving: Monitors the driver's physical condition in real time and intelligently generates an alarm when the driver is fatigued.

Posture analysis: Detects driver postures that may distract from driving, such as making a call, drinking water, looking around, and smoking.

Detection of vehicles and pedestrians: Detects vehicles and pedestrians around a vehicle. This function can be used to detect pedestrians in blind zones.
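Fatigue detection is often built on facial landmarks. One widely used heuristic is the eye aspect ratio (EAR), which drops toward zero as the eye closes. The text does not say whether the in-vehicle skill uses it. The six-landmark layout, the 0.2 threshold, and the three-frame rule below are assumptions for illustration, and the landmark detector itself is out of scope.

```python
import math


def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks p1..p6 around one eye; p1 and p4 are the corners."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def is_drowsy(ear_per_frame, threshold=0.2, min_frames=3):
    """Flag fatigue when EAR stays below threshold for min_frames consecutive frames."""
    run = 0
    for ear in ear_per_frame:
        run = run + 1 if ear < threshold else 0
        if run >= min_frames:
            return True
    return False


open_eye = [(0, 0), (2, -1), (4, -1), (6, 0), (4, 1), (2, 1)]
print(round(eye_aspect_ratio(open_eye), 3))       # 0.333
print(is_drowsy([0.33, 0.31, 0.12, 0.10, 0.09]))  # True
```

Requiring several consecutive low frames separates a prolonged eye closure from a normal blink.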

8.1.9 EI Essential Platform: Graph Engine Service (GES)

Huawei Graph Engine Service (GES) is the first commercial distributed native graph engine with independent intellectual property rights in China. It facilitates the query and analytics of graph-structure data based on various relationships. GES uses the Huawei-developed high-performance graph engine EYWA as its kernel and holds multiple proprietary patents. It is widely used in scenarios with large amounts of relational data. These include social apps, enterprise relationship analysis, logistics distribution, shuttle bus route planning, enterprise knowledge graphs, risk control, recommendation, public opinion analysis, and anti-fraud. Massive, complex relational data, such as social relationships, transaction records, and transportation networks, is naturally graph-structured, and GES stores, queries, and analyzes it based on these relationships.

In terms of individual analysis, GES can profile an individual user based on the number and characteristics of the user's neighbors. It can also identify opinion leaders based on node characteristics and importance. Importance depends on both quantity and quality. On the one hand, a user with more followers is considered more important. On the other hand, quality propagates along the edges of the graph. The quality of followers is transferred to the followee, so a user followed by high-quality followers becomes considerably more important.

In terms of group analysis, GES uses label propagation and community detection algorithms to group nodes with similar characteristics. This applies to node classification scenarios such as friend/group recommendation and user grouping. For example, if two people in a social circle have a mutual friend, they may become friends in the future. The more mutual friends they have, the stronger their relationship is likely to be. Friend recommendation can therefore be based on the number of mutual friends.

In terms of link analysis, GES uses link analysis and relationship prediction algorithms to predict and identify hot topics and highlights, as shown in Figure 8-8.
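The friend-recommendation rule described above, where more mutual friends means a stronger likely relationship, is the common-neighbours heuristic. The sketch applies it to a small hypothetical friendship graph stored as a dict of sets. It is not the GES API. Ties are broken alphabetically only to keep the output deterministic, because Python set iteration order is not fixed.

```python
from collections import Counter


def recommend_friends(friends, user, top_k=3):
    """Rank non-friends of `user` by number of mutual friends (common neighbours)."""
    counts = Counter()
    for friend in friends[user]:
        for candidate in friends[friend]:
            if candidate != user and candidate not in friends[user]:
                counts[candidate] += 1
    # Highest mutual-friend count first; alphabetical order breaks ties.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


social = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat", "eve"},
    "cat": {"ann", "bob", "eve", "fay"},
    "dan": {"ann", "fay"},
    "eve": {"bob", "cat"},
    "fay": {"cat", "dan"},
}
print(recommend_friends(social, "ann"))  # [('eve', 2), ('fay', 2)]
```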

Figure 8-8 Graph Engine Service (GES)

GES is applicable to a broad range of scenarios, and its potential in more industries and scenarios has yet to be fully explored. GES offers the following advantages:

Large-scale query and analytics: Efficient data organization facilitates the analysis and querying of tens or even hundreds of billions of data records.



High performance: The optimized distributed graph processing engine supports high-concurrency, multi-hop queries with responses in seconds.



Combined query and analytics: GES offers various graph analytics algorithms that support multiple scenarios, such as relationship analysis, route planning, and precision marketing.

08 Enterprise Smart Application Platform (Textbook)

290


Ease of use: GES provides a wizard-based GUI and is compatible with Gremlin, making graph analysis easier.

GES provides the following functions: 

Various domain-specific algorithms: Supports PageRank, k-core, shortest path, label propagation, triangle counting, and association prediction.



Visualized chart analysis: Provides a wizard-based exploration environment to visualize query results.



Diversified APIs: Provides APIs for graph query, metrics statistics, Gremlin query, graph algorithms, graph management, and backup management.



Compatibility with open source ecosystems: Compatible with Apache TinkerPop Gremlin 3.3.0.



Graph management: Supports functions such as overview, graph management, graph backup, and metadata management.
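PageRank, listed above among the GES algorithms, formalizes the "quality transfer" idea from individual analysis: a node's score is passed on to the nodes it links to. The sketch below is a plain power-iteration PageRank under stated assumptions (edges point from follower to followee, damping factor 0.85, a fixed 50 iterations). It illustrates the algorithm only and is not the GES implementation or API.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    `edges` maps each node to the list of nodes it links to
    (here: follower -> followees).
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node keeps a baseline share of (1 - damping) / n.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = edges.get(node, [])
            if targets:
                # A node passes its score evenly to the accounts it follows,
                # so high-scoring followers raise the followee's score.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (follows nobody): spread its score evenly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


follows = {
    "u1": ["star"],
    "u2": ["star"],
    "u3": ["star", "u1"],
    "star": ["u1"],
}
scores = pagerank(follows)
print(max(scores, key=scores.get))
```

In this toy graph, u1 has only two followers, but one of them is the high-scoring star account, so u1 scores far higher than u2 or u3, which have no followers at all.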

8.1.10 Other EI Products and Services

8.1.10.1 Conversational Bot Service (CBS)

Conversational Bot Service (CBS) includes Question-Answering bot (QABot), task-oriented conversational bot (TaskBot), speech analytics (CBS-SA), and CBS customization, as shown in Figure 8-9.

Figure 8-9 Conversational Bot

Question-Answering bot (QABot) helps enterprises quickly build, release, and manage intelligent question-answering bots. Task-oriented conversational bot (TaskBot) accurately understands the intent and key information of a conversation and can be applied to intelligent call services and smart hardware. Conversational bot service speech analytics (CBS-SA) uses natural language algorithms and user-defined rules to analyze conversations between customer service agents and customers in call center scenarios, helping enterprises improve agent service quality and customer satisfaction. CBS customization helps build versatile AI bots for various industries, covering knowledge base and knowledge graph Q&A, task-oriented conversation, reading comprehension, automatic text generation, and multi-modality.


8.1.10.2 Natural Language Processing (NLP)

NLP provides services that give bots semantic understanding, consisting of four subservices: Natural Language Processing Fundamentals, language understanding, language generation, and machine translation. Figure 8-10 shows how NLP works.

Figure 8-10 NLP

Natural Language Processing Fundamentals (NLP Fundamentals) provides APIs related to natural language, such as word segmentation, named entity recognition, keyword extraction, and short text similarity. You can apply these APIs to various scenarios, such as intelligent Q&A, conversational bots, public opinion analysis, content recommendation, and e-commerce analysis.

Language understanding (LU) provides APIs such as sentiment analysis, opinion extraction, text classification, and intent understanding. It can be used in scenarios such as opinion mining, public opinion analysis, intelligent assistants, and conversational bots.

Language generation is based on an advanced language model. It takes input such as text, data, or images and generates readable text. Language generation can be used in human-computer interaction scenarios such as intelligent Q&A and conversations, news summaries, and report generation.

NLP customization helps build a customized natural language processing model that gives enterprise applications a unique competitive edge. These customized models cover a wide range of fields, such as automatic classification of legal documents, automatic generation of medical reports, and domain-specific public opinion analysis.
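Short text similarity, one of the NLP Fundamentals APIs, can be approximated with a simple bag-of-words baseline. The sketch below is an assumption for illustration only: it tokenizes English text with a regular expression instead of real word segmentation and computes the cosine similarity between token-count vectors. It does not call the HUAWEI CLOUD NLP service, whose models are not described here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    A crude stand-in for real word segmentation.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words vectors of two short texts."""
    vec_a, vec_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(vec_a[token] * vec_b[token] for token in vec_a)
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with anything
    return dot / (norm_a * norm_b)


print(cosine_similarity("How do I reset my password?",
                        "I forgot my password, how do I reset it?"))
```

Word-count vectors ignore word order and synonyms, which is why production services typically rely on learned models instead.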

8.1.10.3 Voice Interaction

The voice interaction services include automatic speech recognition (ASR), text to speech (TTS), and real-time automatic speech recognition (RASR), as shown in Figure 8-11.


Figure 8-11 Automatic speech

ASR applies to the following scenarios:

Voice search: ASR allows users to easily and efficiently search the web and access GPS navigation or other services with just their voice.



Human-machine interaction: ASR integrates a voice wakeup service that sends voice commands to terminals for real-time operation, improving human-machine interaction.

TTS applies to the following scenarios: 

Voice navigation: TTS converts in-vehicle navigation data into speech, providing accurate voice navigation services. Its strong customization capability enables a variety of navigation voices.



Audiobooks: Text content, such as books, magazines, and news articles, is converted into natural-sounding speech, providing convenient audio files so that you can catch up on the latest news in the car, on the way to work, or at the gym.



Telephone follow-up: Follow-up call content is converted into natural-sounding speech, facilitating communication with customers and improving user experience.



Smart education: Textbook content is converted into lifelike speech that simulates classroom teaching, helping students better understand their textbooks.

RASR applies to the following scenarios: 

Audio from a live video stream is converted into audience-friendly subtitles in real time, improving the viewing experience and facilitating content analysis.



Audio from a video conference or conference call is converted into text in real time, so that you can check, edit, and search the minutes, improving meeting efficiency.



Mobile apps record audio and convert it into text in real time (for example, for voice input), facilitating subsequent text processing and content archiving. This removes the separate step of recording audio first and greatly improves conversion efficiency.

8.1.10.4 Video Analytics

Video analytics provides services such as video content analysis, editing, and tagging.


Video content analysis applies to the following scenarios: 

Analyzes all videos in a shopping mall or campus in real time to extract key events, such as warehouse and cashier compliance. Detects intrusion, loitering, and abandoned objects in high-security zones. Intelligently prevents property loss using alert deployment and theft detection.



Analyzes pedestrians on a campus in real time to identify and track blacklisted personnel and generate alarms. Collects foot traffic statistics at key intersections to help develop better campus management policies.



Analyzes public figures in media videos to accurately identify celebrities, such as political figures and movie stars.



Analyzes video frames, optical flow, and scenes to recognize actions in videos.

Video content editing applies to the following scenarios: 

Highlight extraction: Extracts video clips to produce a video synopsis based on the content relevance and highlights of the video.



News video splitting: Splits a complete news video into segments on different topics based on analysis of the people, scenes, speech, and text in the news.

Video content tagging applies to the following scenarios: 

Video search: Leverages technologies such as scene classification, facial recognition, speech recognition, and text recognition to classify and tag videos hierarchically, enabling accurate and efficient video search and improving the search experience. Figure 8-12 shows the video search function.



Video recommendation: Leverages technologies such as scene classification, facial recognition, speech recognition, and text recognition to classify and tag videos hierarchically, enabling personalized video recommendation.

Figure 8-12 Video search

8.1.10.5 Image Recognition

Image recognition uses deep learning technologies to accurately identify the visual content in images. It can recognize tens of thousands of objects, scenes, and tags in images, and is capable of target detection and attribute identification, helping customers accurately identify and understand images. Image recognition provides functions such as scenario analysis, smart album, object detection, and image retrieval, as shown in Figure 8-13.


Figure 8-13 Image recognition applications

Scenario analysis: A lack of image content tags makes retrieval inefficient. The image tagging function accurately identifies image content, improving retrieval efficiency and precision as well as personalized recommendation, content retrieval, and content distribution.

Smart album: The tens of thousands of tags identified from images can be grouped into custom categories, such as plants, food, and work. This feature facilitates album management and improves user experience.

Object detection: A customized object detection system reduces safety risks by checking in real time whether safety helmets are being worn properly at construction sites.

Image retrieval: To simplify image retrieval in a large image library, tag-based image retrieval helps you find the target image by matching a keyword or image you provide.
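The keyword-matching form of tag-based image retrieval can be illustrated with an inverted index that maps each tag to the images carrying it. In this minimal sketch the tags are assumed to have already been produced by image recognition; the file names, tags, and function names are hypothetical, and no HUAWEI CLOUD API is involved.

```python
from collections import defaultdict


def build_tag_index(image_tags):
    """Build an inverted index: tag -> set of image IDs carrying that tag."""
    index = defaultdict(set)
    for image_id, tags in image_tags.items():
        for tag in tags:
            index[tag.lower()].add(image_id)
    return index


def search_by_keywords(index, keywords):
    """Return IDs of images that carry every keyword, sorted for stable output."""
    matches = [index.get(keyword.lower(), set()) for keyword in keywords]
    if not matches:
        return []
    return sorted(set.intersection(*matches))


tags = {
    "img_001.jpg": ["dog", "grass", "outdoor"],
    "img_002.jpg": ["cat", "sofa", "indoor"],
    "img_003.jpg": ["dog", "beach", "outdoor"],
}
tag_index = build_tag_index(tags)
print(search_by_keywords(tag_index, ["dog", "outdoor"]))
```

Intersecting the per-tag sets means every keyword must match; a union would return images matching any keyword instead.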

8.1.10.6 Content Moderation

Content moderation covers text, images, and videos. It uses cutting-edge text, image, and video detection technologies to automatically detect pornographic content, advertisements, terrorism-related content, and sensitive political information, reducing the risk of non-compliance. Figure 8-14 shows the application scenarios of content moderation.


Figure 8-14 Content moderation

Content moderation applies to the following scenarios:

Pornographic content: Identifies pornographic content in an image and rates the image at one of three levels: pornographic, sexy, or normal.



Terrorism-related content: Quickly detects whether an image contains dangerous content, such as fire, guns, knives, blood, and terrorist flags and symbols.



Sensitive political information: Detects whether an image contains sensitive information, such as political figures.



Text content moderation: Identifies pornographic content, sensitive political information, advertisements, insulting words, spam with meaningless or illegible characters, and contraband.



Video content moderation: Determines whether a video has non-compliance risks by detecting non-compliance information in images, sound, and subtitles.

8.1.10.7 ImageSearch

ImageSearch leverages deep learning and image recognition technologies to deliver service- and industry-specific feature vectorization and search capabilities, helping you search a specified image library for exactly or approximately matching images. Application scenarios of ImageSearch include:

(1) Merchandise image search: The merchandise library is searched for images similar to a photo taken by the user, to find the same or similar merchandise. This helps promote merchandise sales and recommendation. Figure 8-15 shows how merchandise image search works.

(2) Copyrighted image search: Copyrighted images are important assets of photography and design websites. With copyrighted image search, you can quickly locate images on large gallery websites that infringe on your copyrights, protecting your rights and interests.

Figure 8-15 Merchandise search
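Feature-vector search of the kind ImageSearch performs can be sketched as ranking library images by cosine similarity to a query vector. In this illustration the feature vectors are hypothetical hand-written numbers (a real system would extract them with a deep network), the search is a brute-force scan rather than an approximate nearest-neighbor index, and the names are placeholders, not ImageSearch APIs.

```python
import math


def normalize(vector):
    """Scale a feature vector to unit length so a dot product equals cosine."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def search_similar(query, library, top_k=2):
    """Return the IDs of the `top_k` library images most similar to `query`."""
    q = normalize(query)
    scored = []
    for image_id, features in library.items():
        f = normalize(features)
        scored.append((sum(a * b for a, b in zip(q, f)), image_id))
    scored.sort(reverse=True)  # highest cosine similarity first
    return [image_id for _, image_id in scored[:top_k]]


# Toy 4-dimensional feature vectors; real features have many more dimensions.
library = {
    "red_sneaker": [0.9, 0.1, 0.0, 0.3],
    "blue_sneaker": [0.7, 0.2, 0.1, 0.6],
    "leather_bag": [0.0, 0.9, 0.8, 0.1],
}
print(search_similar([0.85, 0.15, 0.05, 0.4], library))
```

A brute-force scan is fine for a handful of images; large libraries usually need an index so that each query does not compare against every vector.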

8.1.10.8 Optical Character Recognition (OCR)

Optical character recognition (OCR) converts characters in images or scanned copies into editable text, improving service efficiency by eliminating manual data entry. OCR can be applied to a broad range of documents, including ID cards, driving licenses, vehicle licenses, invoices, customs documents in English, general tables, and general text. Figure 8-17 shows how OCR works.


Specifically, OCR can be divided into the following types: general, card, receipt, domain, and custom OCR. General OCR automatically recognizes characters and digits in images of any format and layout, such as tables, documents, and online images, quickly converting paper documents into electronic documents.

Figure 8-17 Text recognition

General OCR applies to the following scenarios:

Electronic archiving of enterprise documentation: Recognizes text in enterprise documents and reports, and establishes electronic archives for quick search.



Automatic express waybill filling: Recognizes contact information in screenshots and generates express waybills automatically, eliminating manual data entry.



Efficient contract processing: Automatically recognizes structured information and extracts signatures and seals for quick review.



Electronic customs documentation: The general OCR service automatically converts customs documentation into structured electronic information, improving efficiency and information accuracy.

Card OCR service automatically recognizes and extracts structured data from cards such as ID cards, driving licenses, vehicle licenses, and passports, improving business efficiency. Card OCR applies to the following scenarios: 

Quick authentication: Uses card recognition to quickly complete real-name authentication in scenarios such as mobile phone registration.



Automatic input: Automatically extracts key information from certificates, eliminating manual data entry and improving efficiency.



Identity verification: Checks whether the user is the certificate holder.

Receipt OCR extracts structured information as editable text from receipts such as VAT invoices, motor vehicle invoices, and medical invoices. It drastically improves business efficiency as manual input is no longer required. Receipt OCR applies to the following scenarios: 

Expense review: Quickly identifies and inputs key information on invoices to streamline reimbursement.




Commercial loans: Rapidly extracts key information on motor vehicle sales invoices and contracts, accelerating vehicle loan handling.



Medical insurance: Automatically recognizes and digitizes key information on medical invoices, such as medicine details, age, and gender, and works with ID card OCR and bank card OCR to handle insurance claims quickly.

Domain OCR extracts structured information from images of logistics waybills and medical forms, facilitating industry automation. Domain OCR applies to the following scenarios: 

Automatic express waybill filling: Recognizes contact information in screenshots and generates express waybills automatically, eliminating manual data entry.



Medical insurance: Automatically recognizes and digitizes key information on medical invoices, such as medicine details, age, and gender, and works with ID card OCR and bank card OCR to handle insurance claims quickly.

Custom OCR allows you to tailor character recognition to your specific needs. You can customize templates to specify the key fields to be recognized in images. Custom OCR applies to the following scenarios:

Certificate recognition: Lets you create custom templates for certificates issued by other organizations, enabling character recognition and automatic information input.



Form recognition: Lets you create custom templates for forms issued by other organizations, enabling character recognition and automatic information input.
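The idea of a template that specifies key fields can be illustrated once OCR has produced text lines. This is a text-only simplification under stated assumptions: the hypothetical template locates each field with a regular expression, whereas template-based OCR products may instead locate fields by their position on a reference image. The template, field names, and sample lines are invented for illustration.

```python
import re

# Hypothetical template: each key field is located by a regular expression
# applied to the recognized text lines (both ASCII and full-width colons).
MEMBERSHIP_CARD_TEMPLATE = {
    "name": r"Name[:：]\s*(.+)",
    "card_no": r"Card No\.?[:：]\s*([A-Z0-9-]+)",
    "expiry": r"Valid Until[:：]\s*(\d{4}-\d{2}-\d{2})",
}


def extract_fields(ocr_lines, template):
    """Apply each field pattern to the OCR lines; missing fields map to None."""
    result = {field: None for field in template}
    for line in ocr_lines:
        for field, pattern in template.items():
            if result[field] is None:  # keep the first match for each field
                match = re.search(pattern, line)
                if match:
                    result[field] = match.group(1).strip()
    return result


lines = [
    "ACME Fitness Club",
    "Name: Li Ming",
    "Card No.: AC-2020-0815",
    "Valid Until: 2021-12-31",
]
print(extract_fields(lines, MEMBERSHIP_CARD_TEMPLATE))
```

Returning None for unmatched fields lets a downstream step flag the card for manual review instead of silently accepting incomplete data.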

8.2 ModelArts

ModelArts is a one-stop development platform for AI developers. With data pre-processing, semi-automatic data labeling, large-scale distributed training, automatic modeling, and on-demand model deployment on devices, at the edge, and in the cloud, ModelArts helps AI developers build models quickly and manage the AI development lifecycle. "One-stop" means that ModelArts provides data processing, model development, training, management, and deployment in a single platform. Technically, the underlying layer of ModelArts supports various heterogeneous computing resources, which developers can flexibly select and use without having to deal with the underlying technologies. In addition, ModelArts supports mainstream open source AI development frameworks, such as TensorFlow and MXNet, and allows developers to use self-developed algorithm frameworks.

ModelArts aims to simplify AI development and provides convenient, easy-to-use processes for AI developers at different levels. For example, service developers can use ExeML to quickly build AI applications without modeling or coding skills. AI beginners can use preset algorithms to build AI applications without developing models. AI engineers are provided with multiple development environments, operation processes, and operation modes that make it easy to extend code and quickly build models and applications.

8.2.1 ModelArts Functions

ModelArts provides developers with one-stop services, including data preparation, algorithm development, model training, deployment, and integration into the production environment. Figure 8-18 shows the functions of ModelArts.


Figure 8-18 ModelArts function overview

ModelArts has the following features:

Data management: ModelArts supports data processing such as filtering and labeling, and provides dataset version management. Version management of the large datasets used in deep learning, in particular, makes training results reproducible.



Rapid and simplified model training: The Huawei-developed MoXing deep learning framework is efficient and easy to use, greatly accelerating training.



Deployment across device-edge-cloud: ModelArts can deploy models in various production environments, for example in the cloud for online and batch inference, or on devices and at the edge.



ExeML: ModelArts supports various automatic learning capabilities. It trains models through automatic learning, so that users can complete automatic modeling and one-click deployment without writing code.



Visualized workflow: Graph Engine Service (GES) manages the metadata of the development pipeline in a unified manner, and automatically visualizes the evolution of AI development workflows and versions, enabling model tracing.



AI marketplace: ModelArts provides common models and datasets, and supports sharing enterprise models internally or publicly in the marketplace.

8.2.2 ModelArts Architecture and Application

ModelArts is a one-stop AI development platform that supports the entire development lifecycle from data management to AI application, including data processing, model training, model management, and model deployment. In addition, the AI marketplace allows developers to share models. Figure 8-19 shows the architecture of ModelArts.


Figure 8-19 ModelArts architecture

ModelArts applies to the following AI application scenarios:

Image recognition: ModelArts accurately identifies objects in images, such as animals, brand logos, and vehicle types.



Video analytics: ModelArts analyzes key information in videos and is applicable to scenarios such as facial recognition and vehicle feature recognition.



Speech recognition: ModelArts enables machines to understand speech signals and assist in speech processing, making it ideal for intelligent customer service robots and intelligent assistants.



Product recommendation: ModelArts recommends products to customers based on their attributes and behavior characteristics.



Anomaly detection: ModelArts predicts suspicious traffic or faulty devices using an automatic network detection system that analyzes traffic in real time.



ModelArts will continue to improve data augmentation, model training, and weakly supervised learning to make AI model development more efficient.

8.2.3 ModelArts Highlights

ModelArts offers the following highlights: a one-stop platform, ease of use, excellent performance, and high flexibility.

One-stop platform: The out-of-the-box, full-lifecycle AI development platform provides one-stop data processing, model development, training, management, and deployment.

Easy to use: Various built-in open source models and automatic hyperparameter tuning help you start model training from scratch. Models can be deployed on devices, at the edge, and in the cloud with just one click.

Excellent performance: The Huawei-developed MoXing deep learning framework improves algorithm development efficiency and accelerates training. ModelArts optimizes GPU utilization for online inference and generates models that can run on Huawei Ascend processors for efficient device-edge inference.

High flexibility: ModelArts supports multiple mainstream open source frameworks, such as TensorFlow and Apache Spark MLlib, mainstream GPUs, and the Huawei-developed Ascend AI processors. Exclusive use of resources and custom images ensure a flexible development experience.

Other highlights of ModelArts include:

Enterprise-grade: ModelArts supports pre-processing and version management of massive data volumes, model deployment in the cloud, at the edge, and on devices, visualized management of the entire AI development lifecycle, and AI service sharing, helping enterprises build internal and external AI ecosystems.

Smart-driven: Models for image classification and object detection can be automatically designed and trained according to the deployment environment and inference speed requirements. In addition, ModelArts supports automatic feature engineering and modeling for structured data. The built-in AI data framework combines automatic pre-labeling with hard example labeling to improve data preparation efficiency by more than 100-fold. The Huawei-developed MoXing high-performance distributed framework uses core technologies such as hybrid parallel cascade, gradient compression, and convolution acceleration to significantly shorten model training. ModelArts deploys models to devices, edges, and clouds with one click, and supports AI model deployment for edge, online, and batch inference. ModelArts also accelerates AI development using AI technologies such as automatic learning, and provides a wizard-based UI for adaptive training.

Full-lifecycle management: ModelArts supports visualized management of the entire development lifecycle, resumes training from breakpoints, and compares training results.

Resource sharing: AI resources can be shared within enterprises for higher efficiency.
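Gradient compression, one of the MoXing techniques mentioned above, reduces the volume of gradient data exchanged between workers in distributed training. The text does not describe MoXing's scheme, so the sketch below shows one widely used form as an assumption: top-k sparsification with error feedback, where only the k largest-magnitude gradient entries are sent and the rest are accumulated locally for later steps. Plain Python lists stand in for tensors.

```python
def topk_compress(gradient, k, residual):
    """Keep only the k largest-magnitude entries of gradient + residual.

    Returns (sparse, new_residual): `sparse` maps index -> value for the
    entries that would be transmitted; everything dropped is kept in
    `new_residual` and added back at the next step (error feedback).
    """
    accumulated = [g + r for g, r in zip(gradient, residual)]
    top = sorted(range(len(accumulated)),
                 key=lambda i: abs(accumulated[i]), reverse=True)[:k]
    sparse = {i: accumulated[i] for i in top}
    new_residual = [0.0 if i in sparse else value
                    for i, value in enumerate(accumulated)]
    return sparse, new_residual


def decompress(sparse, size):
    """Rebuild a dense gradient from the transmitted entries."""
    dense = [0.0] * size
    for i, value in sparse.items():
        dense[i] = value
    return dense


grad = [0.5, -0.05, 0.02, -0.8, 0.1]
residual = [0.0] * len(grad)
sparse, residual = topk_compress(grad, k=2, residual=residual)
print(sparse)  # only two of the five values are transmitted
print(decompress(sparse, len(grad)))
```

The residual ensures that small gradient components are delayed rather than lost: they keep accumulating until they are large enough to be sent.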

8.2.4 How to Access ModelArts

HUAWEI CLOUD provides a web-based management console and an HTTPS-based application programming interface (API). You can access ModelArts using any of the following methods:

8.2.4.1 Using the Management Console

ModelArts provides a simple, easy-to-use management console that supports a wide range of functions, such as ExeML, data management, development environment, model training, model management, service deployment, and AI marketplace. You can complete end-to-end AI development on the management console. To use the ModelArts management console, you must first register with HUAWEI CLOUD. After registering a HUAWEI CLOUD account, choose EI Enterprise Smart > AI Services > Essential Platform > ModelArts on the homepage to log in to the management console.

8.2.4.2 Calling the SDK

If you need to integrate ModelArts into a third-party system for secondary development, you can call the SDK. ModelArts SDK encapsulates the ModelArts RESTful APIs in Python to simplify development. For details about the operations and the SDK, see ModelArts SDK Reference on the ModelArts official website. You can also call ModelArts SDK directly when writing code in Notebook on the management console.

8.2.4.3 Calling APIs

If you need to integrate ModelArts into a third-party system for secondary development, you can access ModelArts using APIs. For details about the operations and APIs, see API Reference.

8.2.5 How to Use ModelArts

ModelArts is a one-stop development platform for AI developers. It supports full-lifecycle management of AI development, helping you intelligently create AI models and deploy them to the cloud, edge, and devices with one click. In addition to ExeML, ModelArts provides multiple pre-trained models and integrates Jupyter Notebook as an online code development environment. You can choose a ModelArts usage mode that suits your user group.

ModelArts provides ExeML for service developers without AI development experience, enabling them to build AI models from scratch. With ExeML, service developers do not need to develop models or tune parameters, and can complete an AI development project in just three steps: data labeling, auto training, and service deployment. This section provides an example of detecting Yunbao, the mascot of HUAWEI CLOUD, to help you quickly get started with ExeML of ModelArts. This example is an object detection project. Using the built-in Yunbao image dataset, the system automatically trains a detection model and deploys it as a real-time service. After deployment, you can use the real-time service to check whether an input image contains Yunbao.


ModelArts provides built-in algorithms based on mainstream engines for AI beginners with some AI development skills. You can use the built-in algorithms directly to train on existing data and quickly deploy the resulting model as a service, without developing a model yourself. The built-in algorithms are applicable to scenarios such as object classification, object location, and image classification. This section provides a flower image classification example to help you quickly get started with building a model using a built-in algorithm. In this example, you label the images in the built-in flower image dataset, train on the data with the built-in algorithm ResNet_v1_50 to obtain a usable model, and deploy the model as a real-time service. After deployment, you can use the real-time service to identify the types of flowers in an input image.

For AI engineers who can write and debug code, ModelArts provides a one-stop platform that manages the full lifecycle of AI development, from data preparation to model development, training, and deployment. ModelArts is compatible with mainstream engines and user habits, and provides the Huawei-developed MoXing deep learning framework to improve algorithm development efficiency and accelerate training. This section provides an example of using MXNet and Notebook to develop a handwritten digit recognition application, helping AI engineers quickly get familiar with the AI development process in ModelArts. MNIST is a dataset of handwritten digits that is often used as an introductory deep learning example. In this example, the model training script for the MNIST dataset (provided by ModelArts by default) is written using native MXNet APIs. You can train the model in ModelArts and deploy it as a real-time service. After deployment, you can use the real-time service to identify the digits in an input image.

8.3 HUAWEI CLOUD EI Solutions

This section describes successful use cases of HUAWEI CLOUD EI solutions.

8.3.1 Case: OCR Implements Full-Process Automation for Reimbursement Through Invoices

OCR can also be used in financial reimbursement scenarios. OCR can automatically extract key information from receipts and fill in reimbursement forms automatically. Combined with robotic process automation (RPA), the technology boosts reimbursement efficiency. OCR can recognize information on various financial documents, including VAT invoices, taxi invoices, train tickets, itineraries, and shopping receipts. It supports the correction of tilted and distorted images, as well as text recognition on documents with seals, improving recognition accuracy.

In financial reimbursement, one photo usually contains multiple invoices, whereas common OCR services can generally identify only one invoice type. For example, the VAT invoice OCR service can identify only one VAT invoice at a time. HUAWEI CLOUD provides Auto Classification OCR, which can identify multiple invoices, multiple cards, or a mix of cards and invoices in one image, and supports total billing. Auto Classification OCR supports image segmentation of documents in various formats, including air tickets, train tickets, medical invoices, driving licenses, bank cards, ID cards, passports, and business licenses, and works with other OCR services to identify the various types of invoices in an image.

Financial personnel used to input invoice information into the system manually. Even with HUAWEI CLOUD OCR, they still need to photograph each invoice and upload the photos to a computer or server, which is time-consuming. To address this pain point, HUAWEI CLOUD provides a batch OCR solution that lets financial personnel use just one scanner and one PC to scan invoices in batches into color images and automatically call HUAWEI CLOUD OCR services in batches. This solution helps financial personnel quickly extract invoice information and intuitively visualize and compare the recognition results. In addition, the recognition results can be exported in batches to an Excel file or the financial system, greatly simplifying data recording. This solution has the following features:

Multiple access modes: automatic connection to scanners to obtain images in batches; image capture using high-speed document scanners and mobile phones



Flexible deployment: multiple deployment modes, such as public cloud, HCS, and appliance, and unified standard APIs



Support for various invoices: common/special/electronic/ETC VAT invoices, and taxi/train/flight itinerary/quota/toll invoices



One image for multiple invoices: automatic identification and classification of multiple invoice types



Visualized comparison: return of OCR character location information and conversion of such information into an Excel file for statistics collection and analysis

Figure 8-20 shows how the batch OCR solution works. This solution boasts multiple advantages, such as improved efficiency and reduced costs, optimized operation, simplified processes, and enhanced compliance.

Figure 8-20 Invoice-based reimbursement solution
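The batch flow described above (scan, recognize in batches, total, and export) can be sketched as follows. The OCR call is replaced by a hypothetical stub, `recognize_invoice`, that returns canned results, because the real service request and response formats are not described here. Results are exported as CSV text, which Excel can open, to avoid depending on a spreadsheet library.

```python
import csv
import io
from collections import defaultdict


def recognize_invoice(image_name):
    """Hypothetical stand-in for a call to an OCR service.

    A real implementation would upload the scanned image and parse the
    service's response; here canned results are returned instead.
    """
    canned = {
        "scan_001.png": [{"type": "vat_invoice", "number": "0451", "amount": 1200.00}],
        "scan_002.png": [{"type": "taxi_invoice", "number": "T88", "amount": 35.50},
                         {"type": "train_ticket", "number": "G102", "amount": 553.00}],
    }
    return canned.get(image_name, [])


def process_batch(image_names):
    """Recognize every scanned image; one image may hold several invoices."""
    rows = []
    for name in image_names:
        for invoice in recognize_invoice(name):
            rows.append({"image": name, **invoice})
    return rows


def totals_by_type(rows):
    """Sum the invoice amounts per invoice type."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["type"]] += row["amount"]
    return dict(totals)


def to_csv(rows):
    """Serialize rows as CSV text, which spreadsheet tools such as Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["image", "type", "number", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = process_batch(["scan_001.png", "scan_002.png"])
print(totals_by_type(rows))
print(to_csv(rows))
```

Keeping the image name on every row makes it easy to trace each exported value back to the scan it came from when results are compared.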

8.3.2 Case: Intelligent Logistics with OCR

To pick up a package, the courier can take a photo of the sender's ID card using a mobile app and call the HUAWEI CLOUD OCR service to automatically identify the information on the ID card image and complete real-name authentication. Then, to fill in the delivery information, the courier can upload images, such as screenshots containing address information or chat records. The OCR service automatically extracts the contact information on the uploaded images, such as the name, phone number, and address, and inputs it into the system. During shipment, the courier can use the OCR service to extract waybill information for automatic sorting and to check whether the express waybill has been filled out completely. HUAWEI CLOUD OCR can recognize information in images taken at any angle, as well as in unevenly lit or incomplete images. The solution delivers a high recognition rate and good stability, greatly reducing labor costs and improving user experience. Figure 8-21 shows how the OCR solution enables intelligent logistics.

08 Enterprise Smart Application Platform (Textbook)

303

HUAWEI CLOUD Enterprise Smart Application Platform

Page 25

Figure 8-21 Intelligent logistics solution
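The completeness check mentioned above can be illustrated with a small Python sketch that pulls a phone number out of OCR text and lists empty waybill fields. The 11-digit mobile-number pattern and the `REQUIRED_FIELDS` names are assumptions made for illustration. The real service extracts these fields itself, and its output format is not shown here.

```python
import re
from typing import Dict, List, Optional

# Assumption: mainland China mobile numbers are 11 digits that start with "1"
# followed by a digit from 3 to 9. Real waybills may need a broader pattern.
PHONE_PATTERN = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")

# Hypothetical set of mandatory waybill fields used for the completeness check.
REQUIRED_FIELDS = ("sender_name", "sender_phone", "recipient_name",
                   "recipient_phone", "recipient_address")


def extract_phone(ocr_text: str) -> Optional[str]:
    """Return the first mobile number found in OCR output, or None."""
    match = PHONE_PATTERN.search(ocr_text)
    return match.group(0) if match else None


def missing_fields(waybill: Dict[str, str]) -> List[str]:
    """List the required waybill fields that are absent or blank."""
    return [name for name in REQUIRED_FIELDS
            if not waybill.get(name, "").strip()]


if __name__ == "__main__":
    text = "Recipient: Zhang San, Tel 13812345678, No. 1 Example Road"
    waybill = {
        "sender_name": "Li Si",
        "sender_phone": "13900001111",
        "recipient_name": "Zhang San",
        "recipient_phone": extract_phone(text) or "",
        "recipient_address": "",
    }
    print(missing_fields(waybill))  # ['recipient_address']
```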

8.3.3 CBS

A bot with a single function cannot solve all problems in customer service scenarios. The Conversational Bot Service (CBS) therefore integrates multiple bots with different functions and presents them as a single service API, so customers can solve different service problems by calling that one API. The following describes the application scenarios of each bot.

8.3.3.1 Application Scenarios of QABot

Frequent consultation and help-seeking in the IT, e-commerce, finance, and government industries

Scenarios with accumulated knowledge, such as a QA knowledge base, FAQ or FAQ-like documents, service tickets, and customer service Q&A data

8.3.3.2 Application Scenarios of TaskBot

TaskBot suits scenarios with clear conversational tasks, where the conversation flow (multiple rounds of interaction) can be flexibly configured based on the real-world scenario. After a conversation template is loaded, the bot can hold multi-round conversations with a customer in specific scenarios while understanding and recording the customer's intentions. Typical scenarios include:

1. Outbound robot: service satisfaction surveys, user information verification, recruitment appointments, express delivery notifications, product promotion, and high-quality customer selection
2. Customer service: hotel and air ticket booking, and credit card activation
3. Smart hardware: voice assistants, smart home, and so on
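The multi-round behavior of TaskBot can be approximated by slot filling. A conversation template lists the information the bot needs, and the bot keeps asking until every slot is filled. The sketch below is a simplified illustration that uses a hypothetical hotel-booking template. It stores each reply verbatim in the next empty slot. A real TaskBot would use intent and entity recognition to fill slots from free-form text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A hypothetical conversation template for hotel booking: each slot the bot
# must fill, paired with the prompt it asks while the slot is still empty.
HOTEL_TEMPLATE: List[Tuple[str, str]] = [
    ("city", "Which city would you like to stay in?"),
    ("check_in", "What is your check-in date?"),
    ("nights", "How many nights will you stay?"),
]


@dataclass
class TaskDialog:
    template: List[Tuple[str, str]]
    slots: Dict[str, str] = field(default_factory=dict)

    def next_prompt(self) -> Optional[str]:
        """Return the question for the first unfilled slot, or None when done."""
        for name, prompt in self.template:
            if name not in self.slots:
                return prompt
        return None

    def answer(self, value: str) -> Optional[str]:
        """Record the user's reply in the current slot and ask the next question."""
        for name, _ in self.template:
            if name not in self.slots:
                self.slots[name] = value.strip()
                break
        return self.next_prompt()


if __name__ == "__main__":
    dialog = TaskDialog(HOTEL_TEMPLATE)
    print(dialog.next_prompt())
    for reply in ["Shenzhen", "2024-05-01", "2"]:
        print(dialog.answer(reply))
    print(dialog.slots)
```

Keeping the template as data means a new scenario, such as credit card activation, only needs a new slot list rather than new dialog code.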

8.3.3.3 Application Scenarios of Knowledge Graph Q&A Bot (KGBot)

The KGBot is applicable to the following scenarios:



The knowledge system is complex.



Logical inference is the only way to obtain answers.



Answers can be obtained after multiple rounds of interaction.



Factual questions about entity attribute values or relationships between entities are too numerous for all QA pairs to be enumerated.

The KGBot has the following features:


(1) Intelligent integration of multiple robots for better recommendations: Multiple robots draw on their own strengths and their self-learning and self-optimization capabilities to recommend the optimal answers to customers.

(2) Multi-round intelligent guidance for more accurate understanding: Multiple rounds of conversation and natural interaction help the bot accurately identify users' intentions and understand their underlying semantics.

(3) Knowledge graph for smarter bots: A general-domain language model is combined with a domain knowledge graph, and the graph is updated dynamically, which makes the graph-based bots smarter.

Figure 8-22 shows the architecture of the Conversational Bot service.
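The idea of integrating multiple bots behind one API and recommending the best answer can be sketched as a facade that queries every bot and keeps the most confident reply. The bots, their confidence scores, and the `ask` entry point below are hypothetical stand-ins, not CBS interfaces. The real service's ranking logic is not described in this text.

```python
from typing import Callable, Dict, Optional, Tuple

# Each bot takes a question and returns (answer, confidence), or None if it
# cannot answer. The bots below are hypothetical stand-ins, not CBS APIs.
Bot = Callable[[str], Optional[Tuple[str, float]]]


def faq_bot(question: str) -> Optional[Tuple[str, float]]:
    """Toy QABot: exact lookup in a tiny FAQ."""
    faq = {"how do i reset my password": "Use the 'Forgot password' link."}
    answer = faq.get(question.lower().strip(" ?"))
    return (answer, 0.9) if answer else None


def kg_bot(question: str) -> Optional[Tuple[str, float]]:
    """Toy KGBot: answers price questions about known vehicle models."""
    prices = {"model a": "CNY 150,000"}
    for model, price in prices.items():
        if model in question.lower() and "price" in question.lower():
            return (f"The price of {model.title()} is {price}.", 0.95)
    return None


def ask(question: str, bots: Dict[str, Bot]) -> Tuple[str, str]:
    """Single entry point: query every bot and return (bot name, best answer)."""
    best: Tuple[str, str, float] = ("none", "Sorry, I cannot answer that yet.", 0.0)
    for name, bot in bots.items():
        result = bot(question)
        if result and result[1] > best[2]:
            best = (name, result[0], result[1])
    return best[0], best[1]
```

For example, `ask("What is the price of Model A?", {"qa": faq_bot, "kg": kg_bot})` is answered by the KG bot, while a password question is answered by the FAQ bot, and the caller never chooses a bot explicitly.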

Figure 8-22 Conversational Bot architecture

The KG-powered QABot can answer questions precisely. For example, it can query the price and configuration of a specific vehicle model and recommend a suitable model. It can also answer questions that compare vehicles. Answers can contain text, tables, and images. Figure 8-23 shows a conversational bot with vehicle knowledge.


Figure 8-23 Conversational bot with vehicle knowledge
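The precise Q&A described above relies on facts stored as (subject, predicate, object) triples. The following minimal sketch shows how attribute queries and a side-by-side comparison table can be answered from such data. It assumes an in-memory list of invented vehicle triples rather than a real graph database.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical vehicle triples (subject, predicate, object); the models and
# values are invented for illustration only.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Model A", "price", "CNY 150,000"),
    ("Model A", "engine", "1.5T"),
    ("Model A", "seats", "5"),
    ("Model B", "price", "CNY 180,000"),
    ("Model B", "engine", "2.0T"),
    ("Model B", "seats", "7"),
]


def build_index(triples: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, str]]:
    """Index triples as entity -> {attribute: value} for fast attribute lookups."""
    index: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subject, predicate, obj in triples:
        index[subject][predicate] = obj
    return index


def query(index: Dict[str, Dict[str, str]], entity: str, attribute: str) -> str:
    """Answer 'What is the <attribute> of <entity>?' from the index."""
    return index.get(entity, {}).get(attribute, "unknown")


def compare(index: Dict[str, Dict[str, str]], left: str,
            right: str) -> List[Tuple[str, str, str]]:
    """Return (attribute, left value, right value) rows, i.e. a comparison table."""
    attributes = sorted(set(index.get(left, {})) | set(index.get(right, {})))
    return [(a, query(index, left, a), query(index, right, a)) for a in attributes]
```

`index.get` is used instead of indexing so that looking up an unknown model does not silently add an empty entry to the `defaultdict`.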

8.3.4 Case: Intelligent Q&A of Enterprises in a Certain District

An intelligent Q&A system in a district of Shenzhen automatically answers service questions from enterprises in the area. Questions that the bot cannot answer are recorded automatically and answered manually, and the answers are then pushed to the questioner. In addition, a complete closed-loop solution handles unresolved problems. The bots continuously improve and become smarter as unresolved problems are recorded, manually solved problems are turned into knowledge, and data is labeled to optimize the models. Figure 8-24 shows the intelligent Q&A system of enterprises. Related services are classified into the following three types:

Policy consulting (frequent policy changes)



Enterprise-related affairs in the business hall (500+ items)



Requirement issues (various types of requirements)

Figure 8-24 Intelligent Q&A system of enterprises
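The closed loop for unresolved questions can be sketched as a small state machine. Unanswered questions are queued together with the people who asked them. When an agent answers one, the answer is stored as new knowledge and pushed to every waiting asker. This Python sketch is an assumed design for illustration. It covers only the knowledge-base update and omits the data labeling and model optimization steps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ClosedLoopQA:
    """Sketch of the record -> answer manually -> push -> learn loop (assumed design)."""
    knowledge: Dict[str, str] = field(default_factory=dict)
    pending: Dict[str, List[str]] = field(default_factory=dict)  # question -> askers
    outbox: List[Tuple[str, str]] = field(default_factory=list)  # (asker, answer)

    @staticmethod
    def _key(question: str) -> str:
        """Normalize a question so trivial case/spacing differences match."""
        return question.lower().strip()

    def ask(self, asker: str, question: str) -> Optional[str]:
        """Answer from knowledge, or record the question as unresolved."""
        key = self._key(question)
        if key in self.knowledge:
            return self.knowledge[key]
        self.pending.setdefault(key, []).append(asker)
        return None

    def resolve(self, question: str, answer: str) -> None:
        """A human agent answers a pending question; the answer becomes knowledge."""
        key = self._key(question)
        self.knowledge[key] = answer
        for asker in self.pending.pop(key, []):
            self.outbox.append((asker, answer))  # push the answer to each asker
```

After `resolve`, the same question from any enterprise is answered directly by the bot, which is the "becomes smarter" effect described above in its simplest form.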

8.3.5 Case: Gene Knowledge Graph

A gene knowledge graph includes multiple types of entities, such as genes, mutations, diseases, and drugs. It also includes the relationships between them: between genes and mutations, between mutations and diseases, and between diseases and drugs. The gene knowledge graph enables the following functions:

Entity query: Quickly queries information about an entity, such as a gene, mutation, disease, or drug.



Assisted diagnosis: Deduces possible mutations or diseases based on gene detection information and recommends related drugs.



Gene detection report generation: Generates natural-language gene detection reports based on gene entities and knowledge of the associated mutations and diseases.

Figure 8-25 shows how the gene knowledge graph works.


Figure 8-25 Case of gene knowledge graph
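Assisted diagnosis amounts to following edges in the graph from detected mutations to diseases and then to drugs. The sketch below uses a toy, hypothetical graph with placeholder names, not real medical data. A one-line text summary stands in for full natural-language report generation.

```python
from typing import Dict, List, Set

# Hypothetical toy graph; entity names are placeholders, not medical data.
MUTATION_TO_DISEASES: Dict[str, List[str]] = {
    "GENE1:m1": ["Disease X"],
    "GENE2:m7": ["Disease X", "Disease Y"],
}
DISEASE_TO_DRUGS: Dict[str, List[str]] = {
    "Disease X": ["Drug P"],
    "Disease Y": ["Drug Q", "Drug R"],
}


def assist_diagnosis(detected_mutations: List[str]) -> Dict[str, Set[str]]:
    """Follow mutation -> disease -> drug edges starting from gene detection results."""
    diseases: Set[str] = set()
    for mutation in detected_mutations:
        diseases.update(MUTATION_TO_DISEASES.get(mutation, []))
    drugs: Set[str] = set()
    for disease in diseases:
        drugs.update(DISEASE_TO_DRUGS.get(disease, []))
    return {"diseases": diseases, "drugs": drugs}


def report(detected_mutations: List[str]) -> str:
    """Render a one-line summary (a stand-in for natural-language report generation)."""
    result = assist_diagnosis(detected_mutations)
    return (f"Detected {len(detected_mutations)} mutation(s); possible diseases: "
            f"{', '.join(sorted(result['diseases'])) or 'none'}; related drugs: "
            f"{', '.join(sorted(result['drugs'])) or 'none'}.")
```

For example, `report(["GENE1:m1"])` returns `"Detected 1 mutation(s); possible diseases: Disease X; related drugs: Drug P."`. Sorting keeps the output deterministic even though sets are unordered.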

8.3.6 Policy Query Based on Knowledge Graphs

Governments often issue incentive policies for enterprises, such as tax reductions and rebates. These policies are usually specialized and difficult to understand without explanation by professionals. There are various types of policies and rewards, with more than 300 determining criteria, and these criteria are combined through logical relationships such as AND, OR, and NOT. As a result, it is difficult for enterprises to quickly identify the policies that apply to them. To address this problem, we can build a policy knowledge graph from the policies, rewards, and determining criteria, together with an enterprise knowledge graph. Once an enterprise name is entered, the enterprise's information (such as its type, tax amount, and scale) serves as the determining criteria for logical inference in the policy knowledge graph. In this way, the policies and rewards applicable to the enterprise can be obtained. Figure 8-26 shows how the policy query based on knowledge graphs works.

Figure 8-26 Policy query based on knowledge graphs
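The core of the policy query is evaluating nested AND/OR/NOT criteria against an enterprise's attributes. The following minimal sketch assumes the criteria are encoded as nested tuples. It represents the enterprise information as a plain dict, although in the solution that information comes from the enterprise knowledge graph. The policy names and thresholds are invented.

```python
from typing import Any, Dict, List, Tuple

# A criterion is either a leaf test (attribute, comparator, value) or a logical
# node: ("AND", c1, c2, ...), ("OR", c1, c2, ...), or ("NOT", c).
Criterion = Tuple[Any, ...]

# Hypothetical policies; names and thresholds are invented for illustration.
POLICIES: Dict[str, Criterion] = {
    "High-tech tax reduction": ("AND", ("type", "==", "high-tech"),
                                ("tax_paid", ">=", 1_000_000)),
    "Small business rebate": ("AND", ("employees", "<", 100),
                              ("NOT", ("type", "==", "state-owned"))),
    "Innovation award": ("OR", ("patents", ">=", 10), ("rnd_ratio", ">=", 0.05)),
}


def evaluate(criterion: Criterion, enterprise: Dict[str, Any]) -> bool:
    """Recursively evaluate a criterion tree against enterprise attributes."""
    op = criterion[0]
    if op == "AND":
        return all(evaluate(c, enterprise) for c in criterion[1:])
    if op == "OR":
        return any(evaluate(c, enterprise) for c in criterion[1:])
    if op == "NOT":
        return not evaluate(criterion[1], enterprise)
    attribute, comparator, value = criterion
    actual = enterprise.get(attribute)
    if actual is None:  # unknown attribute: treat the leaf as not satisfied
        return False
    if comparator == "==":
        return actual == value
    if comparator == ">=":
        return actual >= value
    if comparator == "<":
        return actual < value
    raise ValueError(f"unsupported comparator: {comparator}")


def applicable_policies(enterprise: Dict[str, Any]) -> List[str]:
    """Return the names of all policies whose criteria the enterprise meets."""
    return [name for name, rule in POLICIES.items() if evaluate(rule, enterprise)]
```

For a high-tech enterprise with 150 employees, a tax payment of 2,000,000, and an R&D ratio of 0.08, `applicable_policies` returns the tax reduction and the innovation award but not the small business rebate. One design caveat is that a missing attribute makes its leaf false, so it makes a surrounding NOT true. A production rule engine would likely need an explicit "unknown" state.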

8.3.7 Case: Smart Campus

Located in Ban Xue Gang High-Tech Zone, Tian An Cloud Park is a project focusing on leading industries such as the R&D of next-generation information technologies, including cloud computing, mobile Internet, robots, and intelligent devices. The project also develops modern, productive service industries related to these leading industries. Tian An Cloud Park provides open, shared space and builds smart environments to create a smart ecosystem that fully unlocks enterprise-talent synergy. The project uses a video analytics solution based on edge-cloud synergy. Video analytics models for face detection, vehicle recognition, and intrusion detection are delivered to the campus's local GPU inference servers. After video streams are analyzed locally in real time, the analytics results can be uploaded to the cloud or saved locally for use by upper-layer application systems. The solution intelligently analyzes surveillance videos and detects abnormal events in real time, such as intrusions and heavy foot traffic, which reduces labor costs for campus management. In addition, existing IP cameras (IPCs) in the campus can be reused as smart cameras through edge-cloud synergy, protecting users' existing assets. Figure 8-27 shows how the video analytics solution based on edge-cloud synergy works.

Figure 8-27 Case of smart campus

Common HD IPCs are deployed on the device side, and GPU servers are deployed at the edge. The competitive advantages and value of edge video analytics are as follows:

Service benefits: Intelligently analyzes surveillance videos and detects abnormal events in real time, such as intrusions and heavy foot traffic, reducing labor costs for campus management.



Edge-cloud synergy: Supports full-lifecycle management and seamless upgrade of edge applications.



Cloud model training: Implements automatic training using algorithms that have good scalability and are easy to update.



High compatibility: Reuses existing IPCs in campuses as smart cameras through edge-cloud synergy.

8.3.8 Case: Crowd Statistics and Heat Map

The crowd statistics and heat map feature identifies the people in an image and collects statistics on them, including the number of people in a region and how crowded the region is. It supports custom time settings and configurable intervals for sending statistics results. The feature applies to scenarios such as customer traffic statistics, visitor statistics, and identification of popular business districts, as shown in Figure 8-28. It brings the following benefits:

Strong anti-interference performance: crowd counting in complex scenarios, such as masked faces and partially occluded bodies



High scalability: concurrent sending of pedestrian intrusion statistics, regional statistics, and heat maps



Ease of use: compatible with any 1080p surveillance camera


Figure 8-28 Crowd counting & heat map
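Crowd counts and heat maps can be derived from per-person detections. The people inside a region of interest are counted, and their positions are bucketed into a grid whose cell counts form the heat map. The sketch assumes that a detector has already produced one (x, y) center point per person in a frame, for example a 1920 x 1080 frame from a 1080p camera. The detector itself, the cell size, and the reporting interval are outside its scope.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def count_in_region(points: List[Point], region: Tuple[int, int, int, int]) -> int:
    """Count people whose center lies in the rectangle (left, top, right, bottom)."""
    left, top, right, bottom = region
    return sum(1 for x, y in points if left <= x < right and top <= y < bottom)


def heat_map(points: List[Point], width: int, height: int, cell: int) -> List[List[int]]:
    """Bucket person positions into a grid; each cell count is one heat-map value."""
    cols = (width + cell - 1) // cell  # ceiling division so edge pixels are covered
    rows = (height + cell - 1) // cell
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:  # ignore detections outside the frame
            grid[int(y) // cell][int(x) // cell] += 1
    return grid
```

With the four detections `[(100, 100), (200, 150), (1000, 900), (1919, 1079)]` in a 1920 x 1080 frame and 540-pixel cells, `heat_map` returns `[[2, 0, 0, 0], [0, 1, 0, 1]]`, and the top-left quarter `(0, 0, 960, 540)` contains 2 people. Sending these values every configured interval would give the periodic statistics described above.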

8.3.9 Case: Vehicle Recognition

As shown in Figure 8-29, vehicle recognition provides the following benefits:

Support for various scenarios: Various types of vehicle information, such as models, colors, and license plates, can be identified in different scenarios, such as e-police and checkpoints.

Ease of use: Common 1080p surveillance cameras can be used to identify vehicle information in images, including license plates and vehicle attributes.

Vehicle recognition supports the detection of vehicle types, including sedans and medium-sized vehicles, and the recognition of vehicle colors and license plates, including blue and new-energy license plates. This feature is mainly used in scenarios such as campus vehicle management, parking lot vehicle management, and vehicle tracking.


Figure 8-29 Case of vehicle recognition

8.3.10 Case: Intrusion Detection

Intrusion detection identifies unauthorized intrusions in images. It extracts moving objects from a camera's field of view and generates an alarm when an object crosses into a specified area. It also allows setting the minimum number of people in an alarm area, the alarm triggering time, and the algorithm detection period. This feature is used to identify unauthorized access to key and dangerous areas and to detect climbing, as shown in Figure 8-30. Intrusion detection brings the following benefits:

High flexibility: configurable size and type of alarm objects



Low false alarm rate: intrusion alarms triggered only by people or vehicles, without interference from other objects



Ease of use: compatible with any 1080p surveillance camera


Figure 8-30 Case of intrusion detection
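The alarm rule described above combines an area, a minimum number of people, and a triggering time. It can be sketched with a ray-casting point-in-polygon test and a timer. The class and parameter names are hypothetical. The person positions are assumed to come from an upstream detector, and `update` is meant to be called once per detection period.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def inside(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray cast to the right."""
    x, y = point
    result = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


class IntrusionAlarm:
    """Alarm when at least `min_people` persons stay inside `area` for
    `trigger_seconds` consecutive seconds (hypothetical parameters)."""

    def __init__(self, area: List[Point], min_people: int, trigger_seconds: float):
        self.area = area
        self.min_people = min_people
        self.trigger_seconds = trigger_seconds
        self._since: Optional[float] = None  # when the condition first held

    def update(self, timestamp: float, people: List[Point]) -> bool:
        """Feed one detection result; return True if the alarm should fire."""
        count = sum(1 for p in people if inside(p, self.area))
        if count < self.min_people:
            self._since = None  # condition broken: restart the timer
            return False
        if self._since is None:
            self._since = timestamp
        return timestamp - self._since >= self.trigger_seconds
```

Requiring the condition to hold for a minimum time is one simple way to suppress alarms caused by someone briefly passing the edge of the area.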

8.3.11 Cognitive Computing Platform of China National Petroleum Corporation — Oil and Gas Layer Identification in Well Logging

As its information systems have been built and improved, China National Petroleum Corporation (CNPC) has accumulated large amounts of structured and unstructured data. Structured data is used well, but unstructured data is underused. The related knowledge and expertise have not been fully explored, and intelligent data analytics and application capabilities are insufficient. The data handled by the cognitive computing platform features large volumes, various types, and low value density. Cognitive computing is a new computing mode that marks an advanced stage of AI development. It involves many innovative technologies in information analysis, natural language processing, and machine learning, and helps decision makers gain insights from massive unstructured data. CNPC uses HUAWEI CLOUD knowledge graph and NLP technologies to build a knowledge graph for the oil and gas industry and to develop upper-layer service applications based on it. (Well logging layer identification is one of the service scenarios. Others include seismic layer interpretation, water content prediction, and working condition diagnosis.) The solution brings the following benefits:

Knowledge aggregation: professional knowledge accumulation of the oil and gas industry



Cost reduction and efficiency improvement: simplified service process and shorter work time



Increased reserves and production: increased proven reserves and guaranteed energy security

This solution boasts the following advantages: 

Key activities and data, such as oil well regions, data sources, information extraction, knowledge mapping, and knowledge convergence, can be flexibly modified and are open to manual intervention.



Simple knowledge reuse: New pipeline tasks can be quickly created based on existing ontology and data sources to build graphs.



Flexible modification and one-click validation: Tests can be performed frequently and quickly to improve efficiency.

Thanks to the preceding advantages, the time for oil and gas layer identification is shortened by 70%, and the compliance rate is improved by 5%, as shown in Figure 8-31.


Figure 8-31 Cognitive computing platform of China National Petroleum Corporation — Oil and gas layer identification in well logging

8.4 Summary

First, this course describes the HUAWEI CLOUD EI ecosystem to help you understand the HUAWEI CLOUD EI services. Second, it focuses on ModelArts, the Huawei EI essential platform, and uses experiments to help you quickly understand the ModelArts service. Finally, it presents EI-related cases. Huawei is committed to lowering the barrier to using AI and making AI inclusive. To help AI enthusiasts better understand the HUAWEI CLOUD EI application platform, the HUAWEI CLOUD official website provides the EI Experience Center and EI Training Camp, as shown in Figure 8-32 and Figure 8-33.

Figure 8-32 EI Experience Center

Figure 8-33 EI Training Camp

8.5 Quiz

1. HUAWEI CLOUD EI is an enabler for enterprise intelligence. Based on AI and big data technologies, HUAWEI CLOUD EI provides cloud services, such as public cloud and dedicated cloud, to build an open, trusted, and intelligent platform. Which of the following services are included in the HUAWEI CLOUD EI products and services?

2. Among the EI products and services, the solution for large scenarios is called EI Intelligent Twins. Which of the following are included in EI Intelligent Twins?

3. Among the EI products and services, which of the following are included in the EI essential platform?

4. ModelArts is an essential platform in the EI products and services. It is a one-stop development platform for AI developers. What are the functions of ModelArts?

5. What are the advantages of ModelArts as a one-stop AI development platform?
