人工智能06对抗搜索课件

文档格式：PPT| 61 页|大小 3.40MB|2024-12-11 发布|举报 | 版权申诉

下载文档

下载文档到电脑，查找使用更方便还剩页未读，继续阅读>>

侵权申诉举报

1 / 61

此文档下载收益归作者所有下载文档

版权提示

文本预览

常见问题

单击此处编辑母版标题样式,单击此处编辑母版文本样式,第二级,第三级,第四级,第五级,*,,*,单击此处编辑母版标题样式,单击此处编辑母版文本样式,第二级,第三级,第四级,第五级,*,,*,,,,,,,单击此处编辑母版标题样式,*,,*,单击此处编辑母版文本样式,第二级,第三级,第四级,第五级,人工智能06对抗搜索,人工智能06对抗搜索人工智能06对抗搜索Game Playing博弈博弈被被认为是AI研究领域中的一个很好的难题:– 博弈是不平凡的 • 玩家需要具备“human-like”般的智能 • 游戏可能非常复杂(e.g., Chess, Go) • 需要在有限时间内作出决策– games usually are: • 定义明确的且可重复的 • 完全可观察的环境是有限的– 能直接比较humans and computers,人工智能06对抗搜索人工智能06对抗搜索人工智能06对抗搜索,1,Game Playing博弈,博弈被被认为是,AI,研究领域中的一个很好的难题,:,–,博弈是不平凡的,•,玩家需要具备,“human-like”,般的智能,•,游戏可能非常复杂,(e.g., Chess, Go) •,需要在有限时间内作出决策,– games usually are: •,定义明确的且可重复的,•,完全可观察的环境是有限的,–,能直接比较,humans and computers,Game Playing博弈博弈被被认为是AI研究领域中的一,2,Computers Playing Chess,Computers Playing Chess,3,对战中的AI,对战中的AI,4,Computers Playing Go,Computers Playing Go,5,本章大纲,博弈,博弈中的优化决策 — 极小极大值算法 — α-β 剪枝,资源限制和近似评估,包含几率因素的游戏,不完整信息的游戏,本章大纲博弈,6,Games vs. search problems,,不可预测的对手,→,解决方案是针对每一个可能的对手回复的策略,,时间是有限的,→ 不太可能找到最优解，需找到近似解,,游戏对于低效率有严厉的惩罚,,,进攻计划,:• Computer considers possible lines of play (Babbage, 1846)• Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)• Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948;Shannon, 1950)•,First chess program,(Turing, 1951)•,Machine learning,to improve evaluation accuracy (Samuel, 1952-57)•,Pruning,（剪枝）,to allow deeper search (McCarthy, 1956),Games vs. search problems 不可预测,7,游戏的种类,确定性的随机的、策略的,游戏的种类确定性的,8,博弈树(2-player, 确定性的, 轮流出招),博弈树(2-player, 确定性的, 轮流出招),9,确定性的Two-Player,E.g. 井字棋, 国际象棋, 跳棋,博弈搜索– 状态-空间搜索树– 玩家轮流出招–每一层包含一轮行动–选择能获得最佳效用的行动,零和游戏– 一个玩家要最大化效用值– 而另一个要最小化效用值,确定性的Two-PlayerE.g. 井字棋, 国际象棋,,10,极小极大值原理,假设两位玩家都按照最佳策略行动,–computer,假设在其行动以后，其对手会选择效用值最小的状态来移动,–computer,在选择其最佳行动方案时要同时考虑自己以及对手的最佳行动,从max的角度显示效用值,极小极大值原理假设两位玩家都按照最佳策略行动 –comp,11,Minimax,确定性的，完全信息的博弈的最优策略,Idea: choose move to position with,highest minimax value,= best achievable payoff against best play,在对手,也使用最优策略,的条件下，能导致至少不比其它策略差的结果,,,假设两个游戏者都按照最优策略进行，那么节点的极小极大值就是对应状态的效用值（对于,MAX）• MAX,优先选择有极大值的状态,• MIN,优先选择有极小值的状态,Minimax 确定性的，完全信息的博弈的最优策略,12,Minimax,确定性的，完全信息的博弈的最优策略,Idea: choose move to position with,highest minimax value,= best achievable payoff against best play,在对手,也使用最优策略,的条件下，能导致至少不比其它策略差的结果,,E.g., 2-ply game:,Minimax 确定性的，完全信息的博弈的最优策略,13,Minimax algorithm,Minimax algorithm,14,Minimax 的性能,完整性？？,Minimax 的性能完整性？？,15,Minimax 的性能,完整性？？,,仅当博弈树是有限时,(chess has specific rules for this).,但在一颗无限的博弈树中也存在有限的策略,~,最优性？？,Minimax 的性能完整性？？仅当博弈树是有限时(che,16,Minimax 的性能,完整性？？,,Yes,,仅当博弈树是有限时,,最优性？？,Yes,,遇到一个聪明的对手。

Otherwise??,,时间复杂度？？,,Minimax 的性能完整性？？ Yes,仅当博弈树是有限时,17,Minimax 的性能,完整性？？,,Yes,,仅当博弈树是有限时,,最优性？？,Yes,,遇到一个聪明的对手Otherwise??,,时间复杂度？？,O(b,m,),,空间复杂度？？,,Minimax 的性能完整性？？ Yes,仅当博弈树是有限时,18,Minimax 的性能,完整性？？,,Yes,,仅当博弈树是有限时,,最优性？？,Yes,,遇到一个聪明的对手Otherwise??,,时间复杂度？？,O(b,m,),,空间复杂度？？,,O(bm),(,深度优先搜索,),For chess, b≈35, m≈100 for “reasonable" games,→,寻找精确解时完全不可行的,But do we need to explore every path?,Minimax 的性能完整性？？ Yes,仅当博弈树是有限时,19,α - β 剪枝,•,若与一个聪明的对手对弈，则博弈树上的一些分枝绝不会发生,,• “If you have an idea that is surely bad, don’t take the time to see how truly awful it is.” -- Pat Winston,,•,剪枝能消除搜索树的很大一部分分枝,α - β 剪枝• 若与一个聪明的对手对弈，则博弈树上的一,20,α−β pruning example,α−β pruning example,21,α−β pruning example,α−β pruning example,22,α−β pruning example,α−β pruning example,23,α−β pruning example,α−β pruning example,24,α−β pruning example,α−β pruning example,25,为什么叫α−β,•,,α,is the best value (to MAX) found so far on the current path,到目前为止在路径上的任意选择点发现的,MAX,的最佳（即最大值）选择,• If,v,is worse than,α,, MAX will avoid it, so can stop considering,v’s,other children,→,prune that branch,• Define,β,,similarly for MIN,为什么叫α−β• α is the best value (,26,The α−β algorithm,The α−β algorithm,27,α−β,剪枝技术,对于一个,MAX,节点来说，它取值称为,α,值,对于一个,MIN,节点来说，它取值称为,β,值,,β,剪枝：任何,MAX,节点,x,的,α,值如果不能降低其父节点的,β,值，则对节点,x,以下的分枝可停止搜索。

α,剪枝：任何,MIN,节点,x,的,β,值如果不能升高其父节点的,α,值，则对节点,x,以下的分枝可停止搜索α−β剪枝技术对于一个MAX节点来说，它取值称为α值,28,α−β,剪枝案例,α−β剪枝案例,29,α−β搜索的效率,•,效率很大程度上取决于检查后继的顺序,;,所以尝试检查可能较好的后继是值得的,•,最坏情况,:–,没有分枝需要修剪,–,相比于穷举搜索没有改进,•,最好情况,:–,每个玩家的最佳行动都先被检查,•,在实践中，性能接近于最好情况，而不是最坏情况，但要依实际情况而定,α−β搜索的效率• 效率很大程度上取决于检查后继的顺序; 所,30,α−β的性能,剪枝,不影响,最终结果,,好的行动顺序能提高剪枝的效率,,With “perfect ordering," time complexity =,O(b,d/2,),doubles solvable depth,,不幸的是,,,35,50,,也是有可能的,α−β的性能剪枝不影响最终结果,31,本章大纲,博弈,博弈中的优化决策,—,极小极大值算法,—,α-β,剪枝,资源限制和近似评估,包含几率因素的游戏,不完整信息的游戏,本章大纲博弈,32,Resource limits资源限制,标准方法,:,深度有限搜索,Use CUTOFF-TEST (,截断测试,) instead of TERMINAL-TEST,（终止测试）,e.g., depth limit (perhaps add quiescence search,静态搜索,)Use EVAL instead of UTILITY,用可以估计棋局效用值的启发式评价函数,EVAL,取代效用函数,i.e.,,估计位置期望值的评价函数,,假设我们有,100,seconds,计算时间,,,探索速度为,10,4,,nodes/second,→,10,6,nodes per move ≈,35,8/2,,→,α−β,,reaches depth 8,→,pretty good chess program,,4-ply lookahead is a hopeless chess player!– 4-ply ≈ human novice– 8-ply ≈ typical PC, human master– 12-ply ≈ Deep Blue, Kasparov,Resource limits资源限制标准方法: 深度有限搜,33,评价函数,•,评价非终止状态的函数,,•,理想函数,:,返回每个位置的效用值,•,在实践中,:,加权线性函数,:,Eval(s) = w,1,f,1,(s) + w,2,f,2,(s) +,…,+ w,n,f,n,(s),e.g., for chess, w,1,= 9 withf,1,(s)= (number of white queens) - (number of black queens), etc.,,评价函数• 评价非终止状态的函数• 理想函数: 返回每个,34,More on 评价函数,•,评价函数评估当前局面配置的好坏,,•,一个线性的评价函数是关于特征,f,1,, f,2,, f,3,的加权和,– More important features get more weight,,•,对弈的质量直接依赖于评价函数的质量,,•,为了构建一个好的评价函数，必须,:–,利用行业知识提取好的特征,–,选择或学习更好的权重,More on 评价函数• 评价函数评估当前局面配置的好坏,35,题外话: 精确的评价函数并不重要,Behavior is preserved under any monotonic,（单调的）,transformation of EVAL,题外话: 精确的评价函数并不重要Behavior is pr,36,对待有限的时间,•,在实际游戏中，通常对每一步有时间限制,T,•,我们如何考虑这个问题,?–,所以，我们可以设置一个保守的深度有限，以保证在,T,时间内决定一次行动,–,但是，搜索可能提前结束，更多搜索的机会被浪费了,对待有限的时间• 在实际游戏中，通常对每一步有时间限制T,37,对待有限的时间,•,在实践中,,,迭代深入深度优先搜索,(IDS),被很好地使用,–,运行,alpha-beta search,以深度限制逐渐增加的方式,–,当时间,T,快运行完时，返回最后一次完整的,α−β,搜索的结果,(i.e., the deepest search that was completed),对待有限的时间• 在实践中, 迭代深入深度优先搜索(IDS),38,现今一些确定性的游戏,Chess(,国际象棋）,: Deep Blue defeated human world champion Gary Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply,（层、厚度）,.–,计算机能够预见它的决策中的长期棋局序列。

机器拒绝走一步有决定性短期优势的棋,—,显示了非常类似于人类的对危险的感觉——Kasparov– Kasparov lost the match 2 wins to 3 wins and 1 tie– Deep Blue played by “brute force” (i.e., raw power from computer speed and memory); it used relatively little that is similar to human intuition and cleverness–,Used minimax, alpha-beta, sophisticated heuristics,现今一些确定性的游戏Chess(国际象棋） : Deep B,39,现今一些确定性的游戏,Checkers,（西洋跳棋）,:,Chinook,, the World Man-Machine Checkers Champion.• Chinook ended,40-year-reign,of human world champion Marion Tinsley in 1994.• In 2007, checkers was solved: perfect play leads to a draw.,Chinook cannot ever lose,,使用了一个提前计算好的存有,443,748,401,247,个不多于,8,个棋子的棋局数据库，使它的残局,(endgame),走棋没有缺陷,50 machines working in parallel on the problem,现今一些确定性的游戏Checkers（西洋跳棋） : Chi,40,现今一些确定性的游戏,黑白棋,:,人类冠军已经拒绝同计算机比赛了,~,,Go,（围棋）,: 2016,年以前，人类冠军拒绝与计算机比赛，因为计算机是个小学生棋手。

In go, b > 300,（棋盘为,19x19,）,,,所以大多数程序使用基于模式识别的方法来提供一个貌似可行的解Go has became a new benchmark for Artificial Intelligence (,人工智能新的试金石,),,现今一些确定性的游戏黑白棋: 人类冠军已经拒绝同计算机比赛了,41,AlphaGo: 第一次打败人类in 19x19 Go,• Google DeepMind computer go player– deep neural networks,深度神经网络,: • value networks,价值网络,: to evaluate board positions • policy networks,策略网络,: to select moves– trained by • supervised learning,监督学习,• reinforcement learning,（强化学习）,by self-play– search algorithm • Monte-Carlo simulation + value/policy networks,AlphaGo: 第一次打败人类in 19x19 Go• G,42,AlphaGo: Background,•,减少搜索空间,:–,减少搜索深度,• position evaluation–,减少搜索分枝,• move sampling based on policy • policy = probability distribution p(a|s),AlphaGo: Background• 减少搜索空间:–,43,Deep Neural Networks in AlphaGo,AlphaGo uses two types of neural networks:– policy network: what is the next move? • learned from human expert moves– value network: what is the value of a state? • learned from self-play using a policy network,SL = supervised learning, RL = reinforcement learning,Deep Neural Networks in AlphaG,44,包含几率因素的游戏,西洋双陆棋,包含几率因素的游戏西洋双陆棋,45,非确定性游戏概述,在非确定性的游戏中,,,几率因素是由掷骰子，抽牌等引起的。

抛硬币游戏的简化示例,:,非确定性游戏概述在非确定性的游戏中, 几率因素是由掷骰子，抽,46,非确定性游戏概述,•,权重取决于事件发生的概率,,•,将极小极大值推广为,期望,极小极大值,,• Choose move with highestexpected value,非确定性游戏概述• 权重取决于事件发生的概率,47,期望效用最大化,•,为什么我们要计算平均效用值,?,为什么不计算极小极大值,?,,•,期望效用最大化原则,:,一个智能体基于其给定的知识库，会根据,期望效用最大化,来选择行动方式,,•,决策的一般性原则,•,经常作为理性的定义,•,我们会在本课程中反复看到该观点,!,期望效用最大化• 为什么我们要计算平均效用值? 为什么不计算,48,期望极小极大值算法,EXPECTIMINIMAX,类似于,MINIMAX,，多考虑一个几率节点,,,if state is a Max node thenreturn the highest EXPECTIMINIMAX-VALUE of SUCCESSORS(state),if state is a Min node thenreturn the lowest EXPECTIMINIMAX-VALUE of SUCCESSORS(state),if state is a chance node thenreturn average of EXPECTIMINIMAX-VALUE of SUCCESSORS(state),期望极小极大值算法EXPECTIMINIMAX 类似于MIN,49,随机的Two-Player,•,掷骰子增加分枝,b:,两个骰子有,21,种可能的掷法,–,西洋双陆棋,≈ 20,种合法行动,– Depth 4 = 20 x (21 x 20),3,=1.2 x 10,9,•,当深度增加时，到达指定节点的概率会收窄,–,此时再向前搜索的价值会减少,–,所以限定搜索深度是,OK,的,–,但是剪枝不太可能实现,…,• TDGammon uses depth-2 search + verygood eval function + reinforcementlearning: world-champion level play,随机的Two-Player• 掷骰子增加分枝b: 两个骰子有,50,题外话：精确的评价函数的重要性,Behaviour is preserved only by positive linear transformation of EVAL,Hence EVAL should be proportional to the expected payoff,评价函数应该是棋局的期望效用值的正线性变换,题外话：精确的评价函数的重要性Behaviour is pr,51,本章大纲,博弈,博弈中的优化决策,—,极小极大值算法,—,α-β,剪枝,资源限制和近似评估,包含几率因素的游戏,不完整信息的游戏,本章大纲博弈,52,不完整信息的游戏,E.g., card games, where opponent's initial cards are unknown,Typically we can calculate a probability for each possible deal,Seems just like having one big dice roll at the beginning of the,Game,,Idea: compute the minimax value of each action in each deal,then choose the action with,highest expected,value over all,Deals,,在评价一个有未知牌的给定行动过程时，首先计算出每副可能,牌的出牌行动的极小极大值，然后再用每副牌的概率来计算得到,对所有发牌情况的期望值。

不完整信息的游戏E.g., card games, wher,53,Example,Example,54,Example,Example,55,Example,Example,56,合理的分析,*所以从直觉上说用所有可能状态的平均效用值来评价一次行动的价值,is WRONG,,在局部观察中,,,一个行动的价值取决于,信度状态,,这样可以在信度状态中生成和搜索博弈树,,以下行为可以帮助生成信度状态：,,打出一张牌来刺探对手,,给合伙人发信号,,靠演技来最小化信息披露,合理的分析*所以从直觉上说用所有可能状态的平均效用值来评价一,57,Summary,• Games are fun to work on!– perfection is unattainable  must approximate– Games are to AI as grand prix racing is to automobile design,,• Game playing is best modeled as a search problem– Search trees for games represent alternate computer/opponent moves,• Evaluation functions estimate the quality of a given board configuration for each player,,•,Minimax,is an algorithm that chooses “optimal” moves by assuming that the opponent always chooses their best move,•,Alpha-beta,is an algorithm that can avoid large parts of the search tree, thus enabling the search to go deeper —,消除无关的子树以提高效率,Summary• Games are fun to work,58,Summary of Search,• Uninformed search strategiesBreadth-first search (BFS), Uniform cost search, Depth-first search (DFS),Depth-limited search, Iterative deepening search,,• Informed search strategies– Best-first search: greedy, A*– Local search: hill climbing, simulated annealing etc.,,• Constraint satisfaction problems– Backtracking = depth-first search with one variable assigned per node– Enhanced with: Variable ordering and value selection heuristics, forwardchecking, constraint propagation,Summary of Search• Uninformed,59,作业,6.1， 6.3， 6.5,作业6.1， 6.3， 6.5,60,谢谢！,谢谢！,61,。

点击阅读更多内容