RSS订阅信息安全技术跟踪与研究:技术、平台、会议、论文、产业
你现在的位置:首页 / 技术积累 / 正文

52知识点之四:复杂性、图灵机、复杂度类P问题

0 技术积累 | 2015年4月6日
转载申明:本站原创,欢迎转载。但转载时请保留原文地址。
原文地址:http://www.vonwei.com/post/4outof52.html

 

52 Things: Number 4: The Complexity Class P

Posted by Ryan Stanley

原文地址:http://bristolcrypto.blogspot.com/2014/10/52-things-number-4-complexity-class-p.html

This is the fourth blog post talking about '52 Things Every PhD Student Should Know' to do Cryptography, and the first on the topic of Theoretical Computer Science. In this post, I've been asked to define the complexity class P. Having recently joined the Cryptography group at Bristol as a Mathematician, I knew very little theoretical computer science when I first started my PhD and I'm sure there will be plenty of others in my situation, so this blog will start right from the beginning and you can skip over parts you know already. First, we'll give an overview of what complexity means and why it matters, then we'll define Turing machines, and finally arrive at the complexity class P, concluding with an example.

 

Most of the content of this post is a reworking of parts of Introduction to the Theory of Computation by Michael Sipser [1], which I have found hugely helpful.

这篇是关于理论计算科学方面的第一个知识点,分三个章节介绍:首先介绍复杂性理论,然后定义图灵机,最后介绍复杂度类P(即P问题)。从这个问题开始,原文博客和翻译补充不再分开,为了方便阅读和指正,一段原文,一段翻译补充。

 

Section 1: Complexity and Big O Notation(复杂度与大O记号)

 

We want to know how difficult a given task is for a computer to do in order to design efficient programs. The trouble is that the processing power of a computer varies massively depending on the hardware (e.g. see last week's '52 Things' blog). So we want a measure of the difficulty of a task that doesn't depend on the specific details of the machine performing the task. One way to do this is to bound the number of operations that a certain model of a computer would take to do it. This is called (time) complexity theory.

为了设计高效的程序,在委派一个任务给计算机时,我们需要评价该任务的困难程度。不过,拥有不同硬件配置的计算机处理能力是不一样的(上周总结了各种计算设备的处理能力差别),因此,需要一种不用考虑计算机的具体细节,并且可以度量计算任务困难性的方法。其中一个方法就是度量特定计算模型下执行操作的数量,也称为时间复杂度理论。当然,还有空间复杂度理论。为了高效执行一个程序,时间和空间上的考虑有时像鱼和熊掌不可兼得,因此需要牛叉的算法程序员来找到更优的解决方案。

Typically, though, the number of operations required will depend on the input to the task and may vary even with inputs of the same length. As a pertinent example, say we design a computer program which tells you whether or not an integer you input is prime. If we give as input the number 256, the program will probably output 'not prime' sooner than if we had given it the number 323 (even though they both have length 9 when written as binary integers, for example), since the first integer has a very small prime factor (2) and the second has larger factors (17 and 19). Therefore we usually opt for a worst-case analysis where we record the longest running time of all inputs of a particular length. So we obtain an algebraic expression t(n) that reflects the longest running time of all inputs of length n.

不过,执行操作的数量还与输入有关,即便相同长度的不同输入,其操作执行的数量也有可能是不同的。这里以“判断一个整数是否为素数为例来说明,如果两个不同的输入分别是256323(这两个整数二进制长度一致),计算机程序给出非素数结果的时间,256明显快于323。原因很简单,256有个非常小的因子2,而323的因子1719稍大。通常,在评价计算复杂度时,我们通常选取最差的情况来分析,即相同长度下所有输入运行时间的最长值。这样,我们得到了一个代数表达式t(n),表示长度为n时所有输入执行的最长时间值。

Furthermore, when the input length n becomes very large, we can neglect all but the most dominant term in the expression and also ignore any constant factors. This is called asymptotic analysis; we assume n is enormous and ask roughly how many steps the model of computation will take to 'finish' when given the worst possible input of length n, writing our answer in the form O(t(n)). For example, if we find that our process takes 6n^3n^2+1 steps, we write that it is O(n^3), since all other terms can be ignored for very large n.

而且,当输入长度n非常大时,我们可以忽略其它项和常数因子,只考虑最主要的项(指数最高的项)即可,这叫做渐近分析,表达形式为大O,即O(t(n))。例如,如果某个程序需要执行6n^3–n^2+1 步,我们可以记为O(n^3),忽略其它低次项和常数值。

 

Section 2: Turing Machines

 

Now we give the model that is most often used in the kind of calculations performed in Section 1. First, recall that an alphabet is a non-empty finite set and a string is a finite sequence of elements (symbols) from an alphabet. A language is simply a set of strings.

这节给出复杂度理论中最常用的模型(图灵机模型)。首先介绍alphabet string language 这三个概念。字母表alphabet 是一个非空有限集合,一个字符串string 是字母表中元素构成的一个有限序列,语言language 是一组字符串的集合。

Turing machine models what real computers can do. Its 'memory' is an infinitely long tape. At any time, each square of the tape is either blank or contains a symbol from some specified alphabet. The machine has a tape head that can move left or right along the tape, one square at a time, and read from and write to that square. At first, the tape is all blank except for the leftmost n squares which constitute the input (none of which can be blank so that it is clear where the input ends). The tape head starts at the leftmost square, reads the first input symbol and then decides what to do next according to a transition function. The transition function depends on what it reads at the square it is currently on and the state that the machine is currently in (like a record of what it has done so far) and returns

1a new state

2another symbol to write to the square it is on (though this symbol might be the same as what was already written there)

3a direction to move in: left or right. 

The machine will continue to move one square at a time, read a symbol, evaluate the transition function, write a symbol and move again, until its state becomes some specified accept state or reject state.

       图灵机可以模拟真正计算机能够做的事情,其内存是一个无限长的纸带。任何时间点,纸带的每个平方或者是空白或者包含来自某个特定字母表的一个符号。机器拥有一个带头,可以沿着纸带左移或者右移,每次只移动一个平方,从那个平方读出和向那个平方写入。最开始,除了纸带最左边的n个平方(构成输入),纸带其它部分全是空白。带头从最左边的平方开始,读取第一个输入符号,然后根据一个转换函数决定下一步做什么。转换函数根据从当前平方读的符号,以及当前机器所处的状态(记录机器至今做了哪些操作),并且返回:(1)一个新状态;(2)另一个符号,供写入当前平方;(3)移动的方向,向左或者向右。机器将会继续一次移动一个平方,读一个符号,评估或者计算转换函数,写入一个符号,一直下去……直到机器状态变成某个特定的接受状态(accept)或者拒绝状态(reject)才停止。

 

If the machine ends up in the accept state, we say it accepts its input. Similarly it may reject its input. In either case we say the machine halts on its input. But note that it may enter a loop without accepting or rejecting i.e. it may never halt. If a Turing machine accepts every string in some language and rejects all other strings, then we say the machine decides that language. We can think of this as the machine testing whether or not the input string is a member of the language. Given a language, if there is a Turing machine that decides it, we say the language is decidable.

如果机器以接受状态终止,说明接受其输入;类似地,机器也可能拒绝其输入。不论哪种情况,说明该机器在该输入下停止(halts on its input)。值得注意的是,机器可能进入一个没有接受或者拒绝的循环,并且永不停止。如果一个图灵机接受某个语言中的所有字符串,并且拒绝所有其他字符串,那么我们称该机器确定(decides)该语言。可以想象,这样机器可以测试一个输入字符串是否属于该语言。给定一个语言,如果存在一个图灵机可以确定该语言,我们称该语言是可判定的(decidable

 

The power of this model comes from the fact that a Turing machine can do everything that a real computer can do (this is called the Church-Turing thesis [2]). We define the time complexity class TIME(t(n)) to be the collection of all languages that are decidable by an O(t(n)) time Turing machine, then we turn computational problems into questions about language membership (is an input string a member of a certain language? e.g. does this string representing an integer belong to the language of strings representing prime integers?) and can partition computational problems into time complexity classes.

图灵机模型强大的原因来源于这个事实,图灵机可以做任何真正计算机能够做的事情,这称为Church-Turing理论[2]。将时间复杂度类TIME(t(n))定义为,可以由一个O(t(n))时间图灵机可判定的所有语言构成的集合。那么,计算问题可以转变为语言成员问题,即一个输入字符串是否是一个特定语言的成员,并且可以将计算问题划分为时间复杂度类。

 

Section 3: The Complexity Class P

 

Finally, we arrive at the purpose of this blog! If t(n)=n^k for some k>0 then O(t(n)) is called polynomial time. The complexity class P is the class of all languages that are decidable in polynomial time by a Turing machine. Since k could be very large, such Turing machines are not necessarily all practical, (let alone 'fast'!), but this class is a rough model for what can be realistically achieved by a computer. Note that the class P is fundamentally different to those languages where t(n) has n in an exponent, such as 2^n, which grow much, much faster as n increases – so fast that even if you have a decider for some language, you may find that the universe ends before it halts on your input!

       最后回归正题。如果对于某个k>0t(n)=n^k,那么O(t(n))称为多项式时间。复杂度类PP问题)是指多项式时间内可以由一个图灵机进行判定的所有语言构成的类。由于k可能会非常大,这样的图灵机不一定都实用,不过这个类是真正计算机所能做事情的一个粗略模型。注意,类P与那些n作为指数的语言在本质上就是不同的,指数增长是非常快的,即便有个图灵机能确定某个语言,你可能会发现即使宇宙都结束了,你的图灵机还没停止。

 

We conclude with an example of a polynomial time problem. Suppose you have a directed graph (a set of nodes and edges where there is at most one edge between any pair of nodes and each edge has an arrow indicating a direction). Then if we encode the graph and the two nodes as a single string, we can form a language consisting of those strings representing a graph and two nodes such that it is possible to follow the edges from the first node and eventually arrive at the second. So a decider for this language will effectively answer the question of whether there is a path from the first node A to the second B, called the path problem, by accepting or rejecting the graph and nodes you input. We give a decider for this language and show that it decides in polynomial time. 

1First put a mark on A. 

2Scan all the edges of the graph. If you find an edge from a marked node to an unmarked node, mark the unmarked node.

3Repeat the above until you mark no new nodes.

4If B is marked, accept. Otherwise, reject. 

This process successively marks the nodes that are reachable from A by a path of length 1, then a path of length 2, and so on. So it is clear that a Turing machine implementing the above is a decider for our language. Now we consider the time complexity of this algorithm. If we couldn't do steps 1 and 4 in polynomial time, our machine would be terrible! So we focus on steps 2 and 3. Step 2 involves searching the input and placing a mark on one square, which is clearly polynomial time in the size of the input. Step 3 repeats step 2 no more times than the number of nodes, which is necessarily less than the size of the input (since the input must encode all the nodes of the graph) and is hence polynomial (in particular, linear) in the size of the input. Therefore the whole algorithm is polynomial time and so we say the path problem is in P.

最后以一个多项式时间问题作为例子进行总结。假定你拥有一个有向图(顶点和边的集合,任何一对顶点之间最多一条边,每条边有个箭头表示方向)。将该图和两个顶点编码为一个字符串,这样的字符串表示一个图和两个顶点,满足是否从一个顶点迎着图的边可以到达另一个顶点,所有这些字符串形成一个语言。该语言的一个图灵机决定子可以回答该问题(是否接受输入的图和顶点),即从顶点A到达顶点B是否存在一条路径,称为路径问题。这里给出了该问题的一个决定子,并证明其是在多项式时间可判定的:(1)给A一个标记;(2)扫描图的所有边,如果找到一条可以从标记顶点达到一个非标记顶点,则把那个非标记顶点也标记上;(3)重复上面的步骤,直到再没有新顶点可以标记为止;(4)如果B被标记了,则接受;否则,拒绝。一个图灵机实现上面的步骤就是该语言的决定子。现在可以考虑一下该算法的时间复杂度,主要考虑步骤(2)和(3)。步骤(2)搜索输入,对于一个平方放置一个标记,显然与输入的规模是呈多项式时间关系的。步骤(3)不断的重复步骤(2),重复的次数不会超过顶点的个数,显然比输入规模要小。因此,整个算法是多项式时间的,路径问题属于P类问题。

 

[1] http://www.amazon.co.uk/Introduction-Theory-Computation-Michael-Sipser/dp/0619217642
[2] http://en.wikipedia.org/wiki/Church%E2%80%93Turing_thesis


  • ------------------分隔线----------------

  • 如果感兴趣,欢迎关注本站微信号,跟踪最新博文信息,手机微信扫一扫下面的二维码,即可关注!
  • 微月信公众号
  • 推荐您阅读更多有关于“ 计算复杂性  图灵机  时间复杂性  P类问题  52知识点   ”的文章

    请填写你的在线分享代码
    上一篇:IEEE S&P 2015会议论文预读(3)下一篇:Win7驱动开发:环境构建与调试

    猜你喜欢

    评论列表:

    发表评论

    必填

    选填

    选填

    必填,不填不让过哦,嘻嘻。

    记住我,下次回复时不用重新输入个人信息

    本站介绍
    最近发表
    本年最热文章
    本月最热文章
    网站分类
    文章归档