The graph coloring problem is one of the most fundamental and central problems in discrete mathematics and optimization. The problem is not only of theoretical interest, but is also encountered in practice. For instance, certain scheduling problems can be modeled as an instance of graph coloring. Other applications are found in telecommunications, and compiler optimization techniques. Informally, in this problem, we wish to color the vertices of a graph with the minimum number of colors such that no two adjacent vertices receive the same color.
Formally, we are given an undirected (simple) graph and an integer . The task is to decide whether there is a vertex coloring such that , for each . The problem is known to be NP-complete, i.e., computationally difficult. Thus, it is highly unlikely there exists an algorithm that is always fast (runs in polynomial time) for the problem. Below is a graph (left), and an optimal 3-coloring of it (right).
Many problems we want to solve are in fact computationally difficult, i.e., we have no fast algorithm for them. Yet we still want to solve the problems: maybe an answer means saving (or making) a lot of money, saving human lives, or improving our quality of life. So we will not accept putting our hands up and simply walking away from the problem. Indeed, what can be done?
In a sense, the “canonical” NP-complete problem, i.e. “the mother of all NP-complete problems”, is the satisfiability problem (SAT). In this problem, we are given a formula in propositional logic (meaning each variable has a domain of size 2, corresponding to values “true” and “false”). The task is to decide whether there is an assignment of truth values to the variables of the formula such that it evaluates to true. The problem has a long history in both mathematics and computer science.
The obvious algorithm for solving SAT is of course the following. Suppose the instance of SAT has variables. Then, there are possible assignments (equivalently, entries in a truth table) of values true and false to the formula. Try each assignment, and output YES if and only if at least one evaluates to true. Otherwise, output NO. Interestingly, it was discovered at least already in the early 60s, that one can do much better algorithmically. This fueled the interest of coming up with better heuristics, data structures, and algorithms, and eventually faster and faster SAT solvers. Currently, the research and experimental work on SAT solving is a core topic of artificial intelligence, and theoretical computer science in general.
A popular approach when wanting to solve an NP-complete problem in practice is to model it as an instance of SAT, and feeding it to a SAT solver. In this assignment, we will explore the usage of SAT solver as a means of tackling a hard, practical problem.
In this assignment, we will explore the colorability of a set of specific graphs. For this, we will use a SAT solver. Roughly, the task can be broken down into the following subtasks:
In the following, let us take a closer look at the specified subtasks.
The graphs are given in the so-called DIMACS format. A graph file in the DIMACS format has three types of lines, detailed below. The following is an example:
c This is an example graph.
c It will have 3 vertices and 3 edges.
p graph 3 3
e 1 2
e 1 3
e 2 3
Let us detail the three types of lines:
Comment lines. A comment line starts with ‘c’, and can be skipped by a parser. Comment lines may appear anywhere in the file.
Problem line. There is exactly one problem line per file, and it is the first line to appear (after possible comments).
p format vertices edges
The problem line starts with ‘p’. The string format can be safely skipped. The first integer vertices gives the number of vertices in the graph. Similarly, edges gives the number of edges in the graph.
Based on this, it should be clear the file above is representing the complete graph on three vertices (“a triangle”), i.e., . Note: Usually, (and for this assignment), it is safe to assume the vertex set is defined on the set of continuous integers . In particular, observe vertices start from “1”, not “0”.
It can be shown that any propositional formula can be expressed in conjunctive normal form (CNF). In this form, the formula is represented as a conjunction of clauses, where each clause is a disjunction of variables. For instance, the formula is in CNF form. Here, we have two clauses of size two both. There are three variables , , and , some of which appear negated. For example, a truth table will reveal is a satisfying assignment to the formula (i.e., it evaluates to true), while say is not (i.e., it evaluates to false).
CNF formulas are represented in so-called DIMACS CNF format. In this format, the above formula would be written as
c This is again a comment line.
c A CNF formula on 3 variables and 2 clauses to follow.
c The formula is (x_1 OR !x_2) AND (x_1 OR !x_3)
p cnf 3 2
1 -2 0
1 -3 0
Comment lines behave exactly as in the graph format. The problem line is similar: the first integer gives the number of variables, the second integer the number of clauses. The two remaining lines both list a clause. A positive integer means the clause contains the corresponding variable. If it’s negated, the clause contains a negation of that variable. The symbol “0” always ends a clause line.
Given a graph and an integer , our task is to write a CNF formula that will be true if and only if the graph is -colorable. In general, there are several possibilities for encoding a problem as an instance of SAT. Indeed, this will be the case here as well.
A straightforward encoding is as follows. Suppose has vertices. Then, the CNF formula will have variables, i.e., exactly variables for each vertex . The intuition is that if the th variable of is set to true, can receive the color . Then, we can also enforce each vertex to receive exactly one color by essentially expressing the XOR of the variables in CNF. Finally, for each edge, we must enforce that the endpoints of the edge receive distinct colors. For example, if and is an edge, we would have two clauses and , expressing the fact that not both and can receive the color 1, nor the color 2 simultaneously.
Every year, the so-called SAT competition for SAT solvers is held. In the competition, typically roughly a hundred SAT solvers are solving a predetermined set of challenging benchmark instances. To get a rough estimate or answer to the question “what solver should I use?”, it is a good idea to look at the highly ranked solvers from recent competitions.
You are free (and encouraged) to experiment with different solvers. They are typically easy to run, especially from a Unix environment. Some solvers we recommend:
Your program(s) should read a graph (in the above mentioned format), and output a CNF formula (in the above mentioned format). You should then feed the CNF formula to some SAT solver. We are particularly interested in the following graphs:
We ask the following questions:
What is the smallest for which graph A is -colorable?
What is the smallest for which graph B is -colorable?
What is the smallest for which graph C is -colorable?
Fix to be some small value, say . Is there a noticeable difference in the runtime of a SAT solver when deciding for -colorability of graphs A and C, or B and C? That is, for whatever reason, does it seem that coloring graph C is clearly harder or easier than that of A or B? How could this be explained?
A bonus question for extra challenge:
For the bonus question, you are free to use whatever solver, algorithm, heuristic, or external program to solve the question. The question is intended to be doable, but not (too) easy. Explain the approach you took for solving the question. For whichever answer you arrived at (YES/NO), how did you ensure the correctness of your answer?