wiki:Tutorial/Graphs

Graphs

A graph in computer science is an ordered pair G =(V,E) of a set of vertices V and a set of edges E. SeqAn provides different types of graphs and the most well-known graph algorithms as well as some specialized alignment graph algorithms. In this part of the tutorial, we demonstrate how to construct a graph in SeqAn and show the usage of some algorithms. Alignment graphs are described in the Alignments part of this tutorial.

Let us follow a simple example. We have given the following network of five cities and in this network we want to compute the shortest path from Hannover to any other city:

source:trunk/seqan/core/demos/tutorial/graph/graph_cities.jpg

In the section Graph Basics, we will create the network and write the graph to a dot-file. The section PropertyMaps assigns city names to the vertices and demonstrates the usage of a vertex iterator. Finally, in Graph Algorithms we will compute the shortest path by calling a single function.

After having worked through these sections you should be familiar with the general usage of graphs in SeqAn. By the way, it is worth looking at the examples section of the documentation. You will find examples for many of the graph algorithms implemented in SeqAn.

Graph Basics

The general header file for all types of graphs is seqan/graph_types.h. It comprises the class Graph and its specializations, all functions for basic graph operations, and different iterators. Later, for computing the shortest path we will also need seqan/graph_algorithms.h which includes the implementations of most of SeqAn's graph algorithms.

#include <iostream>
#include <seqan/graph_types.h>
#include <seqan/graph_algorithms.h>
using namespace seqan;

We want to model the network of cities as an undirected graph and label the edges with distances. Before we start creating edges and vertices, we need some typedefs to specify the graph type.

SeqAn offers different specializations of the class Graph: Undirected Graph, Directed Graph, Tree, Trie, Automaton, Hmm, Oracle, and Alignment Graph. For our example, an undirected graph will be sufficient, so we define our own graph type TGraph with the specialization Undirected Graph of the class Graph. Luckily, this specialization has an optional cargo template argument, which attaches any kind of object to the edges of the graph. This enables us to store the distances between the cities, our edge labels, using the cargo type TCargo defined as unsigned int. Using the cargo argument, we have to provide a distance when adding an edge. And when we remove an edge we also remove the distance.

int main ()
{
    typedef unsigned int TCargo;
    typedef Graph<Undirected<TCargo> > TGraph;
    typedef VertexDescriptor<TGraph>::Type TVertexDescriptor;

Each vertex and each edge in a graph is identified by a so-called descriptor. The type of the descriptors is returned by the metafunction VertexDescriptor. In our example, we define a type TVertexDescriptor by calling VertexDescriptor on our graph type. Analogously, there is the metafunction EdgeDescriptor for edge descriptors.


We can now create the graph g of our type TGraph.

    TGraph g;

For our example, we add five vertices for the five cities, and six edges connecting the cities.

Vertices can be added to g by a call to the function addVertex. The function returns the descriptor of the created vertex. These descriptors are needed to add the edges afterwards.

    TVertexDescriptor vertBerlin = addVertex(g);
    TVertexDescriptor vertHamburg = addVertex(g);
    TVertexDescriptor vertHannover = addVertex(g);
    TVertexDescriptor vertMainz = addVertex(g);
    TVertexDescriptor vertMuenchen = addVertex(g);

The function addEdge adds an edge to the graph. The arguments of this function are the graph to which the edge is added, the vertices that it connects, and the cargo (which is in our case the distance between the two cities).

    addEdge(g, vertBerlin, vertHamburg, 289);
    addEdge(g, vertBerlin, vertHannover, 286);
    addEdge(g, vertBerlin, vertMainz, 573);
    addEdge(g, vertBerlin, vertMuenchen, 586);
    addEdge(g, vertHannover, vertMuenchen, 572);
    addEdge(g, vertHamburg, vertMainz, 521);

Once we have created the graph we may want to have a look at it. SeqAn offers the possibility to write a graph to a dot file. With a tool like  Graphviz you can then visualize the graph.

The only thing that we have to do is to call the function write on a file stream with the tag DotDrawing() and pass over our graph g.

    FILE* strmWrite = fopen("graph.dot", "w");
    write(strmWrite, g, DotDrawing());
    fclose(strmWrite);

After executing this example, there should be a file graph.dot in your directory.

Alternatively, you can use the standard output to print the graph to the screen:

    ::std::cout << g << ::std::endl;

Assignments

Task 1 (Directed Graphs)
Write a program which creates a directed graph with the following edges: (1,0), (0,4), (2,1), (4,1), (5,1), (6,2), (3,2), (2,3), (7,3), (5,4), (6,5), (5,6), (7,6), (7,7)
Use the function addEdges instead of adding each edge separately. Output the graph to the screen.
Difficulty
3
Solution
can be found here

Task 2 (Hidden Markov Models)
Write a program which defines an HMM for DNA sequences:

  1. Define an exon, splice, and intron state.
  2. Consider to use the type LogProb<> from seqan/basic/basic_logvalue.h for the transition probabilities. Sequences always start in the exon state. The probability to stay in an exon or intron state is 0.9. There is exactly one switch from exon to intron. Between the switch from exon to intron state, the HMM generates exactly one letter in the splice state. The sequence ends in the intron state with a probability of 0.1.
  3. Output the HMM to the screen.
  4. Use the follwing emission probabilities:
A C G T
exon state 0.25 0.25 0.25 0.25
splice state 0.05 0.0 0.95 0.0
intron state 0.4 0.1 0.1 0.4
Difficulty
5
Solution
can be found here

Property Maps And Iterators

So far, the vertices in our graph can only be distinguished by their vertex descriptor. We will now see how to associate the city names with the vertices.

SeqAn uses Property Maps to attach auxiliary information to the vertices and edges of a graph. The cargo parameter that we used above associated distances to the edges. In most scenarios you should use an external property map to attach information to a graph. Be aware that the word external is a hint that the information is stored independently of the graph and functions like removeVertex do not affect the property map. Property maps are simply Strings of a property type and are indexed via the already well-known vertex and edge descriptors.

Lets see how we can define a vertex property map for the city names. Our property type is a String of a city name type, a char string. We only have to create and resize this map so that it can hold information on all vertices.

    typedef String<char> TCityName;
    typedef String<TCityName> TProperties;
    TProperties cityNames;
    resizeVertexMap(g, cityNames);


Next, we can enter the city names for each vertex. Note that this is completely independent from our graph object g.

    assignProperty(cityNames, vertBerlin, "Berlin");
    assignProperty(cityNames, vertHamburg, "Hamburg");
    assignProperty(cityNames, vertMuenchen, "Munich");
    assignProperty(cityNames, vertMainz, "Mainz");
    assignProperty(cityNames, vertHannover, "Hannover");

If we now want to output all vertices including their associated information we can iterate through the graph and use the iterators value to access the information in the property map.

But let us first have a quick look at iterators for graph types. SeqAn provides six different specializations for graph iterators: Vertex Iterator, Adjacency Iterator, Dfs Preorder Iterator, and Bfs Iterator for traversing vertices, and Edge Iterator and Out-Edge Iterator for traversing edges. Except for the Vertex Iterator and the Edge Iterator they depend additionally to the graph on a specified edge or vertex.

To output all vertices of our graph in an arbitrary order, we can define an iterator of the specialization Vertex Iterator and determine its type with the metafunction Iterator. The functions atEnd and goNext also work for graph iterators as for all other iterators in SeqAn.

The value of any type of vertex iterator is the vertex descriptor. To print out all city names we have to call the function getProperty on our property map cityNames with the corresponding vertex descriptor that is returned by the value function.

    typedef Iterator<TGraph, VertexIterator>::Type TVertexIterator;
    TVertexIterator itV(g);
    for(;!atEnd(itV);goNext(itV)) {
        ::std::cout << value(itV) << ':' << getProperty(cityNames, value(itV)) << ::std::endl;
    }

The output of this piece of code should look as follows:

0:Berlin
1:Hamburg
2:Hannover
3:Mainz
4:Munich

Assignments

Task 3
Add a vertex map to the program from task 1:
  1. The map shall assign a lower-case letter to each of the seven vertices. Find a way to assign the properties to all vertices at once in a single function call (without using the function assignProperty for each vertex separately).
  2. Show that the graph is not connected by iterating through the graph in depth-first-search ordering. Output the properties of the reached vertices.
Difficulty
4
Solution
can be found here

Graph Algorithms

Now that we completed creating the graph we can address the graph algorithms. Here is an overview of some graph algorithms currently available in SeqAn:

Algorithm SeqAn function
Elementary Graph Algorithms Breadth-First-Search
Depth-First Search
Topological Sort
Strongly-Connected Components
breadthFirstSearch
depthFirstSearch
topologicalSort
stronglyConnectedComponents
Minimum Spanning Tree Prim's Algorithm
Kruskal's Algorithm
primsAlgorithm
kruskalsAlgorithm
Single-Source Shortest Path DAG Shortest Path
Bellman-Ford
Dijkstra
dagShortestPath
bellmanFordAlgorithm
dijkstra
All-Pairs Shortest Path All-Pairs Shortest Path
Floyd Warshall
allPairsShortestPath
floydWarshallAlgorithm
Maximum Flow Ford-Fulkerson fordFulkersonAlgorithm
Transitive Closure Transitive Closure transitiveClosure
Biologicals Needleman-Wunsch
Gotoh
Hirschberg with Gotoh
Smith-Waterman
Multiple Sequence Alignment
UPGMA
Neighbor Joining
globalAlignment
globalAlignment
globalAlignment
localAlignment
globalMsaAlignment
upgmaTree
njTree

The biological algorithms use heavily the alignment graph. Most of them are covered in the Alignments section. All others use the appropriate standard graph. All algorithms require some kind of additional input, e.g., the Dijkstra algorithm requires a distance property map, alignment algorithms sequences and a score type and the network flow algorithm capacities on the edges.

Generally, only a single function call is sufficient to carry out all the calculations of a graph algorithm. In most cases you will have to define containers that store the algorithms results prior to the function call.

In our example, we apply the shortest-path algorithm of Dijkstra. It is implemented in the function dijkstra.

Let's have a look at the input parameters: The first parameter is of course the graph, g. Second, you will have to specify a vertex descriptor. The function will compute the distance from this vertex to all vertices in the graph. The last input parameter is an edge map containing the distances between the vertices.
One may think that the distance map is already contained in the graph. Indeed this is the case for our graph type but it is not in general. The cargo of a graph might as well be a string of characters or any other type. So, we first have to find out how to access our internal edge map. We do not need to copy the information to a new map. Instead we can define an object of the type InternalMap of our type TCargo. It will automatically find the edge labels in the graph when the function property or getProperty is called on it with the corresponding edge descriptor.

The output containers of the shortest-path algorithm are two property maps, predMap and distMap. The predMap is a vertex map that determines a shortest-paths-tree by mapping the predecessor to each vertex. Even though we are not interested in this information, we have to define it and pass it to the function. The distMap indicates the length of the shortest path to each vertex.

    typedef Size<TGraph>::Type TSize;
    InternalMap<TCargo> cargoMap;
    String<TVertexDescriptor> predMap;
    String<TSize> distMap;

Having defined all these property maps, we can then call the function dijkstra:

    dijkstra(g,vertHannover,cargoMap,predMap,distMap);

Finally, we have to output the result. Therefore, we define a second vertex iterator itV2 and access the distances just like the city names with the function property on the corresponding property map.

    TVertexIterator itV2(g);
    while(!atEnd(itV2)) {
        ::std::cout << "Shortest path from " << property(cityNames, vertHannover) << " to " << property(cityNames, value(itV2)) << ": ";
        ::std::cout << property(distMap, value(itV2)) << ::std::endl;
        goNext(itV2);
    }
    return 0;
}

Assignments

Task 4 (Connected Components)
Write a program which calculates the connected components of the graph defined in task 1. Output the component for each vertex.
Difficulty
2
Solution
can be found here
Task 5 (HMM Algorithms)
Extend the program from the task 2. Given the sequence s="CTTCATGTGAAAGCAGACGTAAGTCA"
  1. calculate the Viterbi path of s and output the path as well as the probability of the path and
  2. calculate the probability that the HMM generated s with the forward and backward algorithm.
Difficulty
2
Solution
can be found here

Submit a comment

If you found a mistake, or have suggestions about an improvement of this page press: submit your comment