Topology

Overview

  1. Topology sorting (e.g. Task dependency relationship, topology sorting can decide what order to handle tasks)
  2. Change tracing, trace what is changed by dependency relationship, handle them in right order.
  3. Cycle detection

Story

I am building a finance indicator calculating system. There are a lots of formulas in my system. The system read data and formula from database and calculate them and save result to database.

data -> formula -> calculate -> records

As you see, records depend on formula. Once formula is changed, records have to be updated. But there is a more complex problem. A formula may depends on other formulas like this:

formulaA = formulaB * 10 + formulaC

In other word, once a formula is changed, you have to update all formulas which depend on the changed one. That's why I build this topology pattern. At first, I'd like to write these thing directly in tables, but I think this should be reusable. So I write it in object-oriented way. I consider this as a "Database Design Pattern" or something like that. Everyone can use it, put several pattern together.

Topological Sorting

As what I said, it is designed in object-oriented way. You can get benefit from that. It is very powerful idea to use database in object-oriented way. Here shows you how powerful and easy-to-use it is.

There is two tables in topology pattern:

  1. TraceNode contains nodes of topology
  2. TraceLink contains relationship between two nodes

You can inherit them just like this:

class Task(TraceNode):
    using_options(inheritance='multi')
    # priority of this task
    priority = Field(Float, default=1.0)
    # name of this task
    name = Field(Unicode(32), nullable=False)
    # brief of this task
    brief = Field(Unicode(128))
    # description of this task
    description = Field(Unicode)
            
    def __repr__(self):
        pattern = '<Task %s, priority: %s, changed: %s>'
        return pattern % (self.name, self.priority, self.changed)
        
    def __cmp__(self, other):
        if self.priority < other.priority:
            return -1
        elif self.priority > other.priority:
            return 1
        return TraceNode.__cmp__(self, other)
        
class Dependency(TraceLink):
    using_options(inheritance='multi')

Wow! Amazing! You just build a task dependency sorting system based on database! It is like what name of Elixir means, it is magic isn't it? But that's useless to inherit tables without algorithm. Let's show you how to create a task dependency topology and sort them.

    A
  /  \
 B   C
  \ /
   D
   
B -> A
C -> A (priority 10)
D -> B
D -> C

This is simple task dependency topology. These is code for building them.

c = Task(name=u'C', brief=u'Hi, mom!', priority=-10.0) # this is very important!
a = Task(name=u'A', brief=u'This is a important task!')
d = Task(name=u'D', brief=u'Get every thing done')
b = Task(name=u'B', brief=u'Well..It is not so imporant')

ba = Dependency(fromNode=b, toNode=a)
ca = Dependency(fromNode=c, toNode=a)
db = Dependency(fromNode=d, toNode=b)
dc = Dependency(fromNode=d, toNode=c)

session.commit()

Now you have a graph in database. The more important thing is algorithm. Let's how to sort them.

tasks = Task.query.all()
# here you are! topological-sorted tasks
print '\n'.join(map(repr, sortNodes(tasks)))

Result:

<Task A, priority: 1.0, changed: True>
<Task C, priority: -10.0, changed: True>
<Task B, priority: 1.0, changed: True>
<Task D, priority: 1.0, changed: True>

Oops! Where is my algorithm textbook? Can I sell it? :P

Change tracing

Yes, here comes the main propose of this pattern : Change tracing. As what I mentioned before. What if you have something have to update depends on what it depends? The most stupid way is to update everything. But what about a million rows in database? Or some calculations that cost some days? That's why I build this.

Now let's how to use it.

def handleNode(node):
    print 'handle', node

# Why all task is changed? Because created is considerated as changed
# You have to handle just-created task as well.
print '# 1 change: All task'
tracer = ChangeTracer()
tracer.traceChange(handleNode)

print '# 2 change: A'
# make some change here
a.changed = True
tracer.traceChange(handleNode)

print '# 3 change: B'
# make some change here
b.changed = True
tracer.traceChange(handleNode)

print '# 4 change: C'
# make some change here
c.changed = True
tracer.traceChange(handleNode)

print '# 5 change: D'
# make some change here
d.changed = True
tracer.traceChange(handleNode)

Result:

# 1 change: All task
handle <Task A, priority: 1.0, changed: True>
handle <Task C, priority: -10.0, changed: True>
handle <Task B, priority: 1.0, changed: True>
handle <Task D, priority: 1.0, changed: True>
# 2 change: A
handle <Task A, priority: 1.0, changed: True>
handle <Task C, priority: -10.0, changed: True>
handle <Task B, priority: 1.0, changed: True>
handle <Task D, priority: 1.0, changed: True>
# 3 change: B
handle <Task B, priority: 1.0, changed: True>
handle <Task D, priority: 1.0, changed: True>
# 4 change: C
handle <Task C, priority: -10.0, changed: True>
handle <Task D, priority: 1.0, changed: True>
# 5 change: D
handle <Task D, priority: 1.0, changed: True>

Here you are. A task topological sorting and change tracing system based on database.

Cycle detection

What would happen if you have a dependency cycle?

A -> B -> C -> A -> ...

What is correct order to calculate formula like this? None! There is no correct order, even can't be calculated. What you need is a cycle detection function. You should detect cycle once you modify topology. I can't find a cycle detection algorithm from internet, so I just think one by myself.

The idea of my algorithm is: all nodes of a cycle are linked to other nodes in cycle, and they also be linked by other nodes in cycle. If I have no idea what is a cycle, what about what is not cycle? The algorithm delete nodes that have only in-links or out-links or none-links in every round. Finally, there will be only cycle remained. Because cycle can't be broke in this way. The pseudo-code of algorithm:

  • step 1. Delete all nodes that have no both in and out links.
  • step 2. Compare length of nodes to length of last nodes
  • step 3. Repeat step 1 until two length is same
  • step 4. Check length of remaining nodes, if it is not empty, indicate there is cycle exist, otherwise none.

Let's build some cycle-depended-tasks, you can't never find the first task to do :P

a = TraceNode()
b = TraceNode()
c = TraceNode()
        
ab = TraceLink(fromNode=a, toNode=b)
bc = TraceLink(fromNode=b, toNode=c)
ca = TraceLink(fromNode=c, toNode=a)
print detectCycle([e, f, g])
print detectCycle([e, f])
True
False

Piece of cake, right?

What's more

Do you feel the power of Elixir's object-oriented magic? You can combine algorithm, data structure and database! There is more work to do, I would be happy if someone add more algorithm to this module, fix bugs, or improve performance.

Finally, can you image a STL or Boost for database? For example, you can add a pattern that can store map in database, and a lots of algorithms to find the shortest path between two nodes or something like that. The would be the most powerful weapon to handle complex problem.

Author

Victor Lin ( bornstub@… ) http://victorlin.serveftp.org/programming/

Attachments