среда, 10 марта 2010 г.

Data structures: Finger Tree (Part 1)

So last time we passed prologue and now reached the main part entitled “Finger Trees”.

Initially this data structure was described in this paper.

Linear data structures are widely used in common-day life of every programmer, .NET BCL contains many of such structures, like List, LinkedList (in fact LinkedList is doubly linked list) etc. What we’ll try to do in this post is to create the structure (based on 2-3 trees) with following characteristics:

  • Immutable (modification returns new instance of structure with changes applied)
  • Enqueue/Dequeue both in start and end in amortized constant time
  • Concatenation support

Quick reminder, scheme of 2-3 tree from previous post

type TwoThree<'T> = 
| Leaf
| Two of 'T * TwoThree<'T> * TwoThree<'T>
| Three of 'T * 'T * TwoThree<'T> * TwoThree<'T> * TwoThree<'T>

This definition is very simple and this is one of its biggest advantages. However this structure cannot enforce abidance of main tree invariant: all leaves should be located on the same level. That's why this rule is put in the code of modification operation and this is the reason of complexity and unreadability of this code. This is the balance: reduce complexity in one place and will be increased in another. But everything is not so bad, we can bring more order in code by modifying the structure of 2-3 tree (all data is removed from interim nodes, the only data containers - leaves)

type Tree<'T> =
| Empty
| Single of 'T
| Multi of Tree<Node<'T>>
and Node<'T> =
| Node2 of 'T * 'T
| Node3 of 'T * 'T * 'T

Notice, that structure of the tree is not regular, first level will be Tree<T>, second - Tree<Node<T>>, third - Tree<Node<Node<T>>> and so on. Operations on such tree usually take log2(N) time, where N - number of elements. However we would like to perform enqueue and dequeue in a constant time...It is time to explain the mistery of the name "Finger tree".

Finger is the structure that provides efficient access to the nodes of the tree near specified location. We'll take end nodes from left and right side and treat them in a different manner - they will behave like a buffer with end elements. The appearance of 2-3 tree with fingers applied is presented on the code below:

type Finger<'T> = 
| One of 'T
| Two of 'T * 'T
| Three of 'T * 'T * 'T
| Four of 'T * 'T * 'T * 'T
type Node<'T> =
| Node2 of 'T * 'T
| Node3 of 'T * 'T * 'T
type FingerTree<'T> =
| Empty
| Single of 'T
| Multi of Finger<'T> * FingerTree<Node<'T>> * Finger<'T>

As we may notice the common view of structure remains the same. For further progress we need to extend types with possibiliy to enumerate themselves, later this feature will be used i.e. for fold operations.

type Finger<'T> = 
| One of 'T
| Two of 'T * 'T
| Three of 'T * 'T * 'T
| Four of 'T * 'T * 'T * 'T

member x.SeqLeft =
seq { match x with
| One(a) -> yield a
| Two(a, b) -> yield a; yield b
| Three(a, b, c) -> yield a; yield b; yield c
| Four(a, b, c, d) -> yield a; yield b; yield c; yield d }
member x.SeqRight =
seq { match x with
| One(a) -> yield a
| Two(b, a) -> yield a; yield b
| Three(c, b, a) -> yield a; yield b; yield c
| Four(d, c, b, a) -> yield a; yield b; yield c; yield d }

type Node<'T> =
| Node2 of 'T * 'T
| Node3 of 'T * 'T * 'T

type FingerTree<'T> =
| Empty
| Single of 'T
| Multi of Finger<'T> * FingerTree<Node<'T>> * Finger<'T>

module Nodes =
let seqLeft t =
seq { match t with
| Node2(a, b) -> yield a; yield b
| Node3(a, b, c) -> yield a; yield b; yield c }

let seqRight t =
seq { match t with
| Node2(b, a) -> yield a; yield b
| Node3(c, b, a) -> yield a; yield b; yield c }

let nodeToFinger n =
match n with
| Node2(a, b) -> Two(a, b)
| Node3(a, b, c) -> Three(a, b, c)

Finger module (code below) contains operations to push and pop values from finger buffers. All operations are trivial except corner case: push to full buffer. Such situation is impossible and controlled by top level push/pop functions, detailed description will be given later in text, for now this should be accepted as an axiom.

let raiseImpossible() = failwith "this should never happen"
let raiseEmpty() = failwith "tree is empty"

module Fingers =
let peekLeft (t : Finger<_>) = t.SeqLeft |> Seq.head
let peekRight (t : Finger<_>) = t.SeqRight |> Seq.head
let pushLeft a = function
| One(b) -> Two(a, b)
| Two(b, c) -> Three(a, b, c)
| Three(b, c, d) -> Four(a, b, c, d)
| _ -> raiseImpossible()
let popLeft = function
| Two(_, a) -> One(a)
| Three(_, a, b) -> Two(a, b)
| Four(_, a, b, c) -> Three(a, b, c)
| _ -> raiseImpossible()
let pushRight a = function
| One(b) -> Two(b, a)
| Two(c, b) -> Three(c, b, a)
| Three(d, c, b) -> Four(d, c, b, a)
| _ -> raiseImpossible()
let popRight = function
| Two(a, _) -> One(a)
| Three(b, a, _) -> Two(b, a)
| Four(c, b, a, _) -> Three(c, b, a)
| _ -> raiseImpossible()
let seqLeft (t : Finger<_>) = t.SeqLeft
let seqRight (t : Finger<_>) = t.SeqRight

To make the picture complete - peek functions for the tree

let peekLeft t = 
match t with
| Empty -> raiseEmpty()
| Single(v) -> v
| Multi(l, _, _) -> Fingers.peekLeft l

let peekRight t =
match t with
| Empty -> raiseEmpty()
| Single(v) -> v
| Multi(_, _, r) -> Fingers.peekRight r

Operation “push-to-tree” is rather trivial, except the situaton when finger already contains four elements, in that case we push three elements into middle tree and leave finger with two items. This means, that Fingers.pushLeft should never be called for Finger.Four.

type private CommonOperations() =

static member pushLeft<'T> (this : FingerTree<'T> ) (a : 'T) : FingerTree<'T> =
match this with
| Empty -> Single(a)
| Single(b) -> Multi(One a, Empty, One b)
| Multi(Four(b, c, d, e), m, r) -> Multi(Two(a, b), CommonOperations.pushLeft m (Node3(c,d,e)), r)
| Multi(l, m, r) -> Multi(Fingers.pushLeft a l, m, r)

Worth noting that pushLeft function is calling itself with different type parameter inside its body. This case is called polymorphic recursion and it cannot be defined in let-bound functions in F#. Instead it can be declared as method of class with full type annotations. This trick is widely used in CommonOperations type. pushRight function is mirror image of pushLeft, so it will be omitted.

Implementation of popLeft function is also non-complex but in that case special situation - one remaining element in finger buffer.

    static member popLeft<'T> (this : FingerTree<'T> )  : FingerTree<'T> =
match this with
| Empty -> raiseEmpty()
| Single(_) -> Empty
| Multi(One(_), Empty, One(v)) -> Single(v)
| Multi(One(_), Empty, r) -> Multi(One(Fingers.peekLeft r), Empty, Fingers.popLeft r)
| Multi(One(_), m, r) -> Multi (Nodes.nodeToFinger (peekLeft m), (CommonOperations.popLeft m), r)
| Multi(l, m, r) -> Multi (Fingers.popLeft l, m, r)

popRight is also the mirror case of popLeft.

Enumerating of tree in different directions.

    static member seqLeft<'T> (t : FingerTree<'T>) : seq<'T> = 
seq { match t with
| Empty -> ()
| Single v -> yield v
| Multi(l, m, r) ->
yield! Fingers.seqLeft l
yield! CommonOperations.seqLeft m |> Seq.collect Nodes.seqLeft
yield! Fingers.seqLeft r }

static member seqRight<'T> (t : FingerTree<'T>) : seq<'T> =
seq { match t with
| Empty -> ()
| Single v -> yield v
| Multi(l, m, r) ->
yield! Fingers.seqRight r
yield! CommonOperations.seqRight m |> Seq.collect Nodes.seqRight
yield! Fingers.seqRight l }

We examined enqueue/dequeue and traversing and now proceed to concatenation. This operation is rather primitive when one of arguments is Empty of Single tree and the only difficult case is concatenation of two Multi trees. Left end of first tree and right end of second tree is passed without changes to result tree and middle part is the result of applying merge function with the following signature (tree1-middle)FingerTree<Node<'T>> -> (tree1-right-end)Finger<'T> -> (tree2-left-end)Finger<'T> -> (tree2-middle)FingerTree<Node<'T>> -> (result)FingerTree<Node<'T>> . It is very easy to define function for transforming two fingers to the list of nodes and use it to create generalized concatenation function.

let flip f x y = f y x

static member flatten a x b = [ yield! Fingers.seqLeft a; yield! x; yield! Fingers.seqLeft b]

static member nodes l =
match l with
| [a; b] -> [Node2(a, b)]
| [a; b; c] -> [Node3(a, b, c)]
| [a; b; c; d] -> [Node2(a, b); Node2(c, d)]
| a::b::c::xs -> Node3(a, b, c)::CommonOperations.nodes(xs)
| _ -> raiseImpossible()

static member concat<'T> (t1 : FingerTree<'T>) (ts : 'T list ) (t2 : FingerTree<'T>) : FingerTree<'T> =
match t1, t2 with
| (Empty, t) -> (ts, t) ||> List.foldBack (flip CommonOperations.pushLeft)
| (t, Empty) -> (t, ts) ||> List.fold CommonOperations.pushRight
| (Single v, t) -> (v::ts, t) ||> List.foldBack (flip CommonOperations.pushLeft)
| (t, Single v) -> (t, v::ts) ||> List.fold (CommonOperations.pushRight)
| Multi(l1, m1, r1), Multi(l2, m2, r2) -> Multi(l1, CommonOperations.concat m1 (CommonOperations.nodes (CommonOperations.flatten r1 ts l2)) m2, r2)

Two finger trees can be concatenated by calling concat passing empty list as ts argument.

In this post we have explored the possibilities of simple finger tree. It can be basically used as an immmutable deque. Next post will illustarte how to enhance current implementation with effective searching of element with particular property.

Source code: SimpleFingerTree.zip

1 комментарий:

 
GeekySpeaky: Submit Your Site!