Occasional notes: Fun with recursion

"To understand recursion you must understand recursion"

Recursion is an extremly useful tool in the toolbox of every developer. Many problems have recursive nature and thus best solved with recursion. Tree-like stucture is a very nice candidate for demonstration.

type Tree<'T> = Leaf of 'T | Node of Tree<'T> * Tree<'T>

Tree is recursively defined data-type, because it contains values that are Tree itself. A bunch of common operations for this tree (efficiency is dropped for benefit of simplicity).

let rec height = function
    | Leaf _ -> 1
    | Node(l, r) -> 1 + max (height l) (height r) 
let rec sum = function
    | Leaf v -> v
    | Node(l, r) -> sum l + sum r
let rec toList = function
    | Leaf(v) -> [v]
    | Node(l, r) -> (toList l) @ (toList r)

All these operations follows the same pattern: Leaf is the base of recursion and Node case somehow combines results of recursive calls to left and right branches. Extract the pattern:

let rec processTree lF nF = function
    | Leaf(v) -> lF v
    | Node(l, r) -> nF (processTree lF nF l) (processTree lF nF r)

This function is the essence of tree processing order, other functions can be expressed with it.

let height_2 t = processTree (fun _ -> 1) (fun lr rr -> 1 + max lr rr) t
let sum_2 t = processTree id (+) t
let toList_2 t = processTree (fun v -> [v]) (@) t

let tree = 
    Node(
        Node(
            Node(
                Leaf(5),
                Leaf(15)
            ),
            Leaf(10)
        ),
        Node(
            Leaf(20),
            Leaf(25)
        )
    )

printfn "%b" ((height tree) = (height_2 tree))
printfn "%b" ((sum tree) = (sum_2 tree))
printfn "%b" ((toList tree) = (toList_2 tree))
(*
true
true
true
*)

Here we have introduced higher order function that processes given tree in some order and returns the result. Thereby we removed boilerplate recursive code preserving only essential part.

In fact this is a simplified version of generic concept "fold" - well impacted paradigm in functional languages. Folds over lists are the most popular type of folds, but the concept can be spreaded over arbitraty algebraic data type (generalized version of fold is named catamorphism). Concerned persons can find a very nice review of this subject in a set of blog posts by Brian McNamara.

.NET BCL also contains fold for sequences (named Enumerable.Aggregate). We can take this function and, for example, create a generic processor for file system structure.

    public static class SimpleFileSystemWalker
    {
        public static T Walk<T>(string path, T seed, Func<FileInfo, T, T> fileProcessor, Func<DirectoryInfo, T, T> directoryProcessor)
        {
            return Directory.Exists(path)
                       ? Walk(new DirectoryInfo(path), seed, fileProcessor, directoryProcessor)
                       : seed;
        }

        public static T Walk<T>(FileSystemInfo fsi, T seed, Func<FileInfo, T, T> fileProcessor, Func<DirectoryInfo, T, T> directoryProcessor)
        {
            if (fsi is FileInfo)
                return fileProcessor((FileInfo)fsi, seed);
            var di = (DirectoryInfo) fsi;
            return directoryProcessor(
                di,
                di.EnumerateFileSystemInfos().Aggregate(seed, (acc, c) => Walk(c, acc, fileProcessor, directoryProcessor)));
        }
    }

Getting size of directory with this little helper.

            var size = SimpleFileSystemWalker.Walk(@"e:\Lame", 0L, (fi, acc) => acc + fi.Length, (di, acc) => acc);

File processor adds size of file to accumulator and directory processor does nothing - everything is clear and simple. However explicit recursion provides more control over traversing, for example we can skip directories that fits some criteria and current implementation lacks of this feature. We can add it for example by switching order of handlers (invoke directory processor before files) Then directory processor can somehow indicate whether we should go down or stop - for instance by returning bool value. This is possible solution but the code will look untidy: having business with Tuple.Create plus following checks with Item1 or Item2 - all these things makes me sad. More impressive version shall be the follwing - introduce additional parameter next of type delegate T Next(T) in directory processor. This parameter shall behave as continuation, direcory handler can invoke it inside its body to descent into the content of directory.

 public static class FileSystemWalker
    {
        public delegate TAcc FileInfoVisitor<TAcc>(FileInfo fileInfo, TAcc acc);
        public delegate TAcc DirectoryInfoVisitor<TAcc>(DirectoryInfo directoryInfo, TAcc acc, Next<TAcc> next);
        public delegate TAcc Next<TAcc>(TAcc acc);

        public delegate void VoidFileInfoVisitor<TAcc>(FileInfo fileInfo, TAcc acc);
        public delegate void VoidDirectoryInfoVisitor<TAcc>(DirectoryInfo directoryInfo, TAcc acc, VoidNext<TAcc> next);
        public delegate void VoidNext<TAcc>(TAcc acc);

        public static T Walk<T>(string path, T seed, FileInfoVisitor<T> fileVisitor, DirectoryInfoVisitor<T> directoryVisitor)
        {
            return Directory.Exists(path)
                       ? Walk(new DirectoryInfo(path), seed, fileVisitor, directoryVisitor)
                       : seed;
        }

        public static void Walk<T>(string path, T seed, VoidFileInfoVisitor<T> fileVisitor, VoidDirectoryInfoVisitor<T> directoryVisitor)
        {
            if (Directory.Exists(path))
                Walk(new DirectoryInfo(path), seed, fileVisitor, directoryVisitor);
        }

        private static T Walk<T>(FileSystemInfo fsi, T seed, FileInfoVisitor<T> fileVisitor, DirectoryInfoVisitor<T> directoryVisitor)
        {
            if (fsi is FileInfo)
            {
                return fileVisitor((FileInfo)fsi, seed);
            }
            var di = (DirectoryInfo) fsi;
            return directoryVisitor(
                di,
                seed,
                newSeed => di.EnumerateFileSystemInfos().Aggregate(
                    newSeed,
                    (current, nestedFsi) => Walk(nestedFsi, current, fileVisitor, directoryVisitor)
                    )
                );
        }


        private static void Walk<T>(FileSystemInfo fsi, T seed, VoidFileInfoVisitor<T> fileVisitor, VoidDirectoryInfoVisitor<T> directoryVisitor)
        {
            if (fsi is FileInfo)
            {
                fileVisitor((FileInfo) fsi, seed);
            }
            else
            {
                var di = (DirectoryInfo) fsi;
                directoryVisitor(
                    di,
                    seed,
                    newSeed =>
                        {
                            foreach (var nested in di.EnumerateFileSystemInfos())
                            {
                                Walk(nested, newSeed, fileVisitor, directoryVisitor);
                            }
                        }
                    );
            }
        }
    }

We have added extra overload to Walk method that returns nothing and changes intermediate value only on the directory level. This feature is very handy in methods like Copy

        static void CopyFolder(string path, string destPath)
        {
            FileSystemWalker.Walk(
                path,
                destPath,
                (fi, dir) => fi.CopyTo(Path.Combine(dir, fi.Name)),
                (di, dir, next) =>
                    {
                        var newPath = Path.Combine(dir, di.Name);
                        if (!Directory.Exists(newPath))
                            Directory.CreateDirectory(newPath);

                        next(newPath);
                    }
                );
        }
        static long CalculateSize(string path)
        {
            return FileSystemWalker.Walk(
                path, 
                0L, 
                (fi, acc) => acc + fi.Length, 
                (di, acc, next) => next(acc)
                );
        }

Trampoline

Absence of explicit recursion makes possible modification of internal traverse code without changes in processing logic. For example we can update this code to be more stack-friendly by converting program to trampolined style. The idea behind trampoline is very simple: recursive call are replaced with return of function thunks. These thunks are invoked in loop so amount of stack frames remains the same.

delegate Rec Rec();
    class Program
    {
        static void Print(int i)
        {
            if (i == 0)
                return;
            Console.WriteLine(i);
            Print(i - 1);
        }

        static Rec TPrint(int i, Func<int, Rec> next)
        {
            if (i == 0)
                return null;
            Console.WriteLine(i);
            return next(i - 1);
        }

        static void Main(string[] args)
        {
            // recursive version
            Print(10);

            // using trampoline
            Rec run = null;
            Func<int, Rec> next = null;
            
            next = newV => run = () => TPrint(newV, next);
            next(10);
            while (run != null)
                run = run();
        }
    }

Every iteration returns thunk to next step instead of making direct call. Let's see this step by step

Before method call: stack frame - Main method

First iteration: Stack frame - TPrint

After first iteration: Stack frame - Main method

Second iteration: Stack frame - TPrint, but as we see there are no other calls to TPrint on the stack.

Stack frame jumps up and down like a jumper on the trampoline.

Trampolined version of FileSystemWalker

    public static class TrampolinedFileSystemWalker
    {
        public delegate TAcc FileInfoVisitor<TAcc>(FileInfo fileInfo, TAcc acc);
        public delegate void VoidFileInfoVisitor<TAcc>(FileInfo fileInfo, TAcc acc);
        public delegate void DirectoryInfoVisitor<TAcc>(DirectoryInfo directoryInfo, TAcc acc, Next<TAcc> next);
        public delegate void Next<in TAcc>(TAcc acc);

        public static T Walk<T>(string path, T seed, FileInfoVisitor<T> fileVisitor, DirectoryInfoVisitor<T> directoryVisitor)
        {
            return Directory.Exists(path)
                       ? Walk(new DirectoryInfo(path), seed, fileVisitor, directoryVisitor)
                       : seed;
        }

        private static T Walk<T>(FileSystemInfo fsi, T seed, FileInfoVisitor<T> fileVisitor, DirectoryInfoVisitor<T> directoryVisitor)
        {
            var value = seed;
            var actions = new Queue<Action>();

            Action<FileSystemInfo> processOne = null;
            processOne = currFsi =>
                             {
                                 if (currFsi is FileInfo)
                                 {
                                     value = fileVisitor((FileInfo) currFsi, value);
                                 }
                                 else
                                 {
                                     var di = (DirectoryInfo) currFsi;
                                     directoryVisitor(
                                         di,
                                         value,
                                         newValue =>
                                             {
                                                 value = newValue;
                                                 foreach (var nested in di.EnumerateFileSystemInfos())
                                                 {
                                                     var n = nested;
                                                     actions.Enqueue(() => processOne(n));
                                                 }
                                             }
                                         );
                                 }
                             };
            actions.Enqueue(() => processOne(fsi));
            while (actions.Any())
                actions.Dequeue()();

            return value;
        }

        public static void Walk<T>(string path, T seed, VoidFileInfoVisitor<T> fileVisitor, DirectoryInfoVisitor<T> directoryVisitor)
        {
            if (Directory.Exists(path))
                Walk(new DirectoryInfo(path), seed, fileVisitor, directoryVisitor);
        }

        private static void Walk<T>(FileSystemInfo fsi, T seed, VoidFileInfoVisitor<T> fileVisitor, DirectoryInfoVisitor<T> directoryVisitor)
        {
            var actions = new Queue<Action>();

            Action<FileSystemInfo, T> processOne = null;
            processOne = (currFsi, value) =>
                             {
                                 if (currFsi is FileInfo)
                                 {
                                     fileVisitor((FileInfo)currFsi, value);
                                 }
                                 else
                                 {
                                     var di = (DirectoryInfo)currFsi;
                                     directoryVisitor(
                                         di,
                                         value,
                                         newValue =>
                                             {
                                                 foreach (var nested in di.EnumerateFileSystemInfos())
                                                 {
                                                     var n = nested;
                                                     actions.Enqueue(() => processOne(n, newValue));
                                                 }
                                             }
                                         );
                                 }
                             };
            actions.Enqueue(() => processOne(fsi, seed));
            while (actions.Any())
                actions.Dequeue()();
        }
    }

It is a bit more complex than previous sample, every call can lead to many recursive calls, so we replace single delegate with a queue of actions. Each step can put as many actions as it need and all calls shall be made one after another using fixed amount of stack frames.

static void CopyFolder(string path, string destPath)
        {
            FileSystemWalker.Walk(
                path,
                destPath,
                (fi, dir) => fi.CopyTo(Path.Combine(dir, fi.Name)),
                (di, dir, next) =>
                    {
                        var newPath = Path.Combine(dir, di.Name);
                        if (!Directory.Exists(newPath))
                            Directory.CreateDirectory(newPath);

                        next(newPath);
                    }
                );
        }

        static void TCopyFolder(string path, string destPath)
        {
            TrampolinedFileSystemWalker.Walk(
                path,
                destPath,
                (fi, dir) => fi.CopyTo(Path.Combine(dir, fi.Name)),
                (di, dir, next) =>
                {
                    var newPath = Path.Combine(dir, di.Name);
                    if (!Directory.Exists(newPath))
                        Directory.CreateDirectory(newPath);

                    next(newPath);
                }
                );
        }

        static long CalculateSize(string path)
        {
            return FileSystemWalker.Walk(
                path, 
                0L, 
                (fi, acc) => acc + fi.Length, 
                (di, acc, next) => next(acc)
                );
        }
        static long TCalculateSize(string path)
        {
            return TrampolinedFileSystemWalker.Walk(
                path,
                0L,
                (fi, acc) => acc + fi.Length,
                (di, acc, next) => next(acc)
                );
        }

Client code was isolated from details of traverse so it was unaffected by fundamental changes we made.

Occasional notes

пятница, 26 марта 2010 г.

Fun with recursion

Комментариев нет:

Отправить комментарий

Обо мне

FeedBurner Feed Count

StackOverflow Profile

Архив блога

Ярлыки