Path Protection and Restoration Methods

on network Net5

Network Description

This network appears in the dissertation by Yu Liu (advisor: D. Tipper), in which several path protection and restoration methods are explained and compared. It is the largest network given in the document: 18 nodes and 27 spans/links. A single (bi-directional) demand is offered between each pair of nodes, a total of 153 demands. Each demand is assigned a least-cost working path, where each link/span is assigned unit cost. Since this is a one-layer design, we do not distinguish between links and spans.

Having at most one unit of demand between a pair of nodes precludes examining restoration methods that may use multiple backup up. This will be considered in detail for smaller networks.

Solution Techniques for Link Failure, Path Protection, with Link Diversity

Network restoration problems can be characterized by the following attributes:

failure type: (link/span, node)
protection path type: (path, subpath, link)
protection path diversity: (link/span, node)

Here are some attributes to distinguish solution methods:

feasible solution?: (yes, no)
working/protect: (independent, joint)
protection path dependency on fault isolation: (fault-independent, fault-dependent)
protection path selection: (predetermined, dynamic, finite)
spare capacity: (dedicated, shared)

The follow table gives a short description of each method. More complete information can be found in Yu Liu's dissertation. The following key words will be used to distinguish the type of mesh restoration/protection problem:

Solution Method	Attributes	Description
LP	feasible=no, w/p=independent, dependency=fault-independent, selection=dynamic, spare=shared	Linear programming relaxation of shared spare capacity protection for link-failures: for each flow using a single restoration path that is link-and-tandem-node diverse from the working path. The network cost is a lower bound on the optimal spare cost for this type of restoration.
BB	feasible=yes, w/p=independent, dependency=fault-independent, selection=finite, spare=shared	Branch-and-Bound: using a predetermined set of candidate restoration paths for each flow, use branch and bound to determine the minimal network cost. This is an close upper bound on the optimal spare cost, but normally requires too much execution time for large networks.
SA	feasible=yes, w/p=independent, dependency=fault-independent, selection=finite, spare=shared	Simulated Annealing: using a simulated annealing method with a predetermined set of paths
SSR	feasible=yes, w/p=independent, dependency=fault-independent, selection=dynamic, spare=shared	Successive Survivable Routing: Multiple iterations of the SR method
SR	feasible=yes, w/p=independent, dependency=fault-independent, selection=dynamic, spare=shared	Survivable Routing: Calculating the least-cost restoration path for an ordered list of demands, where spare link costs are updated after each demand's restoration path is assigned
SPI	feasible=yes, w/p=independent, dependency=fault-independent, selection=dynamic, spare=shared	Sharing with Partial Information: Similar to SR but using a simplified estimate of spare bandwidth sharing between failures
RAFT	feasible=yes, w/p=independent, dependency=fault-independent, selection=predetermined, spare=shared	Resource Aggregation for Fault Tolerant: Least hop restoration path predetermined for each demand

In the category of 100% recovery from span/link failures, Dr. Liu cites the following results:

LP	BB	SA	SSR Min	SSR Max	SR Min	SR Max	SPI Min	SPI Max	RAFT
234.5	236	237	243	252	245	258	297	324	310

In the category of 100% recovery from node failures, the results are:

LP	BB	SA	SSR Min	SSR Max	SR Min	SR Max	SPI Min	SPI Max	RAFT
_	269.5	_	276	287	278	293	327	355	344

Comparison with St*Mesh

St*Mesh is based on a constraint programming approach to solving a broad range of mesh restoration problems. Although somewhat slower than most heuristic algorithms, the constraint programming approach permitted a more direct modeling of network constraints, such as QoS requirements and spare capacity limitations. It was motivated, in part, to address the "demand order-bias" limitations of greedy heuristic algorithms such as Box-Nathan and SSR, where the order of demands strongly affects the design results. Primarily, though, it attempted to allow express links to be included in the link topology, which greedy heuristics could only accommodate in limited fashion with post-optimizations. Early attempts to use constraint programming on spare capacity design with significant cost modularities produced designs that were cost-effective but not spare-capacity efficient. St*Mesh represents an attempt to retain the advantages of the constraint programming methodology while being more competitive with current techniques on less-constrained spare capacity allocation problems. A design goal for St*Mesh was to be competitive (in solution quality) with the best greedy heuristics for non-expressed topologies with unit link costs.

St*Mesh and CO²Mesh

St*Mesh was begun as the foundation for the author’s PhD research into whether a particular constraint-based AI search technique, which in earlier, more rudimentary formulations had produced cost-effective designs but not bandwidth efficient ones. The network studies by Yu Liu take away all bandwidth modularity, demand bundling, and traffic distribution dimensions that emphasize accurate cost estimation over bandwidth efficiency. Indeed, we found that St*Mesh was slightly more efficient than Liu’s SSR method.

St*Mesh was motivated by the defects in older greedy heuristic methods arising from the “demand-order bias” and their inability to incorporate express links. Lurking underneath is a principle of monotonicity in the narrowing of interval-valued measures of load and cost. Such a principle could not used directly but instead motivated many implementation decisions that the author hopes will be tested using the kinds of hyperparameter optimization techniques employed in ML.

In 2019, after many previous attempts to port the full algorithm to another language, a rewrite of St*Mesh in SWI Prolog was performed to

· simplify the optimization;

· ease future porting to faster implementations, especially languages without adequate support for backtracking;

· support express links and multiyear planning, addressing horizon effects that push planning too close to servicing existing non-expressed topologies;

· support other applications, such as bundling/grooming and scheduling; and

· lay the groundwork for the application of machine learning (specifically, reinforcement learning and probabilistic programming) techniques to determine the values of parameters used to guide demand-altpath assignments.

In the process, the value of many of the subtle refinements in St*Mesh were rediscovered. The motivation and impact of these refinements, together with whether or how they were retained in CO²Mesh, will be discussed.

The allocation of link cost to each demand-failure using the link plays a critical role in the degree of optimality achievable by St*Mesh and CO²Mesh. As the set of demands using a link is not known until all assignments have been made, both implementations estimate the size of this set prior to each assignment.

Comparison of FI Path Restoration Techniques with St*Mesh

St*Mesh is based on a constraint programming approach to solving a broad range of mesh restoration problems. Although somewhat slower than most heuristic algorithms, the constraint programming approach permitted a more direct modeling of network constraints, such as QoS requirements and spare capacity limitations. It was motivated, in part, to address the "demand order-bias" limitations of greedy heuristic algorithms such as Box-Nathan and Successive Survivable Routing (SSR), where the order of demands strongly affects the design results. St*Mesh enables express links to be included in the link topology, which greedy heuristics could only accommodate in limited fashion with post-optimizations. Early attempts to use constraint programming on spare capacity design with significant cost modularities produced designs that were cost-effective but not spare-capacity efficient. St*Mesh represents an attempt to retain the advantages of the constraint programming methodology while being more competitive with current techniques on unconstrained spare capacity allocation problems. A design goal for St*Mesh was to be competitive (in solution quality) with the best greedy heuristics for non-expressed topologies.

St*Mesh produces results between Branch-and-Bound (BB) and SSR for a sufficient number of iterations. For Net5, Simulated Annealing (SA) produces better results than St*Mesh, but this is not always the case. The following table shows the relationship between iteration count and network cost for St*Mesh, as compared to SSR. St*Mesh restricted to a small number of iterations achieves results similar to SSR.

Link Failure

LP	BB	SA	St*Mesh(100)	St*Mesh Min(20)	St*Mesh Max(20)	SSR Min	St*Mesh Min(5)	St*Mesh Max(5)	SSR Max
234.5	236	237	239	240	243	243	243	247	252

St*Mesh results fall above and below those for SSR depending on the number of iterations.

Node failure has a larger impact on the network than link failure, as a node failure will cause all adjacent links to fail. We would expect more spare capacity would be required for network restoration. But 100% recovery is not possible, as demands that originate or terminate from a failed node cannot be recovered. How should we account for these demands? Following Yu Liu's approach, we remove any demands, for a given node failure, that originate or terminate from that node. Since these demands do contribute spare capacity requirements to the link failure case, their removal partially offsets the increase in spare capacity associated with node failure restoration.

Description of SSR and SR

SSR is a greedy heuristic method that has been shown to produce good spare designs for certain types of mesh restoration problems with fast execution. It has been applied to multilayer and dual failure mesh restoration. The author added it to the OPNET SP Guru product while working at OPNET and presumably some of Huawei’s tools while working there.

Greedy heuristics perform sequential backup path assignment for an ordered list of demands, calculating backup routing from current network state. For a given demand, the spare loading from previously-assigned demands is to calculate the least cost backup path for the given demand. SR stops there, updates spare loading, and moves to the next demand in order. SSR recalculates the least cost backup for the given demand using the SR update to spare loading and if this new backup reduces total spare cost further, it iterates again until no improvement is found. Only then does SSR move to the next demand. After all demands are assigned backups, SSR terminates.

Since the order of demands affects the results of both SSR and SR in unpredictably ways, 64 random orderings of demands are produced so that 64 independent trials of SSR (or SR) are executed. The minimum cost of these 64 trials is returned.

Because SSR depends on demand order and it uses a shortest path algorithm (rather than a fixed set of candidate backup paths) to generate the demand backup path dynamically based on the current state of spare loading, it should be expected to produce designs very sensitive to perturbations in demands and network topologies. More critically, there are solutions and technologies that are not easily incorporated in greedy designs, and a number of ad-hoc methods have been proposed to include them.

Comparison with CO2Mesh

One of the questions that motivated CO2Mesh’s initial implementation was whether there was a cost function of the network state that could be applied to each backup in the optimal design so that

· no change of backup path could reduce total spare cost in the final network state

· the same backup path would be chosen in the network state when it was assigned

CO2Mesh addresses the problem that the cost of a backup can best be calculated only after all demands are assigned backups. It attempts to estimate the state of spare loading of a completely-assigned network while making demand backup assignments sequentially, well enough to be confident of the choice of backup path if not the cost of that backup. The estimate of spare loading will use a prior and the backup cost will use the marginal cost of replacing the prior choice with the selected backup plus several hints to influence subsequent backup assignments to produce designs with total spare cost less the predicted one.

These hints may be part of an approximation to the cost function mentioned above. We will see later how MCTS and related methods can be used. We expect RL techniques may be used to tune the weights of these hints to improve this approximation.

Current Comparison of FI Path Restoration Techniques with CO²Mesh

CO²Mesh (version 1) makes several modifications to St*Mesh for multiyear planning and ease of porting to faster languages. The most significant changes are in replacing prior assignments as well as omissions of both post-optimization and sharing estimation. For the latter, a small adjustment to the cost per decision favors assignments that reduce the number of red failure scenarios. Replacement of heuristic assignments has been changed to eliminate backtracking, as this feature in St*Mesh was the most difficult to implement in other programming languages. constraint programming approach to solving a broad range of mesh restoration problems. Although somewhat slower than most heuristic algorithms, the constraint programming approach permitted a more direct modeling of network constraints, such as QoS requirements and spare capacity limitations. It was motivated, in part, to address the "demand order-bias" limitations of greedy heuristic algorithms such as Box-Nathan and Successive Survivable Routing (SSR), where the order of demands strongly affects the design results. St*Mesh enables express links to be included in the link topology, which greedy heuristics could only accommodate in limited fashion with post-optimizations. Early attempts to use constraint programming on spare capacity design with significant cost modularities produced designs that were cost-effective but not spare-capacity efficient. St*Mesh represents an attempt to retain the advantages of the constraint programming methodology while being more competitive with current techniques on unconstrained spare capacity allocation problems.

A design goal for St*Mesh was to be competitive (in solution quality) with the best greedy heuristics for non-expressed topologies. CO²Mesh supports multiyear planning, capacity modularities, but expressed altpaths will need to wait for sharing estimation to be reintroduced.

CO²Mesh iterates multiple times through the list of demands. After an initial prior assignment of decisions, subsequent iterations use as priors the assignments from the previous iteration. Randomization is used to select a subset (of currently 10) of the total candidate altpaths (without maxcover logic), order demands for processing (for later intelligent reordering), add a small term to decision cost to break ties. Expanding to a larger number of candidate altpaths in some post-optimization phase has been completed.

Decisions with lowest incremental change to total cost are selected rather than ones with largest exposure. Exposure is now included with a preference for changing the default as tertiary cost terms. A secondary cost term to encourage the reduction of red FS in links with low red FS count was introduced to play the role of the missing post-optimization. A dual assignment strategy is also being considered as a form of post-optimization.

In its current form, CO²Mesh produces more efficient designs than both St*Mesh and SSR, even with post-optimization and sharing estimation absent. Some of the features in St*Mesh that are missing in CO²Mesh will be evaluated. There are numerous parameters that need to be tuned in this new model which will be addressed by reinforcement learning. The plan is to generate numerous demand distributions for a fixed fiber topology and have CO²Mesh learn parameter values.

Other executions can get stuck in a local minimum with slightly higher cost. Without post-optimization, which invoke many multistep strategies to reduce the spare capacity requirements per link (rather than a demand-altpath approach), CO²Mesh still fewer iterations than St*Mesh.

St*Mesh Applied to Other Mesh Restorations Mechanisms

To be added for link failure problem: 1+1 dedicated path, 1+1 dedicated link, SBPP (link-diversity), SBPP (link and tandem-node diversity), MPLS LSP Recalculation, MPLS FRR (link diversity), and MPLS FRR (link and next node diversity). SBPP stands for Shared Backup Path Protection, where a single backup path is determined for each aggregate node-pair demand. The link-failure results above (for St*Mesh, SSR, SR, and SPI) are solutions to SBPP (link-diversity). The (link and tandem-node diversity) results will be added for comparison, as 1+1 and MPLS LSP calculations find backup paths that are tandem-node disjoint from the working path as well as link-disjoint.

CO²Mesh

A design goal for St*Mesh was to be competitive (in solution quality) with the best greedy heuristics for non-expressed topologies. CO²Mesh does this and also supports multiyear planning, capacity modularities, but expressed altpaths may need to wait for sharing estimation to be reintroduced.

CO²Mesh iterates multiple times through the list of demands. After an initial prior assignment of decisions, subsequent iterations use as priors the assignments from the previous iteration. Randomization is used to select a subset (of currently 10) of the total candidate altpaths (without maxcover logic), order demands for processing (for later intelligent reordering), add a small term to decision cost to break ties. Expanding to a larger number of candidate altpaths in some post-optimization phase has been completed.

Decisions with lowest incremental change to total cost are selected rather than ones with largest exposure. Exposure is now included with a preference for changing the default as tertiary cost terms. A secondary cost term to encourage the reduction of red FS in links with low red FS count was introduced to play the role of the missing post-optimization. Version 2 of CO²Mesh will include post-optimization as an option.

Speed is improved by a factor of 3 or 4 with a port to YAP Prolog, and will be more fully addressed by porting to Picat or Erlang (faster, syntax similar to prolog, and great support for recursion) and then possibly to Python or Julia.

Other Versions and Extensions

The SWI Prolog version supports the probabilistic programming library cplint; the YAP version supports Problog1.

A fully backtrackable version exists to handle highly constrained designs with CHR constraints. Interval constraints can be supported with CLP(BNR).