Network resilience and redundancy planning

Private PMR public safety networks are designed to be highly resilient, but the migration of critical communications users onto commercial 4G networks increases the complexity of planning for resilience. Richard Martin looks at how it can be achieved.

Public safety networks need to remain resilient through everything from man-made problems, such as the digging up of a cable, to a major natural disaster such as an earthquake. In the latter case, this will be exactly the time the service is most needed. Planning for and building network resilience is vital if the service is to meet its objectives and keep the general public and emergency services safe and effective.

Resilience measures can be implemented in all parts of the radio network, including its primary elements, power supply, link redundancy and switching back-up solutions. Such measures are commonly found in PMR networks, but now ‘best effort’ commercial 4G/LTE networks looking to host mission-critical communications users are having to step up to meet these users’ more exacting requirements.

Base station resilience and backhaul

Kevin Humphries, market specialist – TETRA infrastructure at Motorola Solutions, has been involved in numerous TETRA projects around the world. Starting with power supply, he says: “We have been using a range of solutions for power, either as back-up for mains or as alternatives. Battery back-up is common on conventional sites, but we have provided diesel generators on remote sites where mains is not available.

“In the Middle East we have provided a combined solar and battery combination, and have seen some customers use oil or natural gas generators along pipelines with a feed of fuel from the pipeline itself. I am also aware of a hydroelectric-powered base station in Iceland. In Norway, we have used hydrogen fuel cells. These are an eco-friendly solution, but expensive.”

How much back-up time should be provisioned with batteries? “That depends, it relates to the importance of the site. For example, is it the only one in the area? Also, its accessibility – is it difficult to reach to replace batteries or deploy a temporary generator? In some cases, it may be necessary to provision months or even a year of fuel if the site is very remote, although in the Middle East we have seen a weekly replenishment commonly.”

In terms of security, Humphries adds: “Perimeter security can be enhanced with IR sensors and CCTV cameras, and it is also helpful to have CCTV in the shelter to see who has access to the cabinets. Alarms on doors or fences and gates can alert network managers to unauthorised ingress.”

Vicente Rubiella, product manager – systems, NEBULA TETRA Infrastructure at Teltronic, adds: “Back-up power-generation types will depend on the customer preferences and terrain characteristics. For example, solar panels can be a good option for isolated areas where grid connection is impractical or involves an unreasonable cost. Back-up power-generation equipment must be dimensioned accordingly to provide power throughout the expected time to restore the primary power source in case of failure, usually 24 hours.”

Jochen Boesch, senior director, engineering at Damm Cellular Systems in Denmark, states: “Typically -48Vdc is used to power base stations. This allows for generator, batteries and solar panels in parallel usage. Our base stations also offer 230Vac input to use a UPS.”
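As a rough illustration of the dimensioning exercise described above, the sketch below estimates the battery capacity needed to carry a site for a given restoration time on a 48V DC bus. The 500W site load, the 80 per cent usable battery fraction and the 90 per cent conversion efficiency are assumed figures chosen for illustration only; they do not come from any of the vendors quoted.

```python
def battery_capacity_ah(load_w, backup_hours, bus_voltage=48.0,
                        usable_fraction=0.8, converter_efficiency=0.9):
    """Estimate the nominal battery capacity (Ah) needed for a back-up period.

    usable_fraction and converter_efficiency are illustrative assumptions,
    not vendor figures.
    """
    if load_w <= 0 or backup_hours <= 0 or bus_voltage <= 0:
        raise ValueError("load, hours and voltage must be positive")
    if not (0 < usable_fraction <= 1 and 0 < converter_efficiency <= 1):
        raise ValueError("fractions must be in the range (0, 1]")
    # Energy the battery must deliver, allowing for conversion losses.
    energy_wh = load_w * backup_hours / converter_efficiency
    # Convert to ampere-hours, leaving part of the battery undischarged.
    return energy_wh / (bus_voltage * usable_fraction)


if __name__ == "__main__":
    # Hypothetical 500W site, 24-hour restoration target.
    print(f"{battery_capacity_ah(500, 24):.0f} Ah")
```

With these assumed inputs the result is in the region of 350Ah; a more remote or more critical site would simply use a longer back-up period in the same calculation.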

Humphries says: “Critical sites should have duplicate links, perhaps from different suppliers. Different technologies can come into play here. If two separate fibre or wire links cannot be provisioned then a microwave or satellite link may be used. Remote sites may only be able to use satellites; good practice would be to connect through two different satellites. 4G gives us another link as a back-up.”

Switch and core

There are two broad approaches to switching: centralised and distributed. Motorola Solutions favours centralised switches. It cites the reduced traffic that results from having a zone controller track all base stations, compared with a distributed system where all base stations need to negotiate and pass call information to each other.

This zone controller also knows where every radio and group is located and only sets up calls on base stations where users are present. In terms of resilience, services are maintained if there are local failures. Humphries adds: “Although centralised, our systems provide duplicate processors and there can also be mirrored processors in different locations to provide geographical redundancy. Regional centres such as the counties in the UK can take over the control in another county in the event of a major failure.”

Boesch from Damm makes the case for distributed switching. “Distributed networks can keep a full feature set if a subset of nodes become isolated, and are easily scalable with new sites added without increasing switch capacity. As well as quicker set-up times for local calls, the backbone traffic can match or better a centralised switch if set up intelligently. A master/slave concept for back-up in a decentralised system is easy to handle. When a master fails, the system will automatically make a slave node the master – the same also goes for applications connected to gateways.”
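The master/slave promotion Boesch describes can be sketched in a few lines. This is a generic illustration rather than Damm's actual mechanism: the Node class, the priority field and the rule that the lowest priority number wins are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int       # hypothetical convention: lower number = preferred master
    alive: bool = True  # whether the node is currently reachable


def elect_master(nodes):
    """Return the reachable node with the best priority, or None if none remain."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        return None
    # Break priority ties by name so every node reaches the same answer.
    return min(candidates, key=lambda n: (n.priority, n.name))


nodes = [Node("site-a", 1), Node("site-b", 2), Node("site-c", 3)]
master = elect_master(nodes)        # site-a holds the master role

nodes[0].alive = False              # the master fails
new_master = elect_master(nodes)    # site-b is promoted automatically
```

Because every surviving node applies the same deterministic rule, no operator action is needed to agree on the new master.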

Teltronic advocates centralised switching. Asked whether this is less resilient than distributed, Rubiella states: “Definitely not. Centralised architectures provide all kinds of facilities to feature the highest level of resiliency, when redundancy is vital. Having a centralised switch and a hot standby back-up switch, where no user action is needed to perform an immediate switchover on detection of failure condition, achieves the most resilient network deployments in a cost-effective way.”

Regarding failover mechanisms, Rubiella concludes: “Failover must rely on two basic and essential requirements: rapid detection of failure condition to trigger switchover immediately; and no user action needed, but automatic switchover between main and back-up unit.”

This debate will continue. Potential users would be wise to carefully compare offerings from several suppliers to make the best decision for themselves and their particular needs.

LTE joins the club

The introduction of mission-critical specifications into 3GPP 4G and now 5G means cellular network operators also have to consider higher levels of resilience. With 3GPP Releases 12-14, public safety features were added to the LTE standard, including Proximity Services. This becomes a resilience feature in that users can still be connected when access to a base station is no longer possible. In addition, the QCI (QoS Class Identifier) features enable preferential access for critical users when public LTE service is also used. Base stations can also work in isolated mode if all links are down.

Stephane Daeuble, head of marketing, enterprise solutions at Nokia, and Hansen Chan, its product marketing manager, outline some of the resilience features in LTE. Daeuble observes: “The normal resilience level for a private LTE system is 99.9 per cent, but this can be easily increased to four nines or higher, with dual connectivity (using two different frequency layers) to base stations, and reserve power. Also, by using two different frequencies, users are connected to both frequencies at the same time and this will bring availability up to 99.999 per cent. Looking at base station availability, much lower power consumption has made it easier to run on alternative power sources such as batteries or solar panels.”
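The arithmetic behind adding a second layer can be sketched with the standard formula for independent parallel paths, where service is lost only if every path fails at once. The assumption of full independence is an idealisation introduced here; layers that share power, backhaul or a core will achieve less than this figure in practice.

```python
def parallel_availability(availabilities):
    """Combined availability of redundant paths, assuming independent failures."""
    unavailable = 1.0
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        # Service is lost only when every path is down simultaneously.
        unavailable *= (1.0 - a)
    return 1.0 - unavailable


if __name__ == "__main__":
    # Two hypothetical frequency layers, each 99.9 per cent available.
    combined = parallel_availability([0.999, 0.999])
    print(f"{combined * 100:.4f} per cent")
```

Under this idealised model, two 99.9 per cent layers combine to 99.9999 per cent, which is best read as an upper bound once shared failure modes are taken into account.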

Nokia has moved to greater levels of silicon integration and increased power amplifier efficiency, meaning that a small cell with RF level equivalent to a macro base station can be run on 90-200W, giving a radius of operation of over 90km with the right antennas and deployment height. Miniaturisation has also enabled the development of portable base stations, which can be deployed in an emergency or network failure.

A helicopter could transport an operator with a rucksack-sized base transceiver station (BTS) to provide coverage over a radius of several kilometres, making for a completely standalone system for a major emergency. The later addition of a satellite link allows this standalone network to connect to the internet or a wider network to enable communication between the team in the field and people in response centres.

This size reduction also makes it possible to have a base station on a drone, which could be tethered to a vehicle to maintain power nearly indefinitely. Balloon-mounted small-cell BTSs could work for several days typically, versus hours with older generation macro BTS.

Asked whether LTE networks can backhaul themselves, Daeuble responds: “With frequencies being a precious resource, I would say that it is better to use other technologies such as fibre or microwave or unlicensed frequencies for reserve links for base stations.”

Expanding on the switching and core aspects, Daeuble and Chan say: “Regarding the core, our default is two cores or blades, but you can also enhance this with geographical redundancy and even use the core of the public mobile service when possible as the third level of resilience. It’s important to have procedures in place to control when switch back-up is initialised.

“There can be a constant monitoring of the network end-to-end by heartbeat messages through the backhaul network between the base station and the core, then automatic switching if a failure occurs. But there is a case for manual intervention if it is felt that approvals would be needed before switching takes place. Last but not least, the backhaul network also needs to be resilient.”
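The heartbeat approach, including the option of holding the switchover until someone approves it, might look like the sketch below. The HeartbeatMonitor class, the three-second timeout and the "primary"/"standby" labels are hypothetical names and values chosen for illustration, not part of any Nokia product.

```python
class HeartbeatMonitor:
    """Tracks heartbeats from the primary core and decides when to fail over."""

    def __init__(self, timeout_s, start, require_approval=False):
        self.timeout_s = timeout_s              # silence tolerated before failover
        self.require_approval = require_approval
        self.last_seen = start                  # time of the most recent heartbeat
        self.active = "primary"

    def heartbeat(self, now):
        """Record a heartbeat received from the primary core at time `now`."""
        self.last_seen = now

    def check(self, now, approved=False):
        """Return which core should be active at time `now`."""
        silent_for = now - self.last_seen
        if self.active == "primary" and silent_for > self.timeout_s:
            # Switch automatically unless manual sign-off is required.
            if not self.require_approval or approved:
                self.active = "standby"
        return self.active


auto = HeartbeatMonitor(timeout_s=3.0, start=0.0)
auto.heartbeat(1.0)
state_ok = auto.check(2.0)       # heartbeat is recent: stays on primary
state_failed = auto.check(5.0)   # four seconds of silence: standby takes over
```

Passing require_approval=True keeps the primary active after a timeout until check is called with approved=True, which models the manual intervention case.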

As regards 5G in public safety, Daeuble and Chan add: “5G with new radio (NR) and core (SA) will offer very high bandwidth for data, increased reliability and low latency features that will come as the 5G standard matures with Releases 16, 17 and 18. In addition, with slicing capabilities, 5G will also mean that mobile operators can offer a designated resilient service with guaranteed quality of service for public safety use over their public networks.

“But slicing services will also require mobile operators to expand the coverage of their 5G network, currently limited to major cities. These kinds of capabilities brought by the new 5G standards will come during 2021-2023, and devices that support these features a few years after.”

Boesch from Damm observes: “Do public safety users need to have separate cores? Many vendors do offer IP applications running on smartphones that can get tunnelled and encrypted/authenticated through any kind of IP backbone to connect to the secured centralised or decentralised core.”

Raquel Frisa, product manager, Systems, LTE and Command & Control at Teltronic, takes a different view on separate cores in a 4G network. “The use of a separate network core is essential to guarantee the reliability in broadband networks. But this is not the only requirement when the public agency needs to deliver mission-critical (MC) services. In fact, our recommendation also involves the use of dedicated RAN infrastructure in some critical areas, the use of dedicated spectrum whenever it is possible and the agreement of SLAs in case of hybrid models of network operation (eg, MVNO).”

Frisa continues: “Failover mechanisms between PMR, LTE and Wi-Fi are essential in order to guarantee the availability of the service for the users, and this feature has been already included in the scope of 3GPP standardisation under the paradigm of ‘Interworking’, understood as the continuity of services implemented across broadband and narrowband networks.

“Until this specification is completed and can be extended to other broadband technologies like Wi-Fi, some end-users are demanding proprietary solutions in order to cover their service availability requirements.”

The game of 9s

Whether the goal is 99.9 or 99.999 per cent availability, resilience will not happen without thought and planning. The approach taken by FirstNet (see box) and others in understanding the views and needs of end-users and their organisations will lay down the targets for overall resilience and areas of special need.
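To put the two targets in perspective, the sketch below converts an availability percentage into permitted downtime per year. The only assumption is a 365-day year of 8,760 hours.

```python
def downtime_minutes_per_year(availability_percent, hours_per_year=8760):
    """Minutes of downtime per year permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * hours_per_year * 60


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target} per cent: {minutes:.1f} minutes per year")
```

Three nines allows nearly nine hours of outage a year, while five nines allows just over five minutes, which is why each extra nine demands a step change in redundancy.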

Careful analysis will identify the particular areas for developing resilience. Increasingly, networks are partnerships between multiple suppliers, and so establishing closely working teams with clear KPIs and service-level agreements becomes vital. Fall-back strategies which may involve using reserve equipment and personnel also need to be considered, as well as training and regular exercises.