
New stack part 2 – ZRTP: Encryption done right

Simon Woodhead

3rd March 2017

This is the second in a series of posts (part 1 here) about new features we’re releasing to our voice stack in the very near future. What makes Simwood unique, and very distinct from others who buy a magic VoIP box from a vendor, is that our service is built on the best of open-source and the best of black boxes where necessary, all pulled together with our secret sauce. It is also deployed (inter)nationally on our own network. By the time “me too” gets a tick box they don’t understand on their magic black box, our customers will be several innovations further on. Why? Because we exist to put new communications technology in the hands of your customers, and help you enrich their lives. Let’s leave “me too” to compete with yesterday, while we make today the best it can be!

Encryption is an absolute must in today’s world but it is pretty disappointing how little of it is deployed. We think the VoIP industry is doing a disservice to end-users and we suspect the average end-user would be horrified to learn how little their traffic is protected by the average ITSP. Now, you may do far more than your peers and be outraged at this statement but keep this in mind: as a wholesale provider who sees the traffic from hundreds of ITSPs, and moreover the one who has been most vocal on VoIP Fraud and on TLS/SRTP as one measure to prevent it, we’d expect to see more encrypted traffic than our peers, and we know how little we get. So horror and outrage aside, we know, and it is not right.

Now one could say that enabling encryption is hard, and we’d agree, certainly where you’re supporting a wide range of user-agents and don’t have auto-provisioning fine tuned for each one. The call leg between us and you is substantially easier, but we know that some of those who do care to enable this find it tricky. So what can we do to help you help your end-users get the privacy they probably expect, and most certainly deserve?

First, some clarification of terminology: As an industry, we’ve become accustomed to talking about ‘SRTP’ when really we mean ‘SRTP, with a key negotiated through SDES’. We’ve done it in the link above but the distinction becomes quite significant as we go down the rabbit hole of encryption. SRTP is just encrypted RTP but how it gets encrypted matters. Colloquially we mean SDES when we say SRTP.


By SDES we mean that session encryption keys are exchanged between user-agents in the body (the SDP) of the SIP message. Of course, anyone who can see the SIP message can see the session encryption keys. That is why SIP messages using SDES are almost always (and certainly should be) sent using TLS. TLS has many other benefits on its own too, not least hiding metadata about the call.
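To make the point concrete, here is a minimal sketch of how little effort it takes to lift an SDES key from a SIP body that is not protected by TLS. The SDP, addresses and key material below are made up for illustration, and `extract_sdes_keys` is a hypothetical helper of ours rather than part of any SIP library. It assumes the common `a=crypto` attribute form with a single `inline:` key.

```python
import base64
import re

# Hypothetical key material: a 16-byte master key plus a 14-byte salt,
# the sizes used by the AES_CM_128_HMAC_SHA1_80 crypto suite.
master_key_and_salt = bytes(range(30))
inline_key = base64.b64encode(master_key_and_salt).decode("ascii")

# A made-up SDP body of the kind carried in a SIP INVITE.
sdp_body = "\r\n".join([
    "v=0",
    "o=alice 1 1 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/SAVP 0",
    f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{inline_key}",
]) + "\r\n"

# Matches the tag, crypto suite and base64 key of each a=crypto line.
CRYPTO_LINE = re.compile(
    r"^a=crypto:(\d+) (\S+) inline:([A-Za-z0-9+/=]+)", re.MULTILINE
)


def extract_sdes_keys(sdp: str) -> list[tuple[str, bytes]]:
    """Return (crypto suite, raw key material) for every a=crypto line."""
    return [
        (suite, base64.b64decode(key_b64))
        for _tag, suite, key_b64 in CRYPTO_LINE.findall(sdp)
    ]


# Anyone who can read the SIP message can do exactly this.
recovered = extract_sdes_keys(sdp_body)
```

No cryptanalysis is involved: the key is simply base64 text in the message, which is why the signalling needs TLS on every hop.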

TLS/SDES is the most widely deployed encryption in VoIP and thus we’ve always supported it. The new stack, being more modern, supports it better and you should find enabling it easier than it was in the past. Please use it!

Sadly though, despite its relative penetration, it is not very secure by design. Because a call passes through multiple proxies and the presence of TLS is not assured between every hop, a proxy may receive the connection over TLS but not relay it using TLS, thus leaking the keys. Furthermore, the proxy decrypts on receipt and re-encrypts before transmission, so itself poses an eavesdropping opportunity. In short, an end user or enterprise might trust their own SIP server, and they might even trust their service provider, but what about beyond that?
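The hop-by-hop problem can be sketched as a simple model. The `Hop` type, the `sdes_key_exposure` function and the example path are all hypothetical, chosen only to illustrate the reasoning: the keys are readable on any link that is not TLS, and at every intermediary regardless of transport, because each one terminates TLS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    sender: str
    receiver: str
    transport: str  # "TLS", "TCP" or "UDP"


def sdes_key_exposure(path: list[Hop]) -> dict[str, list[str]]:
    """List where SDES keys carried in the signalling are readable."""
    # Any link without TLS exposes the keys to whoever can observe it.
    cleartext_links = [
        f"{hop.sender} -> {hop.receiver}"
        for hop in path
        if hop.transport.upper() != "TLS"
    ]
    # Every intermediary terminates TLS, so it sees the keys in the clear.
    intermediaries = [hop.receiver for hop in path[:-1]]
    return {"cleartext_links": cleartext_links, "intermediaries": intermediaries}


# A made-up call path: one careless hop is enough to leak the keys.
path = [
    Hop("phone", "customer PBX", "TLS"),
    Hop("customer PBX", "ITSP proxy", "TLS"),
    Hop("ITSP proxy", "carrier", "UDP"),
    Hop("carrier", "far-end phone", "TLS"),
]
exposure = sdes_key_exposure(path)
```

In this example the user did everything right on their own leg, yet the keys still cross one link in the clear and are visible to three intermediaries.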

There is a strong argument that intercept is most relevant the closer one is to the intercepted party, so anything beyond their service provider matters less. Certainly for us, egressing to the old PSTN, there would be no option but to decrypt there. However, as we move to an all-VoIP world and end-to-end VoIP calls become more commonplace, one has to consider that both ends of the call become relevant and there are far more opportunities for keys to leak or eavesdropping to be effected.

So in short, TLS/SDES is the best we have in common use. Even “me-too” magic VoIP boxes have a tick box to “do encryption” (if they bought the ‘crypto-pack special offer mega-bundle’) and this usually means TLS/SDES. But TLS/SDES is very far from perfect and end users deserve better. The ideal is end-to-end encryption between the parties to the call, with no eavesdropping opportunities. Enter ZRTP…


ZRTP is distinct from SDES in that the exchange of session keys does not take place in the SIP but rather in the RTP. SRTP is still what results, it is how the key is exchanged that matters, hence our clarification above.
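Because the key exchange rides in the media stream, ZRTP packets have to be told apart from ordinary RTP arriving on the same port. As a rough sketch, assuming the packet layout described in RFC 6189 (top four bits `0001`, then the magic cookie `ZRTP` in bytes 4 to 7), a classifier might look like this. The function name and the sample packets are ours, for illustration only, and the check is deliberately shallow: it does not validate the CRC or the message itself.

```python
import struct

ZRTP_MAGIC_COOKIE = b"ZRTP"  # 0x5a525450


def classify_media_packet(packet: bytes) -> str:
    """Roughly classify a UDP payload as 'zrtp', 'rtp' or 'unknown'."""
    if len(packet) < 12:
        return "unknown"
    # ZRTP: top four bits are 0001 and bytes 4-7 hold the magic cookie.
    if packet[0] >> 4 == 0x1 and packet[4:8] == ZRTP_MAGIC_COOKIE:
        return "zrtp"
    # RTP: the top two bits carry the protocol version, which is 2.
    if packet[0] >> 6 == 2:
        return "rtp"
    return "unknown"


# Made-up packets: header fields only, with dummy payloads.
zrtp_packet = struct.pack("!BBH4sI", 0x10, 0x00, 1, b"ZRTP", 0x12345678) + b"payload"
rtp_packet = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + bytes(160)

kinds = [classify_media_packet(p) for p in (zrtp_packet, rtp_packet, b"")]
```

Equipment in the media path that forwards only what it recognises as RTP would drop the first packet, and with it the chance of ZRTP establishing.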

TLS is still advisable between the various user-agents involved in the call for the many benefits it affords, but it is not necessary to achieve encrypted media with ZRTP. Proxies and B2BUAs along the way have scope to break ZRTP set-up by not relaying the key-exchange packets in the media stream, but if all they do is do no harm, what we have is opportunistic end-to-end encryption.

ZRTP has other benefits in that the session keys used to encrypt a call are in part based on secrets retained from previous calls with the same party. Thus a man-in-the-middle intercept becomes very hard unless the intercepting party was there in the middle when the two parties first communicated. Where used truly end-to-end, and for those taking this very seriously, parties to a call can exchange “Short Authentication Strings” audibly; if they do not match it indicates the presence of a man-in-the-middle.
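A toy illustration of both ideas follows. This is emphatically not the key derivation from the ZRTP specification: the HMAC mixing step, the `SAS` label, the alphabet and every name and value here are assumptions of ours, picked only to show why a retained secret and a spoken string make life hard for an interceptor.

```python
import hashlib
import hmac

# Illustrative alphabet only; real ZRTP renders its SAS differently.
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def derive_session_secret(key_agreement: bytes, retained_secret: bytes) -> bytes:
    """Mix this call's key agreement with the secret cached from earlier calls."""
    return hmac.new(retained_secret, key_agreement, hashlib.sha256).digest()


def short_auth_string(session_secret: bytes) -> str:
    """Render the leftmost 20 bits of a hash of the secret as four characters."""
    digest = hashlib.sha256(b"SAS" + session_secret).digest()
    top20 = int.from_bytes(digest[:3], "big") >> 4
    return "".join(
        BASE32_ALPHABET[(top20 >> shift) & 0x1F] for shift in (15, 10, 5, 0)
    )


retained = b"secret cached from an earlier call"

# Honest call: both parties hold the same key agreement result.
alice = derive_session_secret(b"agreement alice<->bob", retained)
bob = derive_session_secret(b"agreement alice<->bob", retained)

# Intercepted call: each party has unknowingly agreed a key with the
# interceptor instead, so the two ends derive different secrets.
alice_mitm = derive_session_secret(b"agreement alice<->mallory", retained)
bob_mitm = derive_session_secret(b"agreement mallory<->bob", retained)

sas_match = short_auth_string(alice) == short_auth_string(bob)
secrets_diverge = alice_mitm != bob_mitm
```

In the honest case both ends read out the same four characters. In the intercepted case the derived secrets differ, so the strings the two parties read to each other will almost certainly differ too.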

Our new stack supports ZRTP because we believe it offers the best security for your end-users. However, given that we are generally the boundary with the PSTN we will decrypt the ZRTP before relaying over the unencrypted PSTN. On the face of it, we’re acting as a man-in-the-middle here but we figure that encryption for part of the call’s journey is preferable to no encryption.

Of course, calls that are on-net will have the opportunity for end-to-end ZRTP, and further we’ll do no harm in passing your media on where we interconnect with other carriers over IP; thus affording the possibility of end-to-end ZRTP off-net as well. The latter is highly improbable at the present time but the world is changing fast.

So how do you enable this magic encryption? The short answer is: don’t break it! If your end-users have user-agents that support ZRTP, and you either bypass the media directly to us (recommended) or ensure media can pass through your equipment, ZRTP will establish where possible, either between your end user and our equipment, or between your end user and the other party to their call. That is why it is referred to as “opportunistic” encryption, and it tends to be more opportune than others because it doesn’t rely on certificates being purchased and added to servers, or installed on phones. It kind of just works.


Whether you opt for ZRTP or SDES or a combination of the two, the new stack has you covered. We’d love to be writing in 6 months about how usage is split between the two, your end-users being protected one way or another. “Me-too” can do unencrypted SIP; Simwood customers are better than that and your customers deserve better than that.

Finally, if getting ZRTP in the hands of your end-users is a challenge, we might just have an app coming for that.
