SIP: FAQ

SIP Functionality

SIP Protocol Operation

Relationship to Other Protocols

Does SIP support the standard telephone features?
Yes. SIP supports, among others:

Some services, like repetitive dialing, station speed dialing, last number redial, and distinctive ringing, are implemented purely in the end system and require no support from the signaling protocol.

The Telecommunications Industry Association (TIA) is working on a recommendation for business PBX-style services and other Internet phone requirements.

How does SIP support caller ID?
Caller-ID is provided by the From SIP header containing the caller's name and "number". The number would most likely be placed in the user field of a SIP URL or appear in a tel: URL.
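
As a rough illustration, a receiving client could pull the display name and the "number" out of a From header value as sketched below. The function name caller_id and the sample addresses are assumptions made for this example, and the parsing is deliberately simplified rather than a full SIP header parser.

```python
import re

# Matches an optional display name (quoted or bare) followed by <uri>.
_NAME_ADDR = re.compile(
    r'^\s*(?:"(?P<quoted>[^"]*)"|(?P<plain>[^<"]*?))\s*<(?P<uri>[^>]+)>'
)

def caller_id(from_value: str):
    """Return (display_name, number) from a From header value (simplified)."""
    m = _NAME_ADDR.match(from_value)
    if m:
        quoted = m.group("quoted")
        name = quoted if quoted is not None else m.group("plain").strip()
        uri = m.group("uri")
    else:
        # Bare URI without angle brackets; drop header parameters such as ;tag=
        name, uri = "", from_value.split(";")[0].strip()
    scheme, _, rest = uri.partition(":")
    if scheme.lower() == "tel":
        # tel: URL, the number is everything up to the first parameter
        number = rest.split(";")[0]
    else:
        # SIP URL, the number is assumed to sit in the user field
        user = rest.split("@")[0] if "@" in rest else ""
        number = user.split(";")[0].split(":")[0]
    return name, number
```

For example, caller_id('"A. G. Bell" <sip:+12125551212@gw.example.com>') yields the name and the number from the user field. Nothing here verifies the identity; that still requires the signature mechanism described next.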

Since the callee generally does not know or trust the caller's server, only cryptographic signatures can be used to ensure that the information is valid. For example, the outgoing proxy might be operated by an ISP, enterprise or phone company and sign for the identity of the caller, using the signedby parameter, with the identity of the company verified by a public key certificate similar to those used by web sites.

Should SIP be used to join a conference from a web page?
It is possible to embed a SIP URL in a web page, including a session description. Clicking on that link triggers an invitation for the conference listed to the address contained in the URL. Unfortunately, the current standard browsers (Netscape and Internet Explorer) make it difficult or impossible to add support for another URL type.

Until SIP URIs are implemented in standard browsers, data: URLs can be used to implement similar functionality, albeit less elegantly.

If it is desired that following the link directly adds the user to an existing conference, e.g., for a conference "TV guide"-style directory, the data: URL is more appropriate.

Can a SIP-initiated session have zero or one participants?
SIP-initiated sessions can have no or just one participant. Examples of a session with no participants include an invitation to a multicast group with no members (beyond the invited party). Also, SDP sessions can start at a future time relative to the invitation.

How do I charge/bill for Internet telephony using SIP?
This depends on whether you plan to charge for SIP services like directory look-ups, call processing or mobility, for gateway services to the PSTN, or for carrying media data:
SIP services
The Authorization header can be used to indicate a customer identity that associates a SIP request with a billable entity.

Examples of possibly chargeable SIP services include:

  • Directory services such as SIP proxy/redirect lookups;
  • Customer profile management;
SIP server operations can be charged based on server logs or, for real-time billing, via AAA.
Media services
Media services include retrieving and storing voice mail, as well as transcoding of media streams. They are not initiated by SIP, but, for example, via RTSP.
Gateway services
Similar to SIP services. Care has to be taken to stop billing when (say) RTP voice data is no longer flowing through the gateway. The gateway will generate call detail records (CDRs) either directly or through RADIUS.
Transport (network services)
It seems unlikely that voice calls carried over a best-effort service will generate per-minute charges. When reserving bandwidth or guaranteeing other quality-of-service parameters, the resource reservation protocol or differentiated services are the appropriate mechanism for including charging. These reservation protocols will likely be used in applications that are not initiated by SIP, for example, audio/video on demand or VPNs. Actual accounting records may be generated by AAA protocols (e.g., by policy enforcement points (PEP) or policy decision points (PDP)) or log files.

Under some circumstances, a SIP proxy server may be useful to initiate such reservations or differentiated services treatment on behalf of a call, since it may be easier to authenticate the SIP request than the lower-layer reservation request or the end system may not be capable of making reservations or marking packets. In those cases, the SIP proxy would initiate a resource reservation and "charge back" the caller identified by the SIP request.

Dean Willis wrote with regards to billing for SIP services:

Why can't service providers make a living providing (at a fixed cost) access to "free services"? Do carriers do per HTTP-transfer billing now? How much should they charge for an email? For a call, what parameters might be used? Bandwidth, duration, distance -- the Big Factors of the POTS bill -- are not issues that SIP is concerned with.

How do prepaid calling cards work in a SIP network?
Note that, in general, prepaid calling cards only make sense in an IP network if there is a special-purpose VoIP internet, calls traverse an IP-to-PSTN gateway, or VoIP packets receive special treatment. The SIP requests are forced to traverse a stateful proxy, which controls the Internet telephony gateway, router QOS function or firewall, depending on the architecture. When the time is used up, the proxy or gateway issues a BYE request to both parties, using the existing call ID. It also disables the gateway connection, turns off any special QOS treatment for the RTP packets or closes the firewall for that stream. This requires no additions to either caller or callee. Relying on SIP BYE itself only suffices if the end systems can be trusted by the network provider not to keep sending packets.

Does SIP carry DTMF?
There are at least two options for carrying DTMF and similar signals in a VoIP network using SIP. First, DTMF can be transported as an RTP payload (RFC 2833). This has the advantage that it provides accurate timing and alignment with the speech RTP packets. Also, media gateways are the most likely to detect and generate tones, so that making it part of the media stream is appropriate. However, under some circumstances, it may be necessary for signaling entities to know about DTMF signals. Currently, there is no standardized solution within SIP, but it has been proposed to carry DTMF information in SIP INFO messages, either encoded as simple text or using the RFC 2833 format. The latter is more complex, but offers duration and timing information.

What does the [H14.17] in RFC 2543 stand for?
This is explained in Section 3 of RFC 2543. It refers to the section number in the HTTP/1.1 specification.

Do callers need to know the location of the Location Server?
The caller doesn't interact with the location server directly. A redirect or proxy server asks the location server (which may be co-resident with the SIP server or not) for "advice". The location server is just a logical abstraction to indicate where the SIP server gets its information from. The protocol between SIP server and location server is beyond the scope of SIP. Examples of location servers include

Also, callers don't register with the location server.

Which parts of SIP are case-sensitive and which are case-insensitive?
(CS = case-sensitive, CI = case-insensitive)

Element                              Case
Method                               CS
Header field name                    CI
Hide                                 CI
Accept-Encoding                      CI
Accept                               CI
Accept-Language                      CI
Encoding name (PCMU, L16, etc.)      CI
rfc1123-date                         CS
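
A minimal sketch of how an implementation might apply these rules when comparing tokens. The function names are made up for this example.

```python
def same_method(a: str, b: str) -> bool:
    # Methods are case-sensitive: "INVITE" and "invite" are different.
    return a == b

def same_header_name(a: str, b: str) -> bool:
    # Header field names are case-insensitive: "Call-ID" equals "call-id".
    return a.lower() == b.lower()

def same_encoding_name(a: str, b: str) -> bool:
    # Encoding names such as PCMU or L16 are case-insensitive.
    return a.lower() == b.lower()
```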

What is the difference between a call leg and a call id?
A call leg refers to the one-to-one signaling relationship between two user agents (UAs). The Call-ID is an identifier, carried in the SIP messages, that refers to the call. A call is a collection of call legs. A UAC starts by sending an INVITE; because of forking, it may receive multiple 200 OKs from different UAs. Each corresponds to a different call leg within the same call. Call is thus a grouping of call legs. In the call control spec, additional call legs are created through the Also header.

Call legs refer to end-to-end connections between user agents, rather than any relationship with proxies. Within a call leg, there are numerous transactions in both directions.

The request URI is not used in call leg identification.

The To and From field relate to local and remote in the following way. When Alice sends a request on a call leg to Bob, the From field contains the local address (Alice), and the To field the remote address (Bob). When a request is received by Bob, the To field is matched to Bob's local address, and the From field to the remote address (Alice).

The CSeq spaces in the two directions of a call leg are independent. Within a single direction, the sequence number is incremented for each transaction.
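
The local/remote mapping above can be sketched as a lookup key for call legs. This is only an illustration: the CallLeg type and both helper names are assumptions of the example, and the addresses are compared as plain strings.

```python
from typing import NamedTuple

class CallLeg(NamedTuple):
    call_id: str
    local: str    # our own address on this leg
    remote: str   # the peer's address on this leg

def leg_for_outgoing_request(call_id: str, from_addr: str, to_addr: str) -> CallLeg:
    # When we send a request, From is the local side and To the remote side.
    return CallLeg(call_id, local=from_addr, remote=to_addr)

def leg_for_incoming_request(call_id: str, from_addr: str, to_addr: str) -> CallLeg:
    # When we receive a request, To is matched against the local side
    # and From against the remote side. The request URI plays no role.
    return CallLeg(call_id, local=to_addr, remote=from_addr)
```

From Alice's point of view, a request she sends to Bob and a later request she receives from Bob on the same call map to the same key.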

What is the difference between tag and branch-id?
Branch IDs allow proxies to match responses to forked requests. Without them, a proxy wouldn't be able to tell which branch a response corresponds to. Tags, in To headers, are of no help here since they are not known until responses arrive. Tags are used by the UAC to distinguish multiple final responses from different UAS.

A UAS has no reliable way of determining if the request has been forked or not. Thus, to be safe it needs to add a tag. Proxies only insert tags into the final responses they generate themselves; they never insert tags into requests or responses they forward.

Since a request can be forked several times on its way to UAS, a single "tag" (or whatever you like to call it) added to the request by one of the proxies is not sufficient for the next forking proxy along the chain to match responses on its own branches; every proxy that forked the request would need to add its own unique IDs to the branches it created. This is precisely what's being achieved by the branch parameter in the Via header. (Igor Slepchin)
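
A small sketch of the idea: a forking proxy records one branch value per outgoing copy of the request and later reads the branch parameter of its own Via entry in a response to find the branch. The class name, the branch naming scheme (b1, b2, ...) and the host names are assumptions of this example.

```python
import itertools

class ForkingProxy:
    def __init__(self, host: str):
        self.host = host
        self._counter = itertools.count(1)
        self.branches = {}  # branch id -> target the request was sent to

    def fork(self, targets):
        """Return one Via header value per target, each with its own branch."""
        vias = []
        for target in targets:
            branch = f"b{next(self._counter)}"
            self.branches[branch] = target
            vias.append(f"SIP/2.0/UDP {self.host};branch={branch}")
        return vias

    def branch_of_response(self, top_via: str):
        """Map the proxy's own Via entry in a response back to a target."""
        for param in top_via.split(";")[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "branch":
                return self.branches.get(value.strip())
        return None
```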

How can one recognize a retransmitted, duplicate or looped request?
header        retransmitted   duplicate   matching response
From          same            same        same
To            same            same        same, but tag may have been added
Call-ID       same            same        same
request URI   same            same        same
CSeq          same            same        same
Via           -               -           must be local host; check for branch parameter to identify which branch

Looped requests are recognized by one or more of the following:

What is the relationship between the From, Contact, Via and Record-Route/Route headers?
All these headers determine how requests and responses are routed in a network of SIP proxy servers. Roughly, the distinction is:
From:
Used for subsequent requests if there is no Contact or Record-Route header. E.g., if Alice makes a call with From: Alice <alice@example.org> to Bob, an INVITE request from Bob to Alice would use alice@example.org as the To header and Request-URI.
Contact:
Determines the destination placed in the Request-URI for subsequent requests and can be used to bypass proxies not enumerated in a Record-Route header. Also used in responses by redirect servers and in REGISTER requests and responses.
Record-Route/Route:
The Record-Route header is inserted into requests by proxies that want to be in the path of subsequent requests for the same call-id. It is then used by the user agent to route subsequent requests. The mechanism is similar to a source-route, copying the Record-Route information into a set of Route headers. The Request-URI is set to the first Route header.
Via:
Via headers are inserted by servers into requests to detect loops and to allow responses to find their way back to the client. They have no influence on the routing of future requests (or responses).

In short, requests should be sent to Route if present, to Contact if there is no Route, and to From if there is no Contact.
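
The precedence can be written down as a tiny helper. The function name and argument layout are assumptions of this sketch; real user agents also have to handle URI parsing and transport selection.

```python
def next_request_target(route=None, contact=None, from_addr=None):
    """Pick where a subsequent request goes: Route, else Contact, else From."""
    if route:               # non-empty list of Route entries
        return route[0]     # the first Route entry is used
    if contact:
        return contact
    return from_addr
```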

How are URLs compared?
Two SIP URLs are compared for equality according to the following rules:

Does SIP do admission control?
Since this offers no real security (calls could always bypass a server), admission control is not supported by SIP. If an "outbound proxy" is used for outgoing calls, that proxy may control the firewall and thus restrict outgoing calls.

Does SIP administer bandwidth?
No, that is the role of a resource reservation protocol. There is no reason to assume that any Internet telephony signaling server (such as a proxy) would know the available bandwidth in real networks. Having such a central server would not scale. Administering bandwidth separately for each application is also likely to be difficult and inefficient.

There is a proposal for an SDP extension that allows SIP INVITE requests and responses to indicate that resource reservation must succeed before the callee is alerted.

What's the difference between the request URIs tel:+12125551212 and sip:12125551212@gw.com?
Non-SIP URLs, such as tel:+12125551212 for a telephone number, may be used as request URIs in SIP INVITE requests. This only makes sense if all outbound calls are handled by a proxy server. In the case of a tel: URL, the proxy server would then translate the request URL to a SIP URL of a gateway server, if it is not handling the gateway duty itself. The proxy server might use the Gateway Location Protocol (GLP) to find the appropriate next-hop SIP server. The To header may always be a tel: URL even if the Request-URI is a SIP URL, although that breaks with the common practice that Request-URI and To start out the same.
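
A minimal sketch of the translation step a proxy might perform. The function name is hypothetical, and dropping the leading "+" is an assumption made only to mirror the two URLs in the question; a real proxy would pick the gateway host through its own routing logic.

```python
def tel_to_gateway_uri(request_uri: str, gateway_host: str) -> str:
    """Rewrite a tel: request URI as a SIP URL at the given gateway."""
    if not request_uri.lower().startswith("tel:"):
        return request_uri                  # already a SIP (or other) URL
    number = request_uri[len("tel:"):]
    number = number.lstrip("+")             # assumption: mirror the example above
    return f"sip:{number}@{gateway_host}"
```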

Do I always need a proxy or redirect server?
No, two SIP servers can contact each other directly.

How does a caller find its local registrar?
The local registrar is either manually configured or, more likely, the SIP client issues a multicast registration request to the sip.mcast.net standard multicast address, which all registrars listen to.

Is the domain of the request-URI and the To header always the same?
The Request-URI names the destination of the registration request, i.e., the domain of the registrar. The user name must be empty. Generally, the domains in the Request-URI and the To header field have the same value; however, it is possible to register as a "visitor", while maintaining one's name. For example, a traveler sip:alice@acme.com (To) might register under the Request-URI sip:atlanta.hiayh.org, with the former as the To header field and the latter as the Request-URI. Note, however, that requests for a user at acme.com are not likely to arrive at the atlanta.hiayh.org server; special purpose routing logic will generally need to be established in order for requests for alice@acme.com to go to the atlanta.hiayh.org server. In the vast majority of cases, the domains in the request URI and To field will match. The REGISTER request is no longer forwarded once it has reached the server whose authoritative domain is the one listed in the Request-URI.

How do I ensure registrar reliability?
There are several techniques that can be used to minimize the impact of registrar/proxy server failures for a server in a local area network:

For servers separated from their client by a wide-area network, use of multicast is not appropriate, so that these servers have to rely on traditional backup techniques to achieve reliability. For example, the designated registrar could multicast registration updates within its local network to keep standby servers synchronized.

Are ACK requests retransmitted?
Not on a timer of their own. An ACK is sent when the final response arrives and again each time a retransmission of that response is received. Reliability is achieved because the response is retransmitted until an ACK arrives. ACK is only used for INVITE.

How are BYE requests routed?
Since a Contact header MUST be present in INVITE and 200, the BYE will go directly to the user agent if there is no Record-Route header. If there is a Record-Route, it will traverse the list of proxies indicated there.

If the caller decides to send a BYE before receiving a 200 from the callee, the BYE is handled by the proxies just as the corresponding INVITE was handled, i.e., it may be forked.

Can I CANCEL requests other than the first INVITE?
Yes, any request can be cancelled before it has been executed by the UAS. However, it is likely that this will only make sense in practice for the initial INVITE and subsequent "re"INVITE. In the latter case, the call remains; only the requested changes are cancelled.

How does a caller find its proxy server?
Calls typically proceed directly to the callee's domain. For example, when calling alice@example.com, the INVITE request would be sent to the SIP server for the domain example.com, found via DNS.

If a "local" (outbound) proxy is needed for outgoing calls, it currently needs to be manually configured, similar to the configuration of web proxies in browsers. Extensions to (for example) use a REGISTER response or DHCP are under discussion.

What's the difference between a stateless and a stateful proxy server?
Stateless proxies forget about the SIP request once it has been forwarded. Stateful proxies remember the request after it has been forwarded, so they can associate the response with some internal state. In other words, stateful proxies maintain transaction state. Stateful implies transaction state, not call state.

Stateless proxies scale very well, and can be very fast. They are good for network cores. Stateful proxies can do more (they can fork, for example, see the next question) and can provide services stateless ones can't (call forward busy, for example). They don't scale as well as stateless ones. An administrator gets to decide which to use. These are also logical entities; a physical proxy is likely to act as a stateless proxy for some calls, stateful for others, and as a redirect server for even others.

Neither stateful nor stateless proxies need to maintain call state, although they can, but will need to make sure that they are part of subsequent transactions via the Record-Route header.

A proxy must be stateful if it:

  1. uses TCP,
  2. uses multicast,
  3. forks.

Why can a forking SIP proxy not be stateless?
A forking SIP proxy cannot be stateless because it needs to perform a filtering operation, returning (in general) one response out of the many it receives. For example, a forking proxy with three branches, that receives a 200-class, 400-class, and 500-class response on each branch respectively, should return only the 200-class response upstream. If the proxy were stateless, it would end up returning all three of the responses upstream (since it won't remember that it had received prior responses when it gets another one). The result of this is (1) response implosion at the client, and (2) inconsistent responses at the client. (In this example, depending on the order the responses would be received, the client would think that the call failed, just to get a success indication some time later.) Thus, a forking proxy must be stateful.
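
The filtering step can be sketched as follows. The selection rule used here (prefer a 2xx, otherwise the lowest final status code) is an assumption of this example rather than a statement of the exact rule in the specification, and the function name is made up.

```python
def choose_upstream_response(status_codes):
    """Pick the single final response a forking proxy forwards upstream.

    status_codes holds the codes received on all branches so far.
    Returns None while only provisional (1xx) responses have arrived.
    """
    finals = [code for code in status_codes if code >= 200]
    if not finals:
        return None
    successes = [code for code in finals if 200 <= code < 300]
    if successes:
        return successes[0]   # a success wins over any failure
    return min(finals)        # assumed tie-break among failures
```

Keeping the list of responses per transaction is exactly the state a stateless proxy does not have.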

Also note that a proxy that uses TCP must be stateful as well, whether it forks or not. This has to do with reliability issues.

Why do you want state in a proxy? Certain services (like forking) simply require it. A sequential search proxy requires state; sequential search is the heart of services like follow-me and personal mobility. It's at the discretion of the implementor whether to use a stateful or stateless proxy. You can even be "super stateful", and use the Record-Route header to allow a proxy to be on the signaling path of all subsequent exchanges. This allows a stateful proxy to maintain call state in addition to transaction state.

How does a caller find the remote SIP client of the callee?
The process is similar to the delivery of email: The caller uses the SIP host name to look up the destination host, first trying a SRV record and then "regular" DNS, just like an email client (MTA) looks up the MX record. (SRV records are generalized MX records applicable to any network service, including, but not limited to, SIP and RTSP.) For example, when contacting bell@cs.columbia.edu, the client finds a SRV record pointing to erlang.cs.columbia.edu as the SIP server for the domain cs.columbia.edu. As for email, a single domain name can resolve to multiple servers, allowing load sharing and redundancy.
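
The lookup order can be sketched without touching the network by passing in SRV records that were already fetched. The function name, the record tuple layout and the host backup.cs.columbia.edu are assumptions of this example; SRV weights are ignored for brevity, and 5060 is the default SIP port.

```python
DEFAULT_SIP_PORT = 5060

def candidate_servers(domain: str, srv_records):
    """Return (host, port) pairs to try, in order.

    srv_records is a list of (priority, weight, port, target) tuples.
    If there are none, fall back to a "regular" lookup of the domain itself.
    """
    if srv_records:
        ordered = sorted(srv_records, key=lambda record: record[0])
        return [(target, port) for (_prio, _weight, port, target) in ordered]
    return [(domain, DEFAULT_SIP_PORT)]
```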

The server located in this manner can then proxy or forward the call to another server.

How does SIP get through a firewall?
There are several possible approaches to SIP-capable firewalls. One of the difficulties is that, unlike for, say, HTTP, connections are originated both by hosts inside and outside the firewall. A likely arrangement is that a SIP proxy sits "on" the firewall and relays SIP requests between the Internet and the intranet. This proxy would also open up the necessary ports in the firewall to let audio and video flow through, for example using SOCKS V5.

As an alternative, if a firewall or NAT allows outgoing TCP connections, the inside client can open up a TCP connection to an outside proxy. All outgoing and incoming calls would then be handled by that TCP connection. (The client would still have to use SOCKS or a similar mechanism to convince the firewall to let RTP packets through.)

How does SIP do "call progress tones" or "ring back"?
The SIP server being called, such as an Internet telephony gateway, can return any number of provisional status messages that indicate call progress. Typically, this is just 100 (Trying) followed by 180 (Ringing), but a server could produce elaborate feedback such as
100 Message received
100 Looking up number
100 Found number, looking up carrier according to profile
100 Finding cheapest carrier which doesn't do animal testing
100 Found carrier "AT&T"
100 Dialing number
180 Ringing
182 Queued, 3 people in front of you
182 Queued, 2 people in front of you
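
A client that wants to display such feedback only needs to split each status line into its code and text and check whether the code is provisional (1xx). A minimal sketch, with a made-up function name:

```python
def parse_status(line: str):
    """Split 'CODE Reason text' into (code, reason, is_provisional)."""
    code_text, _, reason = line.partition(" ")
    code = int(code_text)
    return code, reason, 100 <= code < 200
```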

The language of the status message should be determined based on the Accept-Language request header in the call.

A 183 (Session Progress) status response will appear in RFC2543bis. It can be used for both progress tones as well as error messages.

One would use the 183 only if you:

One can also use 183 if the gateway is able to determine that an error has occurred, but that there is a tone or announcement accompanying it (e.g., an ACM with a cause code present). In that case, the gateway can send a 183 to set up the media for the announcement (ideally with the announcement text as the text string), wait for a timer (on the order of 30 seconds), and then send an appropriate SIP error message.

However, this should only be done if the caller is likely a human being, as sending 183 would otherwise only delay failure handling.

Does SIP do keep-alive?
SIP itself does not have a keep-alive mechanism during the call. It was felt that loss of connectivity would be detected rapidly by the absence of media packets, typically sent at a much higher rate than any signaling keep-alive messages could be sent. In addition, the signaling path is not needed during the conversation and may well be completely different (due to proxy and redirect servers) than the media path, so that keep-alives have a limited functionality. If it is desired to test the liveness of a signaling server, it is always possible to send either OPTIONS or (re)INVITE messages.

Why does SIP not have a Content-Transfer-Encoding header?
The Content-Transfer-Encoding header was primarily meant to allow message bodies to be transformed into formats that could be transferred on channels that were not 8 bit clean. HTTP, which makes use of many of the MIME headers, is 8 bit clean, and thus did not need Content-Transfer-Encoding. SIP followed suit, and so does not use it either. Content-Encoding is used for things like compression, which is different. (J. Rosenberg)

See also RFC 2616 (HTTP/1.1), Section 19.4.5.

I want SIP to be more compact. What can I do?
First, one should realize that in general, SIP exchanges are only going to be a tiny fraction of the overall session bandwidth. A typical SIP call setup takes less than 1000 bytes, or the equivalent of one second of highly compressed (G.729) audio. Some additional space savings can be realized by using short headers. (A realistic example for an audio call setup takes a total of about 640 bytes, of which about 69 bytes are SIP headers.)

In general, more substantive savings are possible by using either payload compression (RFC 2393) or link-layer compression, e.g., at the PPP layer. For the example above, the total size is reduced to about 520 bytes with gzip compression.
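
To get a feel for both options, the sketch below rewrites long header names to their one-letter forms and round-trips a message through zlib. The sample message and function name are assumptions of this example, and zlib stands in for whatever payload or link-layer compression is actually deployed; it makes no claim about the byte counts quoted above.

```python
import zlib

# One-letter compact forms of common SIP header names.
COMPACT = {
    "call-id": "i", "contact": "m", "content-encoding": "e",
    "content-length": "l", "content-type": "c", "from": "f",
    "subject": "s", "to": "t", "via": "v",
}

def compact_headers(message: str) -> str:
    """Replace long header names with their compact forms; body is untouched."""
    head, sep, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    out = [lines[0]]                       # request or status line stays as is
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        short = COMPACT.get(name.strip().lower())
        out.append(f"{short}:{value}" if short and colon else line)
    return "\r\n".join(out) + sep + body

SAMPLE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc.example.org\r\n"
    "From: Alice <sip:alice@example.org>\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: 12345@pc.example.org\r\n"
    "CSeq: 1 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

short_form = compact_headers(SAMPLE)
compressed = zlib.compress(short_form.encode("ascii"))
```

Comparing len(SAMPLE), len(short_form) and len(compressed) for a realistic message shows how much each step saves in a given deployment.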

Does SIP do conference control?
SIP leaves conference control, such as the election of a chair or floor control, to other protocols. SIP can be used for non-conferencing applications and floor control may be used outside the scope of SIP-initiated calls, so it seemed best to separate the functionality. However, SDP may be used to indicate which media are subject to floor control and what tools and protocols are to be used. Unfortunately, there is no IETF-standardized floor control protocol.

What is the relationship between MGCP and SIP?
The details of combining the two in a system are still being fleshed out. MGCP is a device control protocol, where a slave (gateway (MG)) is controlled by a master (media gateway controller (MGC), call agent). SIP may be used between controllers, in a peer-to-peer relationship. Note that to the SIP side, the MGC looks like a node with a large number of connections, but otherwise the same as a "native" SIP device. Similarly, the MG is completely unaware that the call between MGCs is established via SIP. Only the MGC needs to understand both protocols.

What is SIP+ and how does it relate to SIP?
SIP+ was a proposal by Level3 on how to extend SIP to interconnect two MGCs. This functionality is now being provided by various orthogonal SIP extensions, including the carriage of multipart MIME types, the INFO method and others. These are being documented in a BCP draft. The name SIP+ is obsolete and should not be used to avoid confusion.

How does SIP compare to H.323?
See H.323 comparison.

Can H.323 and SIP be used together?
Yes. SIP can locate the called party and determine its capabilities, including H.323. H.323 is then used to connect the two parties.

Unfortunately, there is currently no specification on translating between the two. Conversion is made more difficult by the multiple versions of H.323 (v1, v2, v3). However, there is at least one product (Lucent PacketStar IP) that allows SIP and H.323 terminals to call each other.

How do I interconnect Q.931 (ISDN signaling) and SIP?
A gateway that initiates an ISDN call based on a SIP call or vice versa is reasonably straightforward, as sketched in this figure.

How do I interconnect ISUP (SS7 signaling) and SIP?
SIP can be used either between SS7 nodes or to trigger a phone call in an SS7 network. While all the details have not been worked out, the basic call flow is similar to the ISDN case.

What are the different addresses in SIP?
SIP INVITE requests involve three addresses:
  1. The host address where the request came from. Responses are sent back to the same host address, regardless of what the From header indicates. Note that different requests for the same call can come from different hosts.
  2. The From address contains the logical source of the request. It remains unmodified as a SIP request traverses proxies, for example. The From address may not be the same as the host address that generated the SIP request, although that's the typical case.
  3. The session description (e.g., SDP) contains one or more addresses where the caller expects media data (audio, video) to be sent. For some services, this address may not be the same as the From address.

Can SIP be used for Internet telephony gateways (ITGs)?
Yes, in two ways. First, it can indicate to the Internet-based caller that the callee is reachable via an ITG, via the Contact header. Secondly, two ITGs connecting parties on the PSTN can signal new calls to each other, with the destination phone number contained in the request URL.

How do I put a call on hold?
The party wishing to put the other party on hold sends a (re)INVITE, with a session description containing a null (0.0.0.0) address. When used with SDP, the "c" address field of one or more media types is set to zero.
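
As a minimal sketch, the session description for the hold (re)INVITE could be derived from the current one by rewriting its connection lines. The function name and the sample addresses used with it are assumptions of this example, and only IPv4 "c=" lines are handled.

```python
def hold_sdp(sdp: str) -> str:
    """Return a copy of the SDP with every IPv4 connection address set to 0.0.0.0."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            line = "c=IN IP4 0.0.0.0"   # null address signals hold
        out.append(line)
    return "\r\n".join(out) + "\r\n"
```

Sending the original session description again in a later (re)INVITE would take the call off hold.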

What is sip-cgi and how does it relate to CPL?
Both are viewed as different approaches for creating VoIP services. Both are written offline, and both are executed when messages arrive in order to execute features.

CPL is an XML-based language, while sip-cgi is a mechanism for invoking scripts or programs written in any language. sip-cgi is very similar to web cgi scripts.

In its current version, CPL is only invoked when INVITE requests and responses arrive, while sip-cgi can intercept any request.

sip-cgi is designed to be used by SIP, while CPL can probably be used by a number of signaling protocols such as Q.931 or H.323.

CPL and sip-cgi differ in their applicability. CPL is designed for end user service creation. It is intentionally limited in capabilities and is not a general purpose programming language. Its execution on a server is generally very fast. CGI is more powerful - you can do nearly anything. It is programming language independent. It incurs a process-spawning overhead, so it's less efficient than CPL. (CPL is usually executed in the same process as the server). As a service provider, I would not want to execute CGI scripts sent to me by end users. However, I would prefer to use CGI to develop my own services.

Note that CGI may be used as the execution environment for a CPL script. (Jonathan Rosenberg)

Is there a SIP interoperability certification? How can I test interoperability with others?
There currently is no certification that attests to the functionality and compatibility of a SIP implementation. However, there are regular SIP bake-offs where implementors can test their work. Also, some sites have set up public SIP servers.

Where can I find more information about SIP?
(With contributions by Jonathan Rosenberg and others.)
by Henning Schulzrinne