SIP, the Session Initiation Protocol, is the IETF protocol for VoIP and other text and multimedia sessions, such as instant messaging, video, online games and other services.
Abstract from the RFC 3261 (formatted_and_explained version) – SIP: Session Initiation Protocol
Since SIP is a flexible protocol, it is possible to add more features while keeping backward compatibility.
SIP does, however, suffer from NAT and firewall restrictions (refer to NAT and VOIP).
SIP can be regarded as the enabling protocol for telephony and voice over IP (VoIP) services. The following features of SIP play a major role in enabling IP telephony and VoIP:
Feature negotiation: This allows the parties involved in a call (which may be a multi-party call) to agree on the features supported, recognizing that not all parties can support the same level of features. For example, video may or may not be supported. Because a SIP message body can carry any MIME type, there is plenty of scope for negotiation.
Call feature changes: A user should be able to change the call characteristics during the course of the call. For example, a call may have been set up as ‘voice-only’, but in the course of the call, the users may need to enable a video function. A third party joining a call may require different features to be enabled in order to participate in the call.
Media negotiation: SIP's inherent mechanisms for negotiating the media used in a call enable selection of an appropriate codec between the various devices. This way, less advanced devices can participate in the call, provided an appropriate codec is selected.
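The media negotiation described above can be illustrated with a minimal sketch. It is not a SIP implementation: the function name negotiate, the dictionary layout and the codec lists are all assumptions made for illustration, loosely modelled on the offer/answer exchange of session descriptions carried in SIP message bodies. The caller offers codecs per media type, and the answering device keeps only those it also supports.

```python
# Illustrative sketch only: `negotiate` and the codec lists below are
# hypothetical and do not come from any SIP library or from the RFC text.

def negotiate(offer, supported):
    """Return, per media type, the offered codecs the answerer also supports.

    Codecs are kept in the order of the offer, so the caller's preference
    is preserved. Media types with no codec in common are dropped.
    """
    answer = {}
    for media, codecs in offer.items():
        common = [c for c in codecs if c in supported.get(media, [])]
        if common:
            answer[media] = common
    return answer


# The caller offers audio and video, most preferred codec first.
offer = {"audio": ["opus", "PCMU", "PCMA"], "video": ["H264", "VP8"]}

# A less advanced, audio-only device and a more capable video phone.
basic_phone = {"audio": ["PCMU", "PCMA"]}
video_phone = {"audio": ["opus", "PCMU"], "video": ["VP8"]}

print(negotiate(offer, basic_phone))  # {'audio': ['PCMU', 'PCMA']}
print(negotiate(offer, video_phone))  # {'audio': ['opus', 'PCMU'], 'video': ['VP8']}
```

The audio-only device still ends up with a usable call because at least one audio codec is shared, which is the point made under media negotiation. A mid-call feature change, such as adding video to a voice-only call, could be modelled in the same sketch by calling negotiate again with a new offer.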