GSM 06.10 lossy speech compression
- telephone quality speech
- 13 kbit/s
- free source code
In 1992, I was working as a tutor at the Technical University of Berlin. The research group
I was in needed a speech compression algorithm to support its multimedia
conferencing experiments.
They found what they were looking for in the
ETSI specifications
of the
Global System for Mobile telecommunication (GSM), currently Europe's
most popular protocol suite for digital cellular
phones. (John Scourias'
overview of GSM
does a good job introducing the overall architecture;
hire him.
Another, more recent,
overview of the GSM system (with a list of
Web links) comes from Javier Gozàlvez Sempere.)
The low-level speech compression algorithm of the GSM suite is
called GSM
06.10 RPE-LTP (Regular-Pulse Excitation
Long-Term Predictor).
My colleague Dr. Carsten Bormann and I have implemented a GSM 06.10
RPE-LTP coder and decoder in C.
Its source
code is freely available, and we
encourage you to use it,
play with it, and invent new real-time media protocols and algorithms.
Our implementation consists of a C library and a standalone program.
Both are designed to be compiled and used in a Unix-like environment
with at least 32-bit integers, but others have ported it to VMS and an
MS-DOS 16-bit environment.
GSM 06.10 is faster than codebook-lookup algorithms such as CELP,
but by no means cheap; to use it for real-time communication,
you will need at least a medium-scale workstation.
When using the library, you create a gsm object that holds the state
necessary to either encode frames of 160 16-bit PCM samples into 264-bit
GSM frames, or to decode GSM frames into linear PCM frames.
If you want to examine and change the individual parts of the GSM frame,
you can ``explode'' it into an array of 76 parameters, change them there,
and ``implode'' them back into a packed frame; you can also print a
whole GSM frame to a file in human-readable format with a single function call.
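For reference, here is where the parameter count and the 260 payload bits of a frame come from, as I read the GSM 06.10 bit allocation: eight log-area ratios per frame, then four 40-sample subframes, each with an LTP lag, an LTP gain, an RPE grid position, a block maximum, and 13 RPE pulses. The sketch only tallies that table; the function and constant names are made up for illustration and are not part of the library's API.

```c
#include <stddef.h>

/* Bit widths of the eight log-area-ratio codes, sent once per frame. */
static const int lar_bits[8] = { 6, 6, 5, 5, 4, 4, 3, 3 };

/* Per 40-sample subframe: LTP lag Nc (7 bits), LTP gain bc (2 bits),
 * RPE grid position Mc (2 bits), block maximum xmaxc (6 bits). */
static const int subframe_bits[4] = { 7, 2, 2, 6 };

/* Each subframe also carries 13 RPE pulses of 3 bits each. */
enum { SUBFRAMES = 4, RPE_PULSES = 13, RPE_PULSE_BITS = 3 };

/* Parameters in an exploded frame: 8 LARs + 4 x (4 + 13). */
static int frame_param_count(void)
{
    return 8 + SUBFRAMES * (4 + RPE_PULSES);
}

/* Payload bits per frame, summed from the tables above. */
static int frame_payload_bits(void)
{
    int total = 0;
    for (size_t i = 0; i < 8; i++)
        total += lar_bits[i];
    for (int s = 0; s < SUBFRAMES; s++) {
        for (size_t i = 0; i < 4; i++)
            total += subframe_bits[i];
        total += RPE_PULSES * RPE_PULSE_BITS;
    }
    return total;
}
```

The two functions should report 76 parameters and 260 bits; the remaining 4 bits of a 33-byte toast frame are used as a signature.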
Our library client, called toast, is modeled after
the Unix compress program.
Running toast myspeech will
compress the file myspeech, remove it, and collect the result of
the compression in a new file called myspeech.gsm, while untoast
myspeech will reverse the process. The big difference
between toast and compress is that toast loses information with each
compression cycle. (After a few iterations, you can
hear high-pitched chirps
that I initially mistook for birds outside of my office window.)
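To estimate what toast will produce: each 160-sample frame (20 ms at 8 kHz) becomes 33 bytes, so one second of 16-bit linear speech, 16,000 bytes, shrinks to 1,650. A sketch of that arithmetic, assuming the final partial frame is padded to a full 160 samples and ignoring any file header; the function names are hypothetical.

```c
#include <stddef.h>

enum {
    FRAME_SAMPLES = 160, /* 20 ms at 8 kHz */
    FRAME_BYTES   = 33   /* one packed toast frame (264 bits) */
};

/* Frames needed for n_samples, rounding a partial frame up
 * (assumption: the last frame is padded to full length). */
static size_t frames_for(size_t n_samples)
{
    return (n_samples + FRAME_SAMPLES - 1) / FRAME_SAMPLES;
}

/* Compressed size in bytes, not counting any header. */
static size_t compressed_bytes(size_t n_samples)
{
    return frames_for(n_samples) * FRAME_BYTES;
}
```

For example, compressed_bytes(8000), one second of input, comes out at 1650.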
Patent Issues with GSM 06.10
Philips is claiming intellectual property on GSM 06.10.
They haven't contacted the authors of this library,
but at least two large
companies that wanted to integrate GSM 06.10 codecs into their
products have been approached; one decided to pull their codec,
another to pull just the encoder and leave the decoder.
(So, apparently, at least some lawyers think the intellectual
property applies only to one half of the process.)
I don't know which parts of the patent are new,
or whether it would hold up in court, but of course nobody
wants to go to court over an issue as small as this.
The VPIM IETF workgroup is considering using GSM 06.10.
The IETF can't standardize on technology that forces its
users to pay license fees.
If Philips doesn't release their intellectual property for use in VPIM,
we'll be wasting a lot of bandwidth with voice mail.
That's not the end of the world, but it would be nice to
at least ask first.
I don't know whom to ask.
If you do, please contact me.
Get ETSI publications free of charge
ETSI is the European standards body
that came up with GSM. For a limited time, ETSI
is making copies of its publications available over the
Internet for the price
of giving away an email address,
among them the GSM 06.10 and GSM 06.06 drafts
and attachments.
Try it out at http://pda.etsi.org/pda/.
GSM 06.10: the current patchlevel is 10
Shortly after releasing patchlevel 9 (which has a little more WAV#49
support in the library), a long email correspondence convinced
me that the tables used in toast's A-law conversion were entirely
fictional.
Nobody seems to use A-law much, so it is conceivable that
the wrong tables might have gone undetected since 1992. The
new tables have been independently tested against the vectors
supplied with G.726, and match those the ITU, Bell Labs, and at
least one other implementor use.
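Toast does its conversion with lookup tables. As an independent cross-check of what such tables must contain, here is a sketch that computes A-law codes directly from the segment structure, in the style of the ITU-T G.711 reference routines. The function names are hypothetical, and this is not the code shipped with toast. A correct table has to agree with it at fixed points such as 0 mapping to 0xD5 and the largest positive sample mapping to 0xAA.

```c
#include <stdint.h>

/* Linear 16-bit PCM -> G.711 A-law (hypothetical helper).
 * Only the top 12 bits of the sample (sign + 11 magnitude bits) survive. */
static uint8_t alaw_encode(int16_t sample)
{
    /* One's complement for negative samples keeps the magnitude in
     * 0..2047 and maps -1 to "negative zero". */
    int ix = (sample < 0 ? ~sample : sample) >> 4;

    if (ix > 15) {                   /* segments 1..7 have a leading 1 bit */
        int seg = 1;
        while (ix > 31) {            /* normalize mantissa into 16..31 */
            ix >>= 1;
            seg++;
        }
        ix = (seg << 4) | (ix - 16); /* segment number + 4-bit mantissa */
    }
    if (sample >= 0)
        ix |= 0x80;                  /* sign bit is 1 for non-negative */
    return (uint8_t)(ix ^ 0x55);     /* A-law inverts alternate bits */
}

/* G.711 A-law -> linear 16-bit PCM, at the middle of the interval. */
static int16_t alaw_decode(uint8_t code)
{
    int ix = (code ^ 0x55) & 0x7F;   /* undo inversion, drop the sign */
    int seg = ix >> 4;
    int mant = ix & 0x0F;

    if (seg > 0)
        mant += 16;                  /* restore the implicit leading 1 */
    mant = (mant << 4) + 8;          /* scale up, add half a step */
    if (seg > 1)
        mant <<= seg - 1;
    return (int16_t)((code & 0x80) ? mant : -mant);
}
```

Round-tripping a sample returns the midpoint of its quantization interval; 1000, for instance, comes back as 1008.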
Patches relative to patchlevel 9: reference version,
8.3 filename version;
Full release (if the last thing you have is patchlevel 7,
get the full release):
reference version, gzip'ed tar file, compressed, 8.3 filename ZIP archive.
WAV49 version of gsm_implode.c is broken.
The gsm_implode() function doesn't work if the WAV49 flag is set;
as Shay Ben-David brought to my attention, the code in question
is a duplicate, rather than the opposite, of the gsm_explode() code.
This will be fixed in the next release.
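The underlying rule is that gsm_implode() must be the exact inverse of gsm_explode(): fields go back in the same order, with the same widths and the same bit numbering that the unpacker reads them with. A minimal sketch of such a pair, packing the most significant bit first as toast's own frames do (the WAV49 layout numbers its bits from the other end). The helper names and the field widths in the test are hypothetical, not taken from gsm_implode.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Append the low widths[i] bits of each vals[i] to buf, MSB first.
 * buf must be zeroed beforehand. Returns the number of bits written. */
static size_t pack_bits(uint8_t *buf, const int *widths,
                        const uint16_t *vals, size_t n)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        for (int b = widths[i] - 1; b >= 0; b--, pos++) {
            if ((vals[i] >> b) & 1u)
                buf[pos / 8] |= (uint8_t)(0x80u >> (pos % 8));
        }
    }
    return pos;
}

/* The inverse: read the fields back in the same order and bit order. */
static size_t unpack_bits(const uint8_t *buf, const int *widths,
                          uint16_t *vals, size_t n)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t v = 0;
        for (int b = 0; b < widths[i]; b++, pos++)
            v = (uint16_t)((v << 1) | ((buf[pos / 8] >> (7 - pos % 8)) & 1u));
        vals[i] = v;
    }
    return pos;
}
```

A round trip through both functions, as in a unit test, is the kind of check that catches a packer that duplicates its unpacker.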
Warble, warble, warble
Leila: Are you using a scrambler?
J. Frank: I can't hear you, I'm using a scrambler!
- Repo Man
If you're using the library to encode and decode sound in
your project, and the resulting audio is nowhere near telephony
quality but sort of warbled, the most likely cause is that
you're using the same gsm state to both encode and decode.
Don't do that; allocate two different states instead, one
for each direction.
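Here is why sharing one state goes wrong. Encoder and decoder each update the state with their own view of the signal, so a single shared object is always left in whatever condition the side that ran last put it in. The toy codec below is a plain delta coder, not GSM, and all names are hypothetical. With two states it round-trips exactly; with one shared state it garbles most samples.

```c
#include <stddef.h>

/* Toy stateful codec: first-order delta coding. Stands in for the
 * much richer predictor state a gsm object carries. */
struct toy_state { int prev; };

static int toy_encode(struct toy_state *s, int sample)
{
    int delta = sample - s->prev;   /* predict from the previous sample */
    s->prev = sample;
    return delta;
}

static int toy_decode(struct toy_state *s, int delta)
{
    int sample = s->prev + delta;   /* undo the prediction */
    s->prev = sample;
    return sample;
}

/* Encode with *enc, decode with *dec; count samples that survive. */
static size_t roundtrip(struct toy_state *enc, struct toy_state *dec,
                        const int *in, int *out, size_t n)
{
    size_t same = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = toy_decode(dec, toy_encode(enc, in[i]));
        if (out[i] == in[i])
            same++;
    }
    return same;
}
```

With the real library, that means calling gsm_create() twice and keeping one handle for each direction.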
Porting to a DEC Alpha
People porting the GSM 06.10 library to DEC Alphas have noticed
that the test of the basic math routines fails. The test prints:
0xfffffffe (4294967294) != L_<< (2147483647, 1) -- expected 0xfffffffe (-2)
0x00000000 (-4294967296) != L_<< (-2147483648, 1) -- expected 0
This can be fixed by changing the definitions of the
32-bit types in inc/private.h from long
and unsigned long
to int and unsigned int. On the Alpha, a long
has 64 bits; an int (at least with the unadorned native compiler
I used) has 32 bits.
The math tests that fail exploit properties specific to 32-bit
integer math.
If you don't care about the math test, you don't have to
change the types. In spite of the failing test, the library
does work fine even with a 64-bit long. (It's been
tested against byte-swapped ETSI test patterns.)
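If you would rather not depend on what long means on a given machine, the 32-bit types can be spelled with <stdint.h>. A sketch, assuming a C99 compiler; the typedef names mirror the longword/ulongword style of inc/private.h, but check the header itself before editing it. The shift helper (a hypothetical name) reproduces the 32-bit wraparound the math test expects.

```c
#include <stdint.h>

/* Fixed-width replacements for long / unsigned long.
 * Illustrative names; check inc/private.h for the real typedefs. */
typedef int32_t  longword;    /* exactly 32 bits, signed */
typedef uint32_t ulongword;   /* exactly 32 bits, unsigned */

/* 32-bit left shift with wraparound, n in 0..31:
 * shift the unsigned bit pattern, then read it back as signed.
 * (That last conversion is implementation-defined in C, but wraps
 * on two's-complement machines such as the Alpha.) */
static longword l_shift_left(longword a, int n)
{
    return (longword)((ulongword)a << n);
}
```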
There is a .wav chunk format #49 that encodes GSM 06.10 frames.
Newer Windows versions support it natively. It was developed entirely in
parallel with ours, from the same ETSI pseudocode, but ended up
with incompatible framing and a different bit order within the bytes.
After fretting over intellectual property rights for
a few months,
Microsoft has now registered the encoding inside the WAV chunk as a
MIME type, particularly for use with IVM, a spinoff of VPIM (Voice
Profile for Internet Mail) and a way of sending
voice messages as MIME documents.
The Microsoft Internet-Draft is available as
draft-ema-vpim-msgsm-00.txt
from IETF draft repositories.
Long before that, Jeff Chilton
figured out the format with trial-and-error when he needed to write
compressed wave files for his shortwave radio application (see
below).
The patchlevel 9 release of GSM integrates Jeff's ``unofficial''
patch 8 in slightly different form,
breaking his sample source code along the way.
The updated version
has its GSM_OPT_WAV_FMT changed to GSM_OPT_WAV49, and (thanks
to Dima Barsky) a more portable way of looking at fputs's
result. If you couldn't get it to work earlier on a SysV-ish
environment, try again.
GSM on the World-Wide Web
Jay Novello has gone ahead and used the audio/x-gsm MIME type.
A page at the North Carolina Institute for Transportation Research and Education
explains how users and web masters
can configure their systems to conveniently handle GSM documents,
and offers a few sounds to test with for those that do.
GSM 06.10 Errata
The list of tested overflow points for sequence 1 (coder part),
table 5.2 of the GSM 06.10 draft, expects 49 overflows in the APCM
quantizer's call to abs() (section 4.2.15).
Rob Wubben of Philips
Research Labs, who implemented a GSM 06.10 codec and counted, found
57, and got the same count from our library and from
a colleague's C simulation of the codec.
In our opinion, the table
is wrong.
(Update: Pierre Larbier reports that the final ETSI
release of the GSM 06.10 test sequences, attached to
ETS 300 580-2 edition 2 (GSM 06.10 version 4.1.1),
has corrected its SEQ01 to produce
only the promised 49 overflows.)
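For those checking their own counts: an overflow in a 16-bit abs() can only happen for -32768, which has no positive 16-bit counterpart, so fixed-point code saturates it to 32767. A sketch of a saturating abs() with a counter one could use to reproduce the table's numbers; the function and counter names are hypothetical.

```c
#include <stdint.h>

#define MIN_WORD INT16_MIN   /* -32768 */
#define MAX_WORD INT16_MAX   /*  32767 */

/* How often the saturating absolute value had to clip. */
static unsigned long abs_overflows = 0;

/* 16-bit absolute value that saturates: -32768 maps to 32767
 * and is counted as one overflow. */
static int16_t sat_abs(int16_t x)
{
    if (x == MIN_WORD) {
        abs_overflows++;
        return MAX_WORD;
    }
    return (int16_t)(x < 0 ? -x : x);
}
```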
Dr. Carsten Bormann
My coauthor
Carsten Bormann left the TU Berlin a few months ago to
accompany Prof. Ute Bormann to the computer science department
of the Universität Bremen, but both still visit Berlin
regularly.
Carsten will continue to be reachable as cabo@cs.tu-berlin.de;
his email address in Bremen is cabo@informatik.uni-bremen.de.
The Schur recursion
The Linear Predictive Coding (LPC) part of the GSM algorithm
uses an integer version of the ``Schur recursion'' described
by Issai Schur in 1917. (The Levinson-Durbin
algorithm from 1959 is better known, but the Schur recursion can be
faster when parallelized.) Linear prediction
means that the algorithm tries to find parameters for a filter
that predicts each sample in the current frame as a weighted
sum (or ``linear combination'') of the samples before it.
(Wil Howitt offers a short tutorial about LPC and CELP.)
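For readers who want to see the recursion itself, here is a floating-point sketch that turns autocorrelation values acf[0..p] into p reflection coefficients. GSM 06.10 does the same job in 16-bit fixed point with p = 8; this version uses doubles and a hypothetical function name, so treat it as an illustration rather than a bit-exact replacement. The sign convention, k1 = -acf[1]/acf[0], and the early exit that zeroes the remaining coefficients when |P[1]| exceeds P[0], follow the GSM routine as I understand it.

```c
#include <stddef.h>

#define SCHUR_MAX_ORDER 32

/* Floating-point Schur recursion (hypothetical helper).
 * Input:  acf[0..p], autocorrelation of one analysis frame.
 * Output: k[0..p-1], reflection coefficients k1..kp.
 * Returns 0 on success, -1 if p is out of range. */
static int schur_reflection(const double *acf, double *k, size_t p)
{
    double P[SCHUR_MAX_ORDER + 1];  /* working array, P[0..p] */
    double K[SCHUR_MAX_ORDER + 1];  /* working array, K[1..p-1] */

    if (p == 0 || p > SCHUR_MAX_ORDER)
        return -1;
    for (size_t i = 0; i <= p; i++)
        P[i] = acf[i];
    for (size_t i = 1; i < p; i++)
        K[i] = acf[i];

    for (size_t n = 1; n <= p; n++) {
        double mag = P[1] < 0.0 ? -P[1] : P[1];

        /* Degenerate or unstable: zero the remaining coefficients. */
        if (P[0] <= 0.0 || P[0] < mag) {
            for (size_t i = n; i <= p; i++)
                k[i - 1] = 0.0;
            return 0;
        }

        double r = -P[1] / P[0];
        k[n - 1] = r;
        if (n == p)
            break;

        /* Update the working arrays for the next stage. */
        P[0] += P[1] * r;            /* remaining prediction error */
        for (size_t m = 1; m <= p - n; m++) {
            double next = P[m + 1];  /* old value, used twice below */
            P[m] = next + K[m] * r;
            K[m] += next * r;
        }
    }
    return 0;
}
```

As a sanity check, acf = {1, 0.5, 0.25, 0.125}, the autocorrelation of a first-order process, yields k = {-0.5, 0, 0}: only the first coefficient is needed.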
Java
Kudos to Steven Pickles for
a free full-source Java 1.1 port of the GSM 06.10 Decoder side.
Unlike the C library, the Java code is licensed under the Free
Software Foundation's General Public License; if you use it,
keep the library source available.
Chris Edwards did a
Java port of the GSM 06.10 Encoder.
An open-source applet that can play lots of different GSM variants (with or without
.wav header) is MumboJumbo,
from voxeo's Omi Chandiramani. It's being extended to play other sound formats,
too, and you can help.
DOS?
Louis Selvon <lselvon@usa.net> has created a new version
of toast for DOS,
based on the Patchlevel 10 release.
As part of his EE thesis work, Louis also measured the
objective and subjective performance (not speed, quality) of
GSM 06.10 using MatLab (objective) and his family and neighbors
(subjective).
Richard Elofsson <rel@ldecs.ericsson.se> has made
his DOS port of the Patchlevel 4 release
available. (He fixed bugs that it took me until
Patchlevel 7 to find, though.)
The source code, which compiles with Turbo C++ version 1.01,
can be found as gsm-dos.zip in the top-level GSM ftp directory.
Sergey A. Zhatchenko (zha@ergenm.comcen.nsk.su),
from Novosibirsk, Russia, has donated a toast.exe, derived from a patchlevel 6 release of the GSM
library. Make sure your input filenames have no suffix;
this version of toast doesn't know that MS-DOS doesn't like more
than one dot in its filenames.
GSM on the BeBox
Pierre-Emmanuel Chaut ported the GSM library to the
BeBox, a PowerPC-based
multiprocessing platform that excels
with concurrent multimedia applications.
It takes, he writes, "4 seconds to compress 20 seconds
of sound". Way to go, Be.
Jake Bordens did his own port and implemented some
GSM Coders as minimal sample applications. He's
still too embarrassed to publish his code to just
about anyone, but might be talked out of it; meanwhile,
the binary is available from the webpage.
GSM DLL for OS/2
Terry Fry created, and is now distributing and maintaining,
an OS/2 DLL version of the GSM 06.10 library.
Next will be a .wav-to-.gsm converter for OS/2.
MacGSM
Paul C.H. Ho and Pink Elephant Technologies have used the Patchlevel 6
release to write a drag-and-drop
GSM compressor/decompressor
that converts between .au.gsm and .au. You'll need System 7.5 or
System 7.0 and 7.1 with a Thread Manager extension; 68K and Power PC
hardware is fine.
The tool, initially written to decompress files broadcast by
Radio Television Hong Kong,
is freeware and is distributed via
ftp as a binary.
GSM for the amiga
Michael Cheng is responsible for the distribution of
a
toast binary compiled with Amiga gcc 2.7.2
on the Aminet repositories, path
util/pack/GSMToast.lha. Michael also added some scripts
that use toast to implement a
streaming audio GSM mime type; they can
be found on the same archives in comm/tcp/unrealaudio.lha.
GSM for GBA
Damian Yerrick ported part of the library to the Game Boy Advance as
part of a portable music player application that plays music
off 256 Mbit flash cards.
xine
The GPL'ed free video player xine
now uses code from our library to help play GSM-encoded AppleTalk and
Windows WAV/AVI/ASF audio tracks.
aRts
The KDE sound server aRts, short for
analog realtime synthesizer,
has grown a GSM de- and encoder in its kdenonbeta module, thanks
to Matthias Kretz.
JusTalk 2
Jonas Tärnström released this compact
Windows multiuser voice chat application. It supports multiple
sample rates, can function as a client or server, and can be set to
stream audio either continuously or whenever the voice level rises
above a threshold.
ElderVision
The makers of the TouchTown
Internet package for seniors are using a Java GSM 06.10 client
for low-bandwidth telephony.
linphone
Even if you don't speak French, you can now read about and download
linphone, a web-phone application
that uses the GSM 06.10 library (with a fresh autoconf Makefile from author Simon
Morlat).
JVOIPLIB
Jori Liesenborgs's
JVOIPLIB
is an LGPL'ed voice-over-IP library written in C++, based on his thesis work.
It supports multiple codecs and codec parameters, VoIP session creation and
destruction, and 3D effects (!).
Jori has just integrated the GSM library and will likely be shipping a
subset of the GSM 06.10 release with his next version.
OpenH323
OpenH323
is an Open Source implementation of the ITU H.323 protocol stack which
runs on Linux, Windows, Solaris and other Unix platforms.
The OpenH323 client sample code can interoperate with NetMeeting in
audio mode, and can receive H.261 format video. The GSM codec is the
standard codec used by Linux implementations where G.723.1 hardware is
not available.
Patches to the SOund eXchange tool, sox
Andrew Pam
(avatar@aus.xanadu.com)
has
patched Lance Norskog's
sox program to work with the GSM
library. I wish I had thought of that.
Sox-12.16: Son of SOX
Chris Bagwell (you might remember him as maintainer of the
Audio File Format FAQ) has snatched maintenance of the
cryptic, resourceful Unix tool sox from its original author,
Lance Norskog.
Version 12.17 supports GSM
and WAV#49.
Pulse Entertainment's 3d web animation plugin
Pulse3d
is streaming GSM 06.10 audio to its real-time animated characters,
along with the lip sync and body animation information that
makes them come to life.
HotFoon
People with friends in Hyderabad, India, are in
luck; hotfoon
is offering a (so far) free gateway service to numbers in the
local area there. Their small, free client also serves
as a gateway to an online chat system; as usual, if you
and a friend both download the client, have full-duplex sound cards and
a reasonably fast Internet connection, you can talk for
free across the Internet, no matter where you are.
ATR-ITL
Somewhere towards the tail fin of the Japanese-English
telephone "babelfish"
that the Advanced Telecommunications Research group's Interpreting
Telecommunications Research Laboratories
are trying to build, a GSM 06.10 codec is one of the options
available for encoding the translated utterances.
The Audiograph Lecture Recorder and Player
The University of Surrey, UK, and
Massey University, NZ,
have developed a Mac-based authoring system and Windows/Mac Netscape
plugin software for voice- and drawing-annotated
slide shows; they now distribute it through
www.nzedsoft.com.
The viewers are free; version 1.2 of the authoring tool
used to cost money, but is now free as well.
NTT's "InterSpace" Virtual Environment
The Virtual Campus of NTT's
InterSpace project
combines videoconferencing with 3D graphics and, recently added,
an audio chat facility that uses our library.
The site's
entrance graphics show rendered avatars whose heads are
replaced by video screens rendered into the scenery, rather
ingeniously close to the Snow Crash ideal.
Vosaic streaming audio applet
In the long term, the young Illinois startup
Vosaic tries to compete
with Progressive Networks in the streaming video market.
Right now, they're showing a GSM-streaming Java applet based
on Avneesh Pant's work.
FreedomAudio - Streaming Audio Player
Rolande Kendal has written the beautifully minimalistic set
of controls that is free for non-commercial use.
The FreedomAudio Java Applet can be used with a Java or
JavaScript user interface and supports MS WAV #49 GSM by default;
plain GSM available on request.
1-Step Audio Publisher Version 2.x
Noël Bouchard's
GSM
player/converter for Windows supports plain WAV, Sun AU,
GSM 6.10 as understood by toast, WAV #49 GSM, and TrueSpeech.
GSM to WAV, the second
Bill Neisius (neisius@netcom.com) sent
me email about a GSM-to-WAV converter and Web client he wrote a
while ago. Soon after it arrived, the email fell prey
to a temporary shortage of disk space on our system; it didn't
get deleted, it just got written to the Place Where I Never
Look. Well, I looked there just now, and if you're
lucky enough to be able to access
Bill Neisius'
ftp directory at netcom, you might find lots of interesting
sound applications there, some of which convert GSM to plain WAV.
QuickView, the DOS based multimedia viewer
Version 2.3 of QuickView
supports GSM 06.10 and a host of other video and audio formats.
The viewer is shareware that comes with a
three-week free evaluation period; if you're interested in licensing
the libraries or building custom viewers, contact Wolfgang Hesseler
at qv@multimediaware.com.
Gir: A realtime player for Amiga OS
Sinisa Kesic
has developed
a small realtime player for Amiga OS named "Gir"; it comes with
a browser-like interface for playing music locally or from the net.
Included in the package (which can be found in tcp/Gir??.lha
in your local Aminet archive) are tools for converting between Amiga
raw 8-bit iff samples and GSM, and a "littlegir" plugin
for webbrowsers.
XAnim
Mark Podlipec has integrated support for GSM audio into his
XAnim, an
animation, video, and audio player running under X on Unix and VMS.
SoftFone
And yet another product starts its description with
``Now you can...'', as if IGP, WebPhone, DigiPhone,
CyberPhone, and whatever they are called had never happened.
SilverSoft's SoftFone shines with a built-in answering
machine, voice mail, and variable rate compression; other
than that, it's the usual full duplex point-to-point Internet
phone deal.
IVS
Thierry Turletti's INRIA Videoconferencing System
transmits video and audio data
between camera-equipped Unix workstations on the Internet.
It supports a number of different audio codecs (among them GSM 06.10)
and a H.261
video codec that is packetized with the increasingly
popular RTP.
V-Fone
Bob Summers brings us V-Fone, a flexible,
low-end videoconferencing application for PCs running
Windows '95. (It seems to be point-to-point right now,
with broadcast just around the corner.)
The Internet Party Line
Intel's experimental Windows application is no
longer supported by its creators, but some of its users
still distribute and use the binaries.
As the name suggests, this is
a real-time, multi-party audio chat via the Internet.
Closely modeled after text chats like IRC, the application queues
each speaker's statements separately and plays
them serially, allowing any person to talk at any time
without interrupting the others. Any Internet-connected
PC can become a server; both client and server binaries
are publicly available.
CyberPhone
(I guess someone had to come up with that name...)
Matt Krokosz and Greg Foglesong present version 2
of CyberPhone,
an Internet phone application that runs on Sun workstations
and is being ported to Linux PCs.
The system comes with an (optional) user directory service running
on magenta.com; the full version costs $20, the demo (with
2 minutes of connect time through the central server only) is free.
Speak Freely
Brian C Wiles has been breathing new life into John Walker's
Speak Freely,
an Internet phone that runs on SGIs, Sun SPARCstation,
and (with WINSOCK) on Windows. The tools interoperate
seamlessly and can encrypt their voice data streams
with IDEA, DES, PGP, and/or a one-time pad.
Source code is freely available for both the Unix and
Windows release. Version 8.0, now in beta under Windows,
features a multipoint conference mode, answering machine messages,
and easier interoperation with ICQ.
PCS 1.0 (?)
This isn't really an application, but there is, or used to be, a
strongly Intel-influenced
industry consortium called the
Personal Conferencing Working
Group (PCWG) which defined
something called the Personal Conferencing Specification (PCS) -
yet another desktop video conferencing infrastructure -
and, according to Leigh Anne Rettinger's thesis,
the first version of it included GSM audio compression.
I can't find a trace of these people after 1997; if anyone
knows the story of what happened to them,
send me email.
xztalk, ztalk
The Linux ``xztalk'' by Liem Bahneman
(roland@cac.washington.edu) and Andy Burnett
(burnett@baldrick.cecer.army.mil) is based
on Scott ``This is so incredibly alpha, it isn't funny''
Doty (scott@cs.santarosa.edu)'s extended version of
misch@elara.fsag.de's ``mtalk''. W. Richard Jhang
(feinmann@cs.mcgill.ca)'s ztalk
is also a descendant of Scott Doty's release; I don't know
whether xztalk used ztalk, or whether both were developed
independently. Contact your friendly
sunsite mirror for details.
erikyyyphone
Named after the author's IRC nick, ericyyyphone is a GPL-licensed audio conferencing application
written in C++, running on Linux.
Microsoft NT and Windows 95 (beta)
Microsoft's Audio Compression Manager includes a
GSM 6.10 CODEC (in addition to those for ADPCM,
IMA ADPCM, the DSP Group's
TrueSpeech(TM), and
a PCM converter).
The Windows 95 beta
added CCITT G.711 u- and A-law CODECs to the collection.
Microsoft's GSM 06.10 CODEC is not compatible with toast's
frame format: they use 65-byte blocks (two frames of 32 1/2 bytes each) rather than
rounding each frame to 33 bytes, and they number the bits in their bytes from
the other end. (Well done, guys.)
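The arithmetic behind the two layouts: a frame carries 260 bits, so two frames fill exactly 520 bits, or 65 bytes, while toast rounds each frame up to 33 bytes (264 bits). A sketch of the numbers, with hypothetical constant and function names; at 50 frames per second, that is 1,625 versus 1,650 bytes per second.

```c
#include <stddef.h>

enum {
    FRAME_BITS        = 260,                         /* payload per frame */
    TOAST_FRAME_BYTES = (FRAME_BITS + 7) / 8,        /* 33: rounded up */
    WAV49_FRAMES      = 2,                           /* frames per block */
    WAV49_BLOCK_BYTES = WAV49_FRAMES * FRAME_BITS / 8, /* 65: exact fit */
    FRAMES_PER_SECOND = 50                           /* 20 ms frames */
};

/* Storage rate in bytes per second for `bytes` holding `frames` frames. */
static size_t bytes_per_second(size_t bytes, size_t frames)
{
    return bytes * FRAMES_PER_SECOND / frames;
}
```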
SoundApp for Macs
Norman Franke's
SoundApp plays as many audio formats on the Mac as he
could get his hands on, among them GSM 06.10
(both ours and Microsoft's). Keeping with the flexible theme,
the application has been translated into Japanese, French, and Swedish.
WebbWatch for Windows
WebbWatch for Windows, by Daniel Ding,
turns a Pentium with VideoBlaster-compatible capture card and
the usual sound support into a video phone.
The video codec does H.261's QCIF, the audio is GSM or ADPCM.
VidCall from M R A Associates, Inc.
VidCall is a video and audio player and recorder,
combined with a multipoint shared clipboard application,
for Windows.
It uses the aforementioned Microsoft GSM 06.10 CODEC
for its audio; if you can't find yours, VidCall's public
ftp directory has a replacement zipfile with an MSGSM610.ACM.
The software is distributed via the Internet; restricted use
during a 30-day evaluation period is free.
Internet Global Phone
Around December 1994, a company called microWonders, Inc.,
released source code for a GSM-using tool called
``Internet Global Phone'' and publicised the event
with a press release that suggested I was
distributing their tool.
(Longer version.)
The Internet Multicasting Service
The Internet Multicasting
Service has been broadcasting audio on the Internet
for more than two years, starting with the ``Geek of the
Week'' program in March 1993. In addition to
its original .au format, it now supports .ra (Real Audio) as
well as .gsm.
vat - LBNL Audio Conferencing Tool
Vat was developed by
the Lawrence Berkeley National Laboratory's Research Group.
It is part of a whole set of tcl/tk applications grouped around
IP multicasting on the MBONE (but functional without it).
With the most recent 4.0 alpha release, source code is finally
available; so are, as before, binary distributions for most
Unix platforms.
NVAT - Network Video Audio Tool
NEC Corporation's NVAT
implements video and audio conferencing
with less than 64 Kbps bandwidth on PCs. To receive and
send video, you'll need an i486 DX4/100 MHz or Pentium 75 MHz or faster
running Windows NT,
with at least 32 MB of memory,
a VGA video adapter with at least 256 colors (it's faster with 65,536),
a SoundBlaster16 card or similar, and a video card that works
with the Video for Windows API. Some of these requirements can be
dropped if video is only received, not sent.
The tool is compatible with versions
of the Unix-based nv and vat, and can receive
MBONE broadcasts. The
binary-only alpha release is free for research and
evaluation purposes.
Nevot 3.34 (December 22nd, 1995)
Henning Schulzrinne's network voice terminal program NeVoT provides packet-voice communications across
internetworks. It operates in either unicast,
simulated multicast, or IP multicast environments, using the
vat or
RTP
protocols.
NetPhone, DigiPhone, Digifone, e-Phone.
San Francisco's Electric
Magic Company has renamed their NetPhone
Internet phone application to e-Phone; the name NetPhone
was already taken.
They apparently plan to continue changing the name to
Digifone with the Macintosh release, which is
very likely a typo for
DigiPhone, whose vendor Third Planet Publishing
claims they bought e-Phone in 1995, which would
be before the name change.
WebPhone
NetSpeak Corporation, formerly the "Internet Telephone Company,"
has released version 4.02 of their WebPhone application.
The application,
which requires a PC running Windows 3.1 or higher and an
MCI-compliant sound card, supports TrueSpeech(TM), G.723.1, G.711 (that's 8 kHz u-law),
and full-rate GSM.
CU-SeeMe
I have been told that CU-SeeMe for Macintosh computers
supports GSM encoding in some manner. The Web resources
list a mysterious new 16 kb/s encoding that ``should work over
a 14.4 line'' (the incredibly shrinking compression method!),
but I don't know anything specific.
InPerson
SGI's
multimedia conferencing tool
(technical details) offers GSM 06.10 encoding as
one of six options for the audio stream.
(I do not know who wrote their codec.)
The software requires an SGI workstation with at least 32 MB RAM
running IRIX 5.2 or above, and can be downloaded via ftp
for a free 30-day trial.
UnReal Audio
As a take-off on RealAudio (see below), Roman Mitnitski
(mitnits@shani.net) has implemented a simple real-time server/client
for Linux based on the GSM library.
The pre-release is freely available via ftp,
but you'll need to bring your own XForms library and GSM library.
PowWow
Collaborative browsing is the speciality
of Tribal Software's Windows-only
PowWow tool. Users chat through text
and, if their hardware and Internet connection allows, on a
14,400 bps voice line.
MBONE protocols for Windows
Precept software,
a Palo Alto startup, sells software that supports the dominant
Internet conferencing protocols RTP and RTCP. (RSVP is being
worked on.) Together with access to a MBONE-connected Internet
host and Precept's H.261 codec, that's enough to both play data
from the MBONE and broadcast to it. These capabilities are
sold both as ``stand-alone'' software and as a convenient library
for developers wishing to integrate multimedia Internet
communication into their applications.
Voxware
The Princeton, NJ startup Voxware
specializes in vocoder software that, according to their own
descriptions, allows for stunning compression rates without
requiring dedicated hardware. They offer a complete product
palette of speech codec applications, from the inevitable Internet
telephone over browser plugins to a voice parameter editing
system.
RealAudio
The selling point of Progressive Networks'
RealAudio is not
its format, but its flow control: the streams start playing
immediately.
Users don't have to wait for
the whole document to arrive, and they can interact with
the data stream (jump to different tracks,
change channels).
Internet Phone
VocalTec Inc., from Northvale, NJ, sells an
Internet Phone
application for PCs that uses some unspecified ``unique voice
compression algorithm'' to compress down to about 7.7 kbit/s, and,
if you buy their compression card, even down to 6.72 kbit/s.
Because of the attention the company paid to community infrastructure
(initially leading to the demise of the IRC servers they used to
support their directories), Internet Phone has become a ``scene''
similar to that of IRC.
Internet Wave
Internet Wave is VocalTec's answer to RealAudio.
Unlike RealAudio, which consistently operates at a bandwidth
around 14,400 bit/s, VocalTec's system supports four different audio
qualities, at bandwidths between 9,600 and 28,800 baud.
Will VocalTec's existing market base and the possible better
sound quality suffice to dislodge RealAudio from its already
established market position? And will either of them
manage to move my headphone plug from its established
position in the CD player? Not as long as
there's Shriekback on, it won't. Stay tuned.
Enhanced Full-Rate GSM
On November 4th, Nokia announced that the EFR (enhanced full rate)
codec they had been developing with the University of
Sherbrooke, Canada, had been chosen by the ETSI
as the industry standard codec for GSM/DCS.
Additionally, the US PCS 1900 operators have also moved
to EFR. It's supposed to have ``landline quality,''
be ``more robust to non-voice signals such as music'' and
more resilient to ``environments with excess background noise''.
Anyone know more about this?
Half-Rate GSM
According to an article posted to comp.dsp by
Texas Instruments' Mansoor Chishtie,
-
GSM half-rate is now a standard.
It is based on Motorola's VSELP technology similar to
IS-54 full-rate.
It compresses speech at 5.6 kbit/s using two
7-bit codebooks for unvoiced speech and one 9-bit
codebook for voiced segments.
The draft prETS 300 581-2 (GSM 06.20 Version 4.0.0)
is the mathematical description of half-rate GSM.
Good question.
According to a posting to comp.dsp from Feb 18 1995 by Chris Cavigioli, back then of Analog Devices, Inc.,
they have ``joined
Alcatel Radiotelephone, Nokia, and Italtel-SIT in a subgroup to evaluate
the complexity (MIPs and memory) required of typical 16-bit DSPs, based on
bit-exact ANSI C programs supplied by Motorola and ANT Bosch (the two final
codec candidates)''; their results have been published in three
places:
- DSPx '94 Proceedings (theoretical worst case complexity)
- DSPWorld '94 ... also known as ICSPAT '94 Proceedings (avg complexity)
- Wireless Symposium '95 Proceedings (compare ETSI vs. ADI DSP complexity)
Analog Devices have ``implemented the GSM half-rate standard in DSP assembly
code, running in real-time, and meeting the ETSI delay
specifications.''
(Of course, this says very little about what will be possible
in non-DSP software.)
In the proceedings of EUROSPEECH '95, held in Madrid in September,
Tim Fingscheidt, T. Wiechers and E. Delfs have published a
paper on ``Implementation Aspects of the GSM Half-Rate Speech Codec''
(pp. 723/726). Tim, whose group implemented a half-rate
codec for the NEC PD77018 based on the 06.06 source code,
estimates the complexity of the half-rate codec at 4-6 times that of
the full-rate version.
GSM 06.06: sourcecode for GSM 06.20
GSM 06.06 is ANSI C source code for a half-rate codec.
Its public review period started on April 10th, 1995.
``Public review'' means that it is for sale as
-
draft ETS 300 581-7
from the ETSI sales department,
-
Ms. Anja Mulder
+33 92 94 42 58 (voice)
+33 93 95 81 33 (fax)
At the moment, it doesn't seem as if we're going to implement
GSM 06.06 here.
The test patterns for GSM 06.06, GSM 06.07, will become
draft ETS 300 581-8, and lag the source code by about two
weeks.
06.42 is half-rate voice activity detection, 06.22 comfort noise.
GSM 06.06, lesson I: the bait and switch
When a colleague recently told me he had ordered draft ETS 300 581-7
and would lend me his copy, I was looking forward to examining the
code in GSM 06.06 and judging its complexity myself.
What he actually received from ETSI was
only a call hierarchy of the functions in the code;
the diskette listed as
``attached to the back cover'' was missing.
When he complained to ETSI, they told him that he could
buy the actual source code for 1000 ECU. (1 ECU = US$ 1.28 or DM 1.85.)
Mind you, this is probably not attributable
to malice; I hear that the electronic-only distribution format
is still new to the ETSI bureaucracy, and they likely
have trouble adapting their page-based pricing scheme
to it.
Nevertheless, if you or your company order ETS 300 581-7, make clear
that you do want source code, and make the person at the other
end list the price for the code; don't buy expensive, but useless,
cross-reference listings.
[Update: Currently, ETSI documents - including the source code
mentioned above - are available free of charge over the net.
Go to http://webapp.etsi.org/pda/.]
On digital speech processing, I recommend
Discrete-Time Processing of Speech Signals
by John R. Deller, Jr.,
John G. Proakis,
and John H. L. Hansen;
Macmillan Publishing Company, New York, 1993;
ISBN 0023283017
For a well-written, interesting,
100% jargon-free introduction to
language, speech, and the mind, see
The Language Instinct
by Steven Pinker;
William Morrow and Company, Inc., New York, 1994;
ISBN 0688121411
The book on GSM in general is self-published and can only be
ordered from the authors.
Introductions and Demos
A set of introductory DSP classes is online
at http://www.bores.com/courses/intro.
If you're learning about digital speech processing, visit
Phil Karn's Digital/Analog Voice Demo at
Qualcomm.
Using mu-law sound samples throughout,
Phil takes you from an original recording,
to a band-pass filtered version,
to one with added noise, to a GSM version,
a CELP-encoded version, Qualcomm's proprietary
QCELP-encoded version at two different data rates,
and an LPC-10 version, complete with running
commentary about each encoding.
CELP source code sighted
Rick Ross found a set of
speech compression engines at CMU, featuring a prehistoric
version of GSM, an LPC coder, the CCITT ADPCM coder, and various
*ELPs that I haven't seen anywhere else.
Samples
The Vincent
Voice Library at
Michigan State University
houses taped utterances of over 50,000 persons recorded
over 100 years.
In addition to its standard set of 20 8-kHz mu-law samples (among them
Isaac Asimov at MSU on writing books, 2:39),
the site currently features an exhibition of voices of
presidents from Grover Cleveland to Bill Clinton.
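Mu-law samples like these store each 8-kHz sample as one logarithmically companded byte, while a GSM 06.10 encoder works on frames of 160 16-bit linear PCM samples, so such files have to be expanded first. Below is a minimal sketch of the standard G.711 mu-law expansion (bias-and-shift form); the function names ulaw_to_linear and ulaw_buffer_to_linear are hypothetical and not part of the GSM library described on this page.

```c
#include <stddef.h>
#include <stdint.h>

/* Bias that G.711 mu-law adds to the magnitude before encoding. */
#define ULAW_BIAS 0x84

/* Expand one 8-bit mu-law code into a linear PCM sample in -32124..32124.
   The byte is stored bit-inverted; after undoing that, bit 7 is the sign,
   bits 6..4 the segment (exponent) and bits 3..0 the step (mantissa). */
static int16_t ulaw_to_linear(uint8_t code)
{
    int magnitude;

    code = (uint8_t)~code;                        /* undo the bit inversion */
    magnitude = ((code & 0x0F) << 3) + ULAW_BIAS; /* mantissa plus bias     */
    magnitude <<= (code & 0x70) >> 4;             /* scale by the segment   */
    return (int16_t)((code & 0x80) ? ULAW_BIAS - magnitude
                                   : magnitude - ULAW_BIAS);
}

/* Expand n mu-law bytes, e.g. the 160 that make up one 20 ms GSM frame. */
static void ulaw_buffer_to_linear(const uint8_t *in, int16_t *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = ulaw_to_linear(in[i]);
}
```

With this mapping, 0xFF and 0x7F both decode to 0 (mu-law has a positive and a negative zero), 0x80 decodes to 32124, and 0x00 to -32124.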
The inexhaustible Jennifer Myers maintains a list
of sites with audio clips.
A short tutorial on Cursing in Swedish is richly illustrated with wave
file samples.
Sound applications on the World-Wide Web
Periodical sounds:
the Weekly
Idiom from the Comenius Group.
The US National Institutes of Health's amateur radio club
broadcasts traffic as UDP packets containing GSM 06.10 audio
from the Listening Post. (To receive it, you'll need a UDP-based
GSM 06.10 audio decoder such as Speak Freely.)
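GSM 06.10 audio suits datagrams because every 20 ms of speech (160 samples at 8 kHz) packs into a fixed-size frame: in the packed format our library produces, a frame is 33 bytes, a 4-bit signature 0xD followed by 260 bits of parameters, and 50 frames per second of 260 bits give the 13 kbit/s rate. The sketch below sanity-checks a received payload, assuming it is a bare sequence of such frames with any application header already stripped; the helpers count_gsm_frames and gsm_payload_ms are hypothetical, and real tools may frame their packets differently.

```c
#include <stddef.h>

/* One packed GSM 06.10 frame: 4-bit signature + 260 parameter bits = 33 bytes. */
#define GSM_FRAME_BYTES 33
/* Signature stored in the high nibble of each frame's first byte. */
#define GSM_SIGNATURE   0xD
/* Each frame holds 160 samples at 8 kHz, i.e. 20 ms of speech. */
#define GSM_FRAME_MS    20

/* Hypothetical helper: returns the number of frames in a UDP payload,
   or -1 if the payload is empty, is not a whole number of frames, or
   contains a frame whose first byte lacks the signature. */
static int count_gsm_frames(const unsigned char *payload, size_t len)
{
    size_t offset;

    if (len == 0 || len % GSM_FRAME_BYTES != 0)
        return -1;
    for (offset = 0; offset < len; offset += GSM_FRAME_BYTES) {
        if ((payload[offset] >> 4) != GSM_SIGNATURE)
            return -1;
    }
    return (int)(len / GSM_FRAME_BYTES);
}

/* Hypothetical helper: playing time of a payload in ms, or -1 if malformed. */
static long gsm_payload_ms(const unsigned char *payload, size_t len)
{
    int frames = count_gsm_frames(payload, len);

    return frames < 0 ? -1 : (long)frames * GSM_FRAME_MS;
}
```

For a 132-byte payload whose four frames each begin with a byte between 0xD0 and 0xDF, count_gsm_frames returns 4 and gsm_payload_ms returns 80.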
Jeff Chilton's Shortwave
Radio gives you access to the last 5 or 15 seconds from a
user-selected frequency, as received in Reston, Virginia, USA.
``Bluedog can count!''
Voices is a Web interface to AT&T's
text-to-speech engine. You can type and
then hear spoken sentences with up to 40 words; a second,
more detailed interface lets you customize pitch,
head size, word rate, and aspiration of the generated
sample.
Research
Voice Synthesizers On The Verge of a Nervous Breakdown: in 1989,
Janet Cahn wrote her thesis at the MIT Media Lab about
Expressive Synthesized Speech - how to make voice
synthesizers express emotions. The three sound
samples she has online, three different sentences synthesized
in ten tones expressing anything from impatience through anger
to depression, are still hilarious to listen to.
T.V. Raman's audio rendering system for mathematics,
ASTeR,
uses parameters of synthesized speech to indicate
nesting and dependencies within formulas, culminating in
an impressive 66-second audio
rendering of Faà di Bruno's formula.
(Knuth Vol. 1, bottom of page 50.)
The European
Speech Communication Association (ESCA) keeps a page of
links
to speech research institutions all over the
world. (You might want to turn off image loading
for this page; every link is illustrated with the
institution's logo.)
Fun
Proceed to the next stage of collaborative technology
with the RealAroma
server and, uh, plug-ins.
If you want to learn more about sounds, why not pay a visit to the
San Francisco Exploratorium
and its duck call vowels?
(If, conversely, you want to hear more about toasters, I recommend
Patrick R. Michaud's report on Strawberry Pop-Tart Blow-Torches.)
The final word on telephone sex.
The maintainers of the following sites try to offer
comprehensive and complete indices into their respective
subjects; the documents should be large enough
to get you within a few hops of your topic quickly.
Multimedia
Simon Gibbs'
Index to Multimedia Information Sources
A long, no-frills list of Multimedia links, with archives,
standards, companies, research organisations,
conference announcements, tutorial-type material, and FAQs.
Internet telecommunication
Audio and Video via the Internet from Jack Decker.
A long list of links sorted by type of application or institution:
audio players; distributors of audio media;
two-way audio; two-way audio/video; hardware;
and miscellaneous links.
Voice/Video on the Net
Where the previous site has short descriptions, Jeff Pulver's
restricts itself to links, but it is embedded in a rich subtree
of media related information that makes up for the main page's
brevity. The NetWatch archives in particular have short
notes and updates on what must be every PC Internet audio tool
in existence.
How Do I Use the Internet as a Telephone?, from Kevin Savetz and Andrew Sears
Preceded by a brief, informal introduction to the basic concepts
of Internet telephony strictly from a user's perspective,
most of the FAQ is taken up by short reviews of
Internet Telephony products, grouped by platform.
Speech Processing
Andrew Hunt's
comp.speech WWW site (Cambridge mirror)
The site's hypertext version of the comp.speech
Frequently Asked Questions posting has pointers to general
information and tools concerned with speech encoding, compression,
recognition, synthesis, and other forms of natural language
processing.
Jason Woodard's descriptions of
Speech Codecs
Rather than pointing to every speech processing gizmo in existence,
this subtree explains principles and formats, and gives crucial
software and theory references, for three general
classes of speech codecs and the most important standards.
Digital signal processing
The comp.dsp Frequently Asked Questions list
Questions, answers, and resources for general digital
signal processing.
Josip Juric's DSP
homepage
collects the FAQ and a number of other pointers to DSP resources;
among them Guido van Rossum's Audio File Format FAQ
and Appendix from comp.dsp.
Compression
The comp.compression Frequently Asked Questions list
explains, and often provides references to software that implements,
most lossy and lossless algorithms. The hypertext FAQ
archived at Ohio State University looks just like the ASCII FAQ,
but has been broken up and links directly to referenced documents
where possible.
Telecommunication
Telecommunication resources
from the Australian Telstra
is a big page of commented
links.
Telecom Information Resources from Jeff MacKie-Mason,
a searchable list of references to
technical, economic, public policy, and social aspects
of telecommunications.
Telecommunication sites from John Scourias.
John is the author of the excellent overview referenced
elsewhere on this page; this is his telecommunication hotlist.
Digital mobile telephony
Simon Hewison's FAQ on Digital Mobile Phones
lists service providers and manufacturers of digital cellular
phones. If you're trying to find the difference
between, say, GSM and PCN, or want to know exactly how many
mutually incompatible CT2 networks there were in the UK,
this is the place to look.
Jürgen Morhöfer's GSM List,
last updated on Sep 22nd 2000, lists
GSM operators with network code and customer
service phone number, sorted by country.
Supercall Cellular,
a South African provider, maintains a page of links to general
information about GSM, including codes, networks,
coverage maps for Europe -- and a request for submissions
of scanned-in SIMs.
jutta@pobox.com, July 2000.
Comments and corrections are welcome.