Ugly speech synthesis in C

For quite a few years now, I’ve been teaching C programming to novice programmers. As I’m sure many other people who teach C have done, I got really tired of using the same old text-based example programs. The sort of thing I mean is:

  • Use scanf to read in some values from the user,
  • Do a calculation of some kind,
  • Display the result using printf,
  • ZZZzzzzzzzz…

Over the years, I’ve been guilty of using countless awful examples like that. I always want to make the examples (and assignments) more interesting, but I also need to keep things simple. In particular, I always want to liven things up by producing some kind of graphical or multimedia output, but using only vanilla C and the standard C library. I’m very reluctant to muddy the waters by introducing extra libraries or APIs.

One of the first really useful solutions I came up with is processing PGM image files. In its plain-text variant (magic number P2), this file format stores a greyscale image as a simple table of pixel values in a plain text file. Using this format, it’s no problem at all for novice programmers to tackle simple image processing problems using the familiar file input functions. Using fscanf in a nested loop is all you need to load the pixel data from a PGM file into a 2D array. Similarly, writing a PGM image file using fprintf is almost trivial.

I have also experimented with creating SVG images (e.g. graphs of signals) from simple C programs using fprintf. It actually works really well and you can view the resulting image in most web browsers. The C code is quite short, but a little messy looking, so I haven’t asked my students to use this method so far.

Anyway, on to the main topic of this post:

This evening, I’ve been playing around with a little Linux application called aplay, which is part of the ALSA collection of audio drivers and utilities. aplay is basically a command line audio file player, but what I love about it is that it can play samples in real time from standard input. This means you can generate an audio signal in real time in a vanilla C program, write the samples to standard output, and pipe them into aplay. In this way, without having to mess around with any audio libraries or APIs, it’s really easy to begin generating audio in real time from a simple C program!

Below is a simple example. aplay can deal with a wide range of formats and sample rates, but by default it expects unsigned 8-bit samples at a sampling frequency of 8 kHz on its standard input, so that’s what I’m going to give it here.

//
// squarewave.c
// Written by Ted Burke
// Last updated 20-9-2012
//
// To compile: gcc squarewave.c
//

#include <stdio.h>

int main()
{
	int n;
	unsigned char c;

	for (n=0 ; n<8000 ; ++n)
	{
		c = 120 + 16 * ((n%16)/8);
		fwrite(&c, 1, 1, stdout);
	}

	return 0;
}

The command to compile the above example is just “gcc squarewave.c”, which produces an executable file called “a.out”. To hear the square wave played out loud, run the program and pipe its output into aplay:

./a.out | aplay

Here’s a more complex example which generates some rudimentary speech sounds.

//
// speech.c
// Written by Ted Burke
// Last updated 20-9-2012
//

#include <math.h>
#include <stdio.h>
#include <stdlib.h> // for rand()

#define PI2 6.2831853

//
// The following phoneme harmonic weightings are loosely modelled on
// the 325Hz female example from this web page:
//
//		http://hyperphysics.phy-astr.gsu.edu/hbase/music/vowel.html
//
// Each array has 13 elements describing one speech sound.
// The first element is a noise weighting. The remaining
// 12 elements are the weightings of the first 12 harmonics,
// starting with the fundamental (the second element in
// each array).
//
double ahh[] = {0.000, 0.100, 0.250, 1.000, 0.150, 0.030, 0.020,
						0.005, 0.000, 0.000, 0.000, 0.000, 0.000};
double eee[] = {0.000, 1.000, 0.050, 0.025, 0.000, 0.000, 0.000,
						0.025, 0.125, 0.150, 0.000, 0.020, 0.020};
double ooo[] = {0.000, 1.000, 0.250, 0.080, 0.020, 0.020, 0.000,
						0.010, 0.000, 0.010, 0.000, 0.010, 0.000};
double sss[] = {0.300, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
						0.000, 0.000, 0.000, 0.000, 0.000, 0.000};
double off[] = {0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000,
						0.000, 0.000, 0.000, 0.000, 0.000, 0.000};

// Function prototype
void phoneme(double *h1, double *h2, double f0, int duration);

double Fs = 8000.0; // sampling frequency
int n = 0; // global sample counter

int main()
{
	// Play some speech sounds
	phoneme(off, sss, 325.0, 100);
	phoneme(sss, sss, 325.0, 1000);
	phoneme(sss, ahh, 325.0, 500);
	phoneme(ahh, ahh, 325.0, 8000);
	phoneme(ahh, sss, 325.0, 500);
	phoneme(sss, sss, 325.0, 1000);
	phoneme(off, off, 325.0, 8000);
	phoneme(sss, sss, 325.0, 1500);
	phoneme(sss, eee, 325.0, 500);
	phoneme(eee, eee, 325.0, 7000);
	phoneme(eee, off, 325.0, 1000);
	phoneme(off, off, 325.0, 8000);
	phoneme(sss, sss, 325.0, 1500);
	phoneme(sss, ooo, 325.0, 500);
	phoneme(ooo, ooo, 325.0, 8000);

	return 0;
}

//
// This function either outputs one particular speech sound,
// or transitions from one sound to another over the specified
// duration (in samples). h1 is the starting speech sound.
// h2 is the finishing speech sound. To play just one speech
// sound, specify the same array for h1 and h2.
//
void phoneme(double *h1, double *h2, double f0, int duration)
{
	int i, j;
	double s; // used to calculate each sample
	double f1, f2; // cross-fade weighting factors
	unsigned char c; // used to store sample as byte

	for (i=0 ; i<duration ; ++i)
	{
		// increment global sample number
		n++;

		// Update cross-fade weighting factors
		f1 = (duration-i)/(double)duration;
		f2 = 1-f1;

		// Add noise component to new sample
		s = (f1*h1[0] + f2*h2[0]) * (rand()%100)/100.0;
		if (rand()%2) s = -s;

		// Add harmonic components to new sample
		for (j = 1 ; j<=12 ; ++j) s += (f1*h1[j] + f2*h2[j]) * sin(PI2 * j * f0 * n / Fs);

		// Scale new sample and convert to unsigned char
		s = 128.0 + (127.0 * s / 13.0);
		c = (unsigned char)s;

		// Output byte to stdout
		fwrite(&c, 1, 1, stdout);
	}
}

To compile the above example and run it (piping the output into aplay), use the following commands:

gcc speech.c -lm
./a.out | aplay

Here’s what the output “speech” sounds like:

sass_see_soo.wav (92.1 KB)

 
