Soundex (C)
From LiteratePrograms
This program is a code dump.
Code dumps are articles with little or no documentation or rearrangement of code. Please help to turn it into a literate program. Also make sure that the source of this code does consent to release it under the MIT or public domain license.
/*
* This implementation of the Soundex algorithm is released
* to the public domain: anyone may use it for any purpose.
*
* N. Dean Pentcheff
* 1/13/89
* Dept. of Zoology
* University of California
* Berkeley, CA 94720
* E-mail: dean@violet.berkeley.edu
*
* char * soundex( char * )
*
* Given as argument: Pointer to a character string.
* Returns: Pointer to a static string, 4 characters long, plus a terminal
* '\0'. This string is the Soundex key for the argument string.
* Side effects and limitations:
* Does not clobber the string passed in as the argument.
* No limit on argument string length.
* Assumes a character set with continuously ascending and contiguous
* letters within each case and within the digits (e.g. this works
* for ASCII and bombs in EBCDIC. But then, most things do.).
* Reference: Adapted from Knuth, D.E. (1973) The art of computer
* programming; Volume 3: Sorting and searching.
* Addison-Wesley Publishing Company:
* Reading, Mass. Page 392.
* Special cases:
* Leading or embedded spaces, numerals, or punctuation are squeezed
* out before encoding begins.
* Null strings or those with no encodable letters return the
* code 'Z000'.
* Test data from Knuth (1973):
* Euler Gauss Hilbert Knuth Lloyd Lukasiewicz
* E460 G200 H416 K530 L300 L222
*/
#include <string.h>
#include <ctype.h>
char *soundex(const char *in)
{
static int code[] =
{ 0,1,2,3,0,1,2,0,0,2,2,4,5,5,0,1,2,6,2,3,0,1,0,2,0,2 };
/* a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z */
static char key[5];
register char ch;
register int last;
register int count;
/* Set up default key, complete with trailing '0's */
strcpy(key, "Z000");
/* Advance to the first letter. If none present,
return default key */
while (*in != '\0' && !isalpha(*in))
++in;
if (*in == '\0')
return key;
/* Pull out the first letter, uppercase it, and
set up for main loop */
key[0] = toupper(*in);
last = code[key[0] - 'A'];
++in;
/* Scan rest of string, stop at end of string or
when the key is full */
for (count = 1; count < 4 && *in != '\0'; ++in) {
/* If non-alpha, ignore the character altogether */
if (isalpha(*in)) {
ch = tolower(*in);
/* Fold together adjacent letters sharing the same code */
if (last != code[ch - 'a']) {
last = code[ch - 'a'];
/* Ignore code==0 letters except as separators */
if (last != 0)
key[count++] = '0' + last;
}
}
}
return key;
}
#ifdef TESTPROG
/*
* If compiled with -DTESTPROG, main() will print back the key for each
* line from stdin.
*/
#include <stdio.h>
int main()
{
char instring[80];
while (fgets(instring, sizeof instring, stdin) != NULL)
printf("%s\n", soundex(instring));
return 0;
}
#endif TESTPROG
| Download code |