From ptownson@massis.lcs.mit.edu  Wed Mar 19 09:10:57 1997
Return-Path: <ptownson@massis.lcs.mit.edu>
Received: by massis.lcs.mit.edu (8.7.4/NSCS-1.0S) 
	id JAA04371; Wed, 19 Mar 1997 09:10:57 -0500 (EST)
Date: Wed, 19 Mar 1997 09:10:57 -0500 (EST)
From: ptownson@massis.lcs.mit.edu (TELECOM Digest Editor)
Message-Id: <199703191410.JAA04371@massis.lcs.mit.edu>
To: ptownson
Subject: Word/Number Program


Here is an interesting file which came my way that I thought
the rest of you would enjoy as well. Play with it awhile and
see what interesting combinations you can create.


PAT

  Subject: Number/Word Program
  Date: Tue, 18 Mar 97 12:52:09 PDT
  From: rweller@h-e.com
  Organization: Hammett & Edison, Inc.

Pat, 

I am re-sending separately the number/word program in binhex form.  In case
you have problems with it, I am sending the source code in two parts as plain
text.  The first part appears below. The second part is appended to it
in this version going out to the Digest.

Regards,

Bob 

>>>>>>>>>
/*-------------------------------------------------------------------
   number2word.c

   Program to find word equivalents to telephone numbers, or convert letter
   strings to numbers.  In number-to-word mode, a search is made for valid
   words based on a list read from a dictionary file.  Optionally, the word
   search may be turned off to print out all possible letter iterations.

   Command-line options:

     -r
         Reverse mode, convert letter strings to numbers (numbers to
         words is the default mode).

     -a
         In number to word mode, suppress the dictionary check for valid
         words and print out all possible letter combinations.

     -d file
         Use the named file as the word dictionary, it must contain a list
         of words one per line in alphabetical order.  If not specified,
         "words" in the current directory is the default.

     -s
         Rudimentary pluralization of words, if the last letter of a
         word being checked is 's' and there is no match for the whole
         word, drop the 's' and try again.  Normally the word must match
         a dictionary word exactly.

     -q
         Include 'q' and 'z' in the conversions, with 'q' on the 7 key
         and 'z' on the 9.  Normally these letters are not used when
         converting numbers to words, and are illegal when converting
         strings to numbers.

     -0
         0-o and 1-i mode.  With this option, 'o' is removed from its
         normal position on the 6 and made equivalent to 0, similarily
         'i' is moved from the 4 to the 1.  Without this, 0 and 1 are
         illegal when converting numbers to words, and are not used when
         converting strings to numbers.

   Following any options on the command line are numbers or strings to be
   converted.  Number arguments may contain letters in number to word mode,
   and strings may contain numbers in reverse mode.  In the first case,
   any letters will first be converted back to numbers, so all iterations
   of other letters on the same number key will be checked.  In the second
   case, the numbers are simply passed through unchanged.  In either mode,
   the arguments may contain separator characters, anything that is not
   [0-9A-Za-z] is a separator.  Separators are passed through unchanged,
   but in either mode a separator cannot be the first or last character of
   an argument, and no two separators may be sequential within the argument.

   As indicated above, arguments for number to word conversion must not
   contain digits 0 or 1 unless the option "-0" appears, and arguments for
   reverse conversion must not contain 'q' or 'z' unless "-q" appears.  In
   number to word mode, the arguments must not contain more than a maximum
   number of digits, not counting separators.  The limit is defined below
   as MAXNUM.  The number of letter iterations to check will increase
   geometrically with the number of digits, so setting MAXNUM very large
   might make this take a very long time.  10 is the recommended value.

   In number to word mode, separators are viewed as breaking the number into
   sub-strings.  During the word search, two approaches are used.  First a
   search is made for a word match to the entire number, ignoring separators,
   then a search is made for words that match each sub-string.  For example,
   "223-6636" will yield both "abdomen" and "bad-omen".

   "Release notes":

   The word search is only as good as the dictionary file, a very basic one
   with words up to 10 characters ought to arrive along with this source file,
   but it can undoubtedly be improved.  This is left as an exercise for the
   interested student.

   This was written for a Unix environment, and tested under a couple of
   flavors (ULTRIX 3.1, Linux 2.0.28), but that's as far as it goes.  I did
   this as a favor for a friend and not primarily for distribution, and I
   don't have the time to deal with support for other platforms.  You're
   welcome to do whatever you want with it to make it work on your favorite
   OS/machine, but frankly I don't want to hear about it, hence the lack of
   anything here like my name/e-mail address.  Good luck! */

/*--------------------------------------------------------------------
   Header files. */

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/*--------------------------------------------------------------------
   Constants and parameters. */

#define MAXNUM 10   /* Max number of digits in a number for word conversion. */

#define DICTNAME "words"   /* Default dictionary file name. */

#define WORDALLOC 2500   /* Allocation increment for word storage. */


/*--------------------------------------------------------------------
   Function prototypes. */

void loadwords(FILE *, int *);
void tonumbers(char *);
void toletters(char *, int);
void findwords(char *);
int checkword(char *, int);

/*-------------------------------------------------------------------
   Globals. */

struct {             /* Dictionary word lists, one for each word length. */
  int nword[26];     /* Number of words for each first letter. */
  char *index[26];   /* Pointers to first word for each letter. */
  char *block;       /* Word storage. */
} Words[MAXNUM];

int Mode = 1;   /* True for numbers to words, false for letters to numbers. */

int Nosearch = 0;   /* True to suppress word search and show all iterations. */

int Plurals = 0;   /* True to allow 's' at the end of any matching word. */

int Qmode = 0;   /* True to include 'q'<->7 and 'z'<->9. */

int Zeromode = 0;   /* True to make 'o'<->0 and 'i'<->1. */

int Nsub;             /* Substring info - number, length, and start of all */
int Slen[MAXNUM];     /*  substrings in a number being converted to words, */
char *Subs[MAXNUM];   /*  global to avoid duplication in recursion. */

/*--------------------------------------------------------------------
   Main routine. */

main(
     int argc,
     char **argv)

{
  FILE *dict;
  int n, i, s, iopt, istart, subl, lenflag[MAXNUM];
  char *p, *dname, str[(2 * MAXNUM) + 1];

/* Set the default dictionary file name. */

  dname = DICTNAME;

/* Begin parsing command line, look for options.  This supports "-a -b -c"
   or "-abc" forms, or any mixture. */

  iopt = 1;
  while (iopt < argc) {
    p = argv[iopt];
    if (*p != '-') {
      break;
    }
    for (p++; *p; p++) {
      switch (*p) {
      case 'r':   /* Switch to letter to number mode. */
	Mode = 0;
	break;
      case 'a':   /* Turn off word-search mode. */
	Nosearch = 1;
	break;
      case 'd':   /* Next arg has name for dictionary file. */
	if (++iopt >= argc) {
	  break;
	}
	dname = argv[iopt];
	break;
      case 's':   /* Set flag to check for pluralized words. */
	Plurals = 1;
	break;
      case 'q':   /* Include 'q' and 'z'. */
	Qmode = 1;
	break;
      case '0':   /* Make 0 be 'o' and 1 be 'i'. */
	Zeromode = 1;
	break;
      default:   /* What? */
	iopt = argc;
	break;
      }
      if (iopt >= argc) {
	break;
      }
    }
    iopt++;
  }

/* Done with options, there must be some more arguments to process.  Also
   iopt >= argc if an error occurred above. */

  if (iopt >= argc) {
    fprintf(stderr,
	    "usage: %s [-radsq0] arg ...\n",
	    argv[0]);
    fputs("       -r  convert strings to numbers\n", stderr);
    fputs("       -a  print all letter iterations\n", stderr);
    fputs("       -d  use next arg as name of dictionary file\n", stderr);
    fputs("       -s  append 's' to every word in dictionary\n", stderr);
    fputs("       -q  include 'q' <-> 7 and 'z' <-> 9\n", stderr);
    fputs("       -0  make 'o' <-> 0 and 'i' <-> 1\n", stderr);
    exit(1);
  }

/* Scan the arguments and check for errors.  In number to word mode, check the
   number of digits against the max, check for 0 and 1 unless Zeromode is true,
   also track the possible sub-string lengths so that words of all required
   lengths can be loaded into the dictionary later.  In letter to number mode,
   check for 'q' and 'z' unless Qmode is true.  In either mode, check for a
   separator as the first/last character or two separators in a row.  Also,
   all the arguments get converted to lower-case in place. */

  istart = iopt;
  for (n = 0; n < MAXNUM; n++) {
    lenflag[n] = 0;
  }
  for (; iopt < argc; iopt++) {
    n = 0;
    subl = 0;
    s = -1;
    for (p = argv[iopt]; *p; p++) {
      *p = tolower(*p);
      if (isalnum(*p)) {
	s = 0;
	if (Mode) {
	  if (!Zeromode) {
	    if ((*p == '0') || (*p == '1')) {
	      fprintf(stderr, "%s: in '%s': '0' or '1' not allowed\n",
		      argv[0], argv[iopt]);
	      exit(1);
	    }
	  }
	  n++;
	  subl++;
	} else {
	  if (!Qmode) {
	    if ((*p == 'q') || (*p == 'z')) {
	      fprintf(stderr, "%s: in '%s': 'q' or 'z' not allowed\n",
		      argv[0], argv[iopt]);
	      exit(1);
	    }
	  }
	}
      } else {
	if (s) {
	  if (s < 0) {
	    fprintf(stderr, "%s: in '%s': illegal leading separator\n",
		    argv[0], argv[iopt]);
	  } else {
	    fprintf(stderr, "%s: in '%s': illegal repeated separator\n",
		    argv[0], argv[iopt]);
	  }
	  exit(1);
	}
	s = 1;
	if (Mode) {
	  lenflag[subl - 1] = 1;
	  if (Plurals && (subl > 1)) {
	    lenflag[subl - 2] = 1;
	  }
	  subl = 0;
	}
      }

    }
    if (s) {
      fprintf(stderr, "%s: in '%s': illegal trailing separator\n",
	      argv[0], argv[iopt]);
      exit(1);
    }
    if (Mode) {
      if ((n < 1) || (n > MAXNUM)) {
	fprintf(stderr, "%s: in '%s': illegal digit count\n",
		argv[0], argv[iopt]);
	exit(1);
      }
      lenflag[subl - 1] = 1;
      if (Plurals && (subl > 1)) {
	lenflag[subl - 2] = 1;
      }
      lenflag[n - 1] = 1;
      if (Plurals && (n > 1)) {
	lenflag[n - 2] = 1;
      }
    }
  }

/* Load the dictionary file, if needed.  Only words of lengths equal to
   possible word lengths in the actual arguments are loaded. */

  if (Mode && !Nosearch) {
    if ((dict = fopen(dname, "r")) == NULL) {
      fprintf(stderr, "%s: unable to open dictionary file '%s'\n",
	      argv[0], dname);
      exit(1);
    }
    loadwords(dict, lenflag);
    fclose(dict);
  }

/* Main loop over arguments.  First make a copy of the argument.  In number to
   word mode, run the arg through tonumbers() in case it contains any letters,
   then print it out.  Then build the substring locations in globals so this
   doesn't need to be done repeatedly on every letter iteration (the separators
   are not going to move around or change).  Finally call toletters(), which
   recursively handles number to letter conversions and calls the routine
   findwords() at the bottom to search for word matches.  In letter to number
   mode, simply echo the argument, call tonumbers(), and print the result. */

  for (iopt = istart; iopt < argc; iopt++) {
    strcpy(str, argv[iopt]);
    if (Mode) {
      tonumbers(str);
      putchar('\n');
      puts(str);
      Nsub = 0;
      Slen[0] = 0;
      Subs[0] = str;
      for (p = str; *p; p++) {
	if (isdigit(*p)) {
	  Slen[Nsub]++;
	} else {
	  Slen[++Nsub] = 0;
	  Subs[Nsub] = p + 1;
	}
      }
      Nsub++;
      toletters(str, 0);
    } else {
      putchar('\n');
      puts(str);
      tonumbers(str);
      puts(str);
    }
  }
  putchar('\n');
  exit(0);
}


/*---------------------------------------------------------------------
   Routine reads in the word dictionary. */

void loadwords(
	       FILE *in,      /* Input stream. */
	       int *lenflg)   /* Flags indicate which word lengths to load. */

{
  int i, n, ilet[MAXNUM], count[MAXNUM], max[MAXNUM];
  char *p, word[50], letter[MAXNUM], *nextword[MAXNUM];

/* Initialize, zero out all the word lists. */

  for (n = 0; n < MAXNUM; n++) {
    letter[n] = 'a' - (char)1;
    ilet[n] = -1;
    count[n] = 0;

    max[n] = 0;
    Words[n].block = NULL;
    for (i = 0; i < 26; i++) {
      Words[n].nword[i] = 0;
      Words[n].index[i] = NULL;
    }
  }

/* Read and store words.  A word with any non-letter in it is ignored, as are
   words more than MAXNUM letters long.  All words are converted to lower-
   case also.  This assumes the input list is already in alphabetical order.
   The words are stored without null termination, since the length of all the
   words in one structure is the same.  Allocation is in chunks of WORDALLOC
   words for all lengths. */

  while (fgets(word, 50, in)) {
    n = 0;
    for (p = word; *p; p++) {
      if (*p == '\n') {
	*p = '\0';
	break;
      }
      if (!isalpha(*p)) {
	n = 0;
	break;
      }
      *p = tolower(*p);
      n++;
    }
    if ((n < 1) || (n > MAXNUM)) {
      continue;
    }
    if (!lenflg[--n]) {
      continue;
    }
    if (max[n] == 0) {
      Words[n].block = (char *)malloc(WORDALLOC * (n + 1));
      nextword[n] = Words[n].block;
      max[n] = WORDALLOC;
    } else {
      if (count[n] == max[n]) {
	Words[n].block = (char *)realloc(Words[n].block,
					 (max[n] + WORDALLOC) * (n + 1));
	nextword[n] = Words[n].block + (max[n] * (n + 1));
	max[n] += WORDALLOC;
      }
    }
    if (*word != letter[n]) {
      if (*word < letter[n]) {
	fputs("**dictionary file is not sorted, exiting\n", stderr);
	exit(1);
      }
      ilet[n] += (int)(*word - letter[n]);
      letter[n] = *word;
    }
    strncpy(nextword[n], word, (n + 1));
    Words[n].nword[ilet[n]]++;
    count[n]++;
    nextword[n] += n + 1;
  }

/* Compute all the word index pointers in the structures, these couldn't be
   done on the fly because they all break if realloc() moves the block. */

  for (n = 0; n < MAXNUM; n++) {
    if (count[n] == 0) {
      continue;
    }
    p = Words[n].block;
    for (i = 0; i < 26; i++) {
      if (Words[n].nword[i] == 0) {
	continue;
      }
      Words[n].index[i] = p;
      p += Words[n].nword[i] * (n + 1);
    }
  }
}


/*------------------------------------------------------------------------
   Routine converts letters to numbers in string. */

void tonumbers(
	       char *str)   /* String to convert. */

{
  char *p;

/* Do it. */

  for (p = str; *p; p++) {
    switch (*p) {
    case 'a':
    case 'b':
    case 'c':
      *p = '2';
      break;
    case 'd':
    case 'e':
    case 'f':
      *p = '3';
      break;
    case 'g':
    case 'h':
      *p = '4';
      break;
    case 'i':
      if (Zeromode) {
	*p = '1';
      } else {
	*p = '4';
      }
      break;
    case 'j':
    case 'k':
    case 'l':
      *p = '5';
      break;
    case 'm':
    case 'n':
      *p = '6';
      break;
    case 'o':
      if (Zeromode) {
	*p = '0';
      } else {
	*p = '6';
      }
      break;
    case 'p':
    case 'q':
    case 'r':
    case 's':
      *p = '7';
      break;
    case 't':
    case 'u':
    case 'v':
      *p = '8';
      break;
    case 'w':
    case 'x':
    case 'y':
    case 'z':
      *p = '9';
      break;
    }
  }
}


/*---------------------------------------------------------------------
   Routine to convert numbers to letters.  This is recursive, the main call is
   for the first digit, each recursive call for a subsequent digit.  Digits are
   replaced by letters in place in the passed string.  At the deepest level,
   findwords() is called to check the converted string for words.  Each digit
   is restored for the next recursive pass. */

void toletters(
	       char *str,   /* String being converted. */
	       int p)       /* Digit to work on. */

{
  char *l, savenum;
  int q;

/* Skip past a separator, earlier checks make it safe to assume there are
   never two separators in a row nor a separator at the very end. */

  if (!isalnum(str[p])) {
    p++;
  }

/* Set possible letter string for the digit.  If '0' or '1' appear it is
   save to assume Zeromode is true due to earlier checks. */

  switch (str[p]) {
  case '0':
    l = "o";
    break;
  case '1':
    l = "i";
    break;
  case '2':
    l = "abc";
    break;
  case '3':
    l = "def";
    break;
  case '4':
    if (Zeromode) {
      l = "gh";
    } else {
      l = "ghi";
    }
    break;
  case '5':
    l = "jkl";
    break;
  case '6':
    if (Zeromode) {
      l = "mn";
    } else {
      l = "mno";
    }
    break;
  case '7':
    if (Qmode) {
      l = "pqrs";
    } else {
      l = "prs";
    }
    break;
  case '8':
    l = "tuv";
    break;
  case '9':

    if (Qmode) {
      l = "wxyz";
    } else {
      l = "wxy";
    }
    break;
  }

/* Loop over letters, recurse.  If at the bottom, if Nosearch is true just
   print out the string, otherwise call findwords(), it will print out the
   string if a match is found. */

  q = p + 1;
  savenum = str[p];
  while (*l) {
    str[p] = *l;
    if (str[q]) {
      toletters(str, q);
    } else {
      if (Nosearch) {
	puts(str);
      } else {
	findwords(str);
      }
    }
    l++;
  }
  str[p] = savenum;
}


/*----------------------------------------------------------------------
   Routine to take a number converted to a letter string and search it for
   words that match the dictionary list. */

void findwords(
	       char *str)   /* Converted string to search for words. */

{
  int i, wlen;
  char *p, whole[MAXNUM + 1];

/* Make a copy of the string with separators stripped and check for a whole
   string word match.  Skip this if there is only one substring, in that case
   the substring is the whole string and this is redundant. */

  if (Nsub > 1) {
    wlen = 0;
    for (i = 0; i < Nsub; i++) {
      strncpy((whole + wlen), Subs[i], Slen[i]);
      wlen += Slen[i];
    }
    whole[wlen] = '\0';
    if (checkword(whole, wlen)) {
      puts(whole);
    }
  }

/* Check for substring matches, all must match for success. */

  for (i = 0; i < Nsub; i++) {
    if (!checkword(Subs[i], Slen[i])) {
      return;
    }
  }
  puts(str);
}


/*----------------------------------------------------------------------
   Routine to compare a string of known length to dictionary, return true or
   false for match. */

int checkword(
	      char *w,   /* Word string, is *not* null terminated. */
	      int n)     /* Length. */

{
  int i, j, l;
  char *c;

/* Do it. */

  j = (int)(*w - 'a');
  if ((l = Words[n - 1].nword[j]) > 0) {
    for (i = 0, c = Words[n - 1].index[j]; i < l; i++, c += n) {
      if (strncmp(w, c, n) == 0) {
	return(1);
      }

    }
  }

/* No match, if Plurals flag is true and the last letter is an 's', check
   again for words one letter shorter. */

  if (Plurals && (n > 1) && (w[n - 1] == 's')) {
    n--;
    if ((l = Words[n - 1].nword[j]) > 0) {
      for (i = 0, c = Words[n - 1].index[j]; i < l; i++, c += n) {
	if (strncmp(w, c, n) == 0) {
	  return(1);
	}
      }
    }
  }
  return(0);
}

              ---------   cut here -----------

