Phonetic Parser

The phonetic converter is based on the Metaphone system; the verbal rules follow at the end of this page. I wrote a phonetic parser to prepare some functionality for searches. When people search for words, they often misspell what they are looking for, especially when trying to spell people's names. The problem is that people tend to type words the way they sound, while the computer tries to match words exactly. The solution is to convert what people type into how it sounds!

My implementation is written in Java and is based solely on the verbal rules described below. Many implementations with example source code are available, but they vary in accuracy and add custom rules, so I wrote mine from scratch. A common rule in other implementations, but not in mine, is to stop once four (4) letters have been collected, whereas mine continues until the end of the string is reached.

Feel free to try out my implementation and report any discrepancies or errors to me. Also contact me if you would like a copy of the relevant code, on the condition that I am informed of any improvements and/or bug fixes.
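
The idea of matching on sound rather than on spelling can be sketched as a small index keyed by phonetic code. This is only an illustration and not the implementation described on this page: `PhoneticIndex` and its methods are hypothetical names, and the encoder is passed in as a function so that any Metaphone implementation could be plugged in. The `maxCodeLength` parameter is likewise an assumption; it models the four-letter cutoff used by other implementations, and passing 0 leaves codes untruncated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: groups words by phonetic code so misspellings can still match. */
final class PhoneticIndex {
    private final Function<String, String> encoder; // any word-to-code function
    private final int maxCodeLength;                // 0 means "no limit"
    private final Map<String, List<String>> wordsByCode = new HashMap<>();

    PhoneticIndex(Function<String, String> encoder, int maxCodeLength) {
        this.encoder = encoder;
        this.maxCodeLength = maxCodeLength;
    }

    /** Encodes a word and applies the optional length cutoff. */
    private String key(String word) {
        String code = encoder.apply(word);
        if (maxCodeLength > 0 && code.length() > maxCodeLength) {
            code = code.substring(0, maxCodeLength);
        }
        return code;
    }

    /** Stores a word under its phonetic code. */
    void add(String word) {
        wordsByCode.computeIfAbsent(key(word), k -> new ArrayList<>()).add(word);
    }

    /** Returns every stored word whose code equals the code of the query. */
    List<String> search(String query) {
        return wordsByCode.getOrDefault(key(query), Collections.emptyList());
    }
}
```

Because both the stored words and the query pass through the same encoder, two spellings that sound alike end up under the same key. A shorter cutoff makes the match looser, since more words share a truncated code.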
The 16 consonant sounds (0, zero, represents "th"):

B X S K J T F H L M N P R 0 W Y
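
As a small sanity check for an encoder, the symbol set above can be written down as a constant and used to validate output. This is a sketch under assumptions: `MetaphoneAlphabet` and `isValidCode` are hypothetical names, and a vowel is accepted in the first position only because the rules on this page keep a vowel at the beginning of a word.

```java
/** Hypothetical helper: checks that a code uses only the 16 consonant symbols. */
final class MetaphoneAlphabet {
    // The 16 symbols listed above; '0' (zero) stands for "th".
    static final String SYMBOLS = "BXSKJTFHLMNPR0WY";

    /** True if every character is one of the 16 symbols, or a vowel in first position. */
    static boolean isValidCode(String code) {
        for (int i = 0; i < code.length(); i++) {
            char c = code.charAt(i);
            boolean leadingVowel = i == 0 && "AEIOU".indexOf(c) >= 0;
            if (!leadingVowel && SYMBOLS.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }
}
```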
General Rules:
Double letters are dropped except "CC"
Vowels are dropped unless at the beginning of a word
Exceptions:
Beginning of word: "a", "e", "i", "o", "u" ----> keep it (except "ae")
Beginning of word: "ae-", "gn-", "kn-", "pn-", "wr-" ----> drop first letter
Beginning of word: "x" ----> change to "s"
Beginning of word: "wh-" ----> change to "w"
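
The double-letter rule and the beginning-of-word exceptions can be sketched as a preparation pass that runs before the letter-by-letter transformations. This is a minimal sketch, not the code described on this page: `MetaphonePrep` and its method names are hypothetical, and running the double-letter rule before the beginning-of-word rules is an assumption. Vowel dropping is deliberately left out of this pass, because later rules (for example those for G and H) need to see the vowels as context.

```java
import java.util.Locale;

/** Hypothetical preparation pass for the general rules and beginning-of-word exceptions. */
final class MetaphonePrep {
    /** Keeps one letter from each run of repeated letters, except for "C". */
    static String collapseDoubles(String word) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            boolean repeat = i > 0 && c == word.charAt(i - 1);
            if (!repeat || c == 'C') {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Applies the beginning-of-word exceptions to an upper-case word. */
    static String applyInitialRules(String word) {
        if (word.startsWith("AE") || word.startsWith("GN") || word.startsWith("KN")
                || word.startsWith("PN") || word.startsWith("WR")) {
            return word.substring(1);        // drop the first letter
        }
        if (word.startsWith("WH")) {
            return "W" + word.substring(2);  // "wh-" becomes "w"
        }
        if (word.startsWith("X")) {
            return "S" + word.substring(1);  // initial "x" becomes "s"
        }
        return word;                         // other words, including leading vowels, are kept
    }

    /** Upper-cases the input, strips non-letters, then applies both steps. */
    static String prepare(String input) {
        String upper = input.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        return applyInitialRules(collapseDoubles(upper));
    }
}
```

With this sketch, `prepare("Knight")` yields `NIGHT` and `prepare("Whittle")` yields `WITLE`; the remaining letters would then go through the transformations.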
Transformations:
B ----> B unless at the end of word after "m"
C ----> X (sh) if "-cia-" or "-ch-"
        S if "-ci-", "-ce-", or "-cy-"
        SILENT if "-sci-", "-sce-", or "-scy-"
        K otherwise, including in "-sch-"
D ----> J if in "-dge-", "-dgy-", or "-dgi-"
        T otherwise
F ----> F
G ----> SILENT if in "-gh-" and not at end or before a vowel,
               or in "-gn" or "-gned",
               or in "-dge-" etc., as in the D rule above
        J if before "i", "e", or "y", unless in double "gg"
        K otherwise
H ----> SILENT if after vowel and no vowel follows,
               or after "-ch-", "-sh-", "-ph-", "-th-", "-gh-"
        H otherwise
J ----> J
K ----> SILENT if after "c"
        K otherwise
L ----> L
M ----> M
N ----> N
P ----> F if before "h"
        P otherwise
Q ----> K
R ----> R
S ----> X (sh) if before "h" or in "-sia-" or "-sio-"
        S otherwise
T ----> X (sh) if "-tia-" or "-tio-"
        0 (th) if before "h"
        SILENT if in "-tch-"
        T otherwise
V ----> F
W ----> SILENT if not followed by a vowel
        W if followed by a vowel
X ----> KS
Y ----> SILENT if not followed by a vowel
        Y if followed by a vowel
Z ----> S
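
To show how a few of these rules might look in code, here is a sketch covering C, P, S, T and the letters that map unconditionally. It is not the implementation described on this page: `TransformSketch` and its methods are hypothetical names, the word is assumed to be upper-case already, and the order in which overlapping C patterns are tested (for example "-sci-" before "-cia-") is an assumption, since the rules do not state a precedence. B, D, G, H, K, W, Y and the vowels are left out.

```java
/** Hypothetical sketch of a subset of the transformations (upper-case input assumed). */
final class TransformSketch {
    /** Returns the letter at index i, or '\0' when i is outside the word. */
    static char at(String w, int i) {
        return (i >= 0 && i < w.length()) ? w.charAt(i) : '\0';
    }

    static String encodeC(String w, int i) {
        boolean afterS = at(w, i - 1) == 'S';
        char next = at(w, i + 1);
        boolean softNext = next == 'I' || next == 'E' || next == 'Y';
        if (afterS && softNext) return "";                         // -sci-, -sce-, -scy-
        if (afterS && next == 'H') return "K";                     // -sch-
        if (w.startsWith("CIA", i) || next == 'H') return "X";     // -cia-, -ch-
        if (softNext) return "S";                                  // -ci-, -ce-, -cy-
        return "K";
    }

    static String encodeP(String w, int i) {
        return at(w, i + 1) == 'H' ? "F" : "P";
    }

    static String encodeS(String w, int i) {
        boolean sh = at(w, i + 1) == 'H' || w.startsWith("SIA", i) || w.startsWith("SIO", i);
        return sh ? "X" : "S";
    }

    static String encodeT(String w, int i) {
        if (w.startsWith("TIA", i) || w.startsWith("TIO", i)) return "X";
        if (at(w, i + 1) == 'H') return "0";                       // "th"
        if (w.startsWith("TCH", i)) return "";                     // silent in -tch-
        return "T";
    }

    /** Encodes the letter at index i; only the subset named above is covered. */
    static String encodeLetter(String w, int i) {
        char c = w.charAt(i);
        switch (c) {
            case 'C': return encodeC(w, i);
            case 'P': return encodeP(w, i);
            case 'S': return encodeS(w, i);
            case 'T': return encodeT(w, i);
            case 'Q': return "K";
            case 'V': return "F";
            case 'X': return "KS";
            case 'Z': return "S";
            case 'F': case 'J': case 'L': case 'M': case 'N': case 'R':
                return String.valueOf(c);
            default:
                throw new IllegalArgumentException("rule not covered in this sketch: " + c);
        }
    }
}
```

Each method looks only at the letters around position `i`, which mirrors how the table is written: one letter, a small amount of context, one output. A full encoder would loop over the word, call a method like `encodeLetter` for each position, and append the results.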




