Thank you everybody. I will be working through these suggestions and let you know what I come up with.
Ken
I remember the joy of searching names in a system that had 2+ million customers, where names were all varchar() columns instead of keys into a secondary table. My indexes sure took a beating whenever I got another "Williams" (the number one last name in the system) and a page had to split to make room in that area.
I found that adding a table called NAMES fixed the search times I was seeing. Two text boxes took whatever the operator keyed for last and first name. I appended a % wildcard after the text in each box, and one of the keypress events triggered the query:
SELECT <field_list>
FROM customer
WHERE lNameID IN (SELECT nameID FROM names WHERE Name LIKE @Lname)
  AND fNameID IN (SELECT nameID FROM names na WHERE na.Name LIKE @Fname)
That was 10-13 years ago.
On Wed, Apr 12, 2017 at 9:55 AM, Ken Dibble krdibble@stny.rr.com wrote:
Hi folks,
I've been thinking of how I can improve the ability of my users to find people's names in a system that has over 30,000 people in it.
I've looked at soundex, and I've considered munging names to remove spaces, apostrophes, hyphens, etc. The thing about those approaches is that in order to be efficient, they require pre-processing all of the names in the system and storing the results, which can then be queried to find matches.
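For what it's worth, the classic Soundex transform is short enough to sketch. Here is an illustrative Python version (not FoxPro, and not anyone's production code) that also folds in the munging step of stripping spaces, apostrophes, and hyphens:

```python
def soundex(name):
    """Classic Soundex: first letter plus three digits coding consonant groups."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    # Munge first: drop spaces, apostrophes, hyphens, anything non-alphabetic
    name = "".join(c for c in name.lower() if c.isalpha())
    if not name:
        return ""
    out, prev = name[0].upper(), codes.get(name[0], "")
    for c in name[1:]:
        code = codes.get(c, "")
        if code and code != prev:
            out += code
        if c not in "hw":  # h and w are "transparent": they do not break a run
            prev = code
    return (out + "000")[:4]  # pad or truncate to the standard 4 characters
```

The point of pre-computing and storing this value is that "O'Brien", "OBrien", and "Obrian" all collapse to the same code, so an exact (indexable) equality test finds them.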
Unfortunately, that would require modifications to the database, which I try to avoid due to the downtime they require.
I'm looking for suggestions on how to produce results that include close matches on last names, without requiring pre-processing.
I've played with various schemes to assign "weights" to matches based on the number of matching letters, but they all end up being very slooooow and also producing too many false positives.
I suppose there are no easy answers, but if anyone has an algorithm for this kind of thing that they would be willing to share, I'd be grateful.
Thanks.
Ken Dibble www.stic-cil.org
Unfortunately, that would require modifications to the database, which I try to avoid due to the downtime they require.
Why would that be an issue of consequence? You add some columns to a table. The rest of the software can ignore them. (Unless you use select * or other black arts, said rest might never see the new columns.)
Yeah, there are black arts involved. My framework has some generic code that expects to be able to process every field in certain types of tables. If there is not corresponding code for each field in certain subclasses, errors are thrown. Probably a poor design choice in retrospect, but the choice was made in 2004 and rectifying it would require a huge amount of work.
Thus when I make data changes involving certain tables, I have to lock everybody out not only for the time it takes to make those changes, but to update everybody's copy of the software. There are over 100 users at this site, many of whom use the software almost constantly, and there are other sites that also use the software.
One of those tables would be the table that contains the names of people.
I can work around that by storing the pre-processed data in a separate table with a one-to-one relationship to the names table (assuming that I don't need backlinks in the names table), but that seems inefficient to me. My principle for organizing a table of people is that it should contain every required aspect of the person's identity of which the person can have only one. A soundex value or Levenshtein weight for the person's name would qualify as such.
No, I don't have a loader program. I would have to introduce internet download capabilities in order to provide that, and that's a headache I don't need.
I'm looking for suggestions on how to produce results that include close matches on last names, without requiring pre-processing.
I cannot see that the preprocessing would be very involved.
I agree, it would not. It's the necessity of providing storage for the results that causes the problem.
Well, there's a separate problem--the cost of having to apply one or two UDFs that would have to run on every name-search query. I was hoping, though, to get some suggestions for UDFs that I could at least test and see if they aren't impossibly slow.
I suppose there are no easy answers, but if anyone has an algorithm for this kind of thing that they would be willing to share, I'd be grateful.
There are not, because different languages assign different values to the Roman alphabet characters. You are going to have to decide on language trade-offs.
I'm liking the Levenshtein stuff, and I could use it in my utility for detecting and removing duplicate records. That wouldn't involve pre-processing. I think it could really speed that up.
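For anyone who wants to experiment, here is a minimal sketch of Levenshtein distance in Python (plain dynamic programming; any threshold for "close enough" is up to you), just to show how little code the comparison itself takes:

```python
def levenshtein(a, b):
    """Edit distance between two strings, O(len(a) * len(b)) dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free on match)
        prev = cur
    return prev[-1]
```

For duplicate detection, a distance of 1 or 2 between two last names is usually a good candidate pair; the catch, as noted above, is that running this against every row on every query is what gets slow.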
Thanks.
Ken Dibble www.stic-cil.org
On Thu, Apr 13, 2017 at 1:15 PM, Ken Dibble krdibble@stny.rr.com wrote:
Well, there's a separate problem--the cost of having to apply one or two UDFs that would have to run on every name-search query. I was hoping, though, to get some suggestions for UDFs that I could at least test and see if they aren't impossibly slow.
You might punish the occasional transposer, rather than everyone, by adding a "Find More" button to run the more intensive search. That way, the operators get a fast result when they type correctly, encouraging a positive feedback cycle.
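That two-tier idea might look like this in outline. This is a hypothetical Python sketch (the find_names helper, the prefix-first strategy, and the 0.8 cutoff are all illustrative assumptions, not anyone's actual code), using the standard library's difflib for the slower fuzzy pass:

```python
import difflib

def find_names(query, names, find_more=False):
    """Fast prefix search first; a slower fuzzy pass only behind 'Find More'."""
    q = query.lower()
    # Cheap pass: the kind of query an index can serve (name LIKE 'query%')
    hits = [n for n in names if n.lower().startswith(q)]
    if hits or not find_more:
        return hits
    # Expensive pass: similarity-ranked scan, keeping only close matches
    scored = [(difflib.SequenceMatcher(None, q, n.lower()).ratio(), n)
              for n in names]
    return [n for ratio, n in sorted(scored, reverse=True) if ratio >= 0.8]
```

Operators who type correctly get the instant prefix result; only clicking Find More pays for the full-table fuzzy scan.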