trindex: A trigram search library for Go

I'd like to present my new library, trindex: an open-source fuzzy search library written in and for Go. You can find its source on GitHub.

It uses trigrams to allow fuzzy searches for terms and implements its own database. Although it is still in alpha, it is already usable. One common use case for trindex would be to index a list of terms from your database (for example travel destinations like cities) and use trindex for searching. A misspelled city name would then usually still lead to the correct city.
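To make the trigram idea concrete, here is a minimal sketch of how two terms can be compared by the trigrams they share. It is not trindex's implementation: the helper names (trigrams, similarity), the padding with spaces, and the use of the Jaccard coefficient as the score are all assumptions chosen for illustration, so the numbers it produces will differ from trindex's own similarity values.

```go
package main

import (
	"fmt"
	"strings"
)

// trigrams returns the set of three-character substrings of s after
// lowercasing it and padding it with two spaces in front and one behind.
// The padding is an assumed convention; trindex may do this differently.
func trigrams(s string) map[string]struct{} {
	runes := []rune("  " + strings.ToLower(s) + " ")
	set := make(map[string]struct{})
	for i := 0; i+3 <= len(runes); i++ {
		set[string(runes[i:i+3])] = struct{}{}
	}
	return set
}

// similarity returns the Jaccard coefficient of the two trigram sets:
// the number of shared trigrams divided by the number of distinct
// trigrams overall. The result lies between 0 and 1.
func similarity(a, b string) float64 {
	ta, tb := trigrams(a), trigrams(b)
	shared := 0
	for t := range ta {
		if _, ok := tb[t]; ok {
			shared++
		}
	}
	// The padding guarantees at least one trigram per set, so the
	// union is never empty.
	union := len(ta) + len(tb) - shared
	return float64(shared) / float64(union)
}

func main() {
	// Score a misspelled query against a few candidate terms.
	for _, term := range []string{"Mallorca", "Menorca", "Berlin"} {
		fmt.Printf("%-8s %.3f\n", term, similarity("malorka", term))
	}
}
```

With this scoring, "malorka" shares several trigrams with "Mallorca", only the leading one with "Menorca", and none with "Berlin", which is why a misspelling still ranks the intended term first.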

I put up a demo page using trindex with all German Wikidata lemmas (4,064,962 in total) indexed and searchable. The trindex database is about 850 MiB in size. You can download the extracted Wikidata titles (in CSV format) from my server, licensed under the same license that Wikidata uses for its data. Search performance is not yet as good as it could be. This is partly due to the very small virtual server (little RAM, little CPU power) driving this web page and partly due to the open TODOs in trindex, which should improve performance once they are addressed. I will work on this over time.

trindex has a fairly simple API for insertion and querying. Removal is not yet supported (it is still on my TODO list).

Simple example of the API usage:

idx := trindex.NewIndex("trindex.db")
defer idx.Close()

dataset := []string{
    "Mallorca", "Ibiza", "Menorca", "Pityusen", "Formentera",
    "Berlin", "New York", "Yorkshire",
}

for _, data := range dataset {
    id := idx.Insert(data)
    // Use the ID to connect the term with the associated dataset;
    // for example, save the ID in your SQL database of travel destinations.
    _ = id // placeholder so the snippet compiles; replace with your own bookkeeping
}

results := idx.Query("malorka", 2, 0.3)

// Returns a sorted list of 2 results including the ID and
// a confidence number ("Similarity"; 1 = best match) >= 0.3

The result set (results) contains two entries and would look like this:

ID Similarity
1 0.7012895
3 0.4159252

with 1 being “Mallorca” and 3 being “Menorca”. The result set is sorted by similarity (i.e. confidence, max 1.0).
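Since a result only carries an ID and a similarity, the caller has to map IDs back to its own records. The following is a rough sketch of that step under stated assumptions: the Result type, its field names, and the resolve helper are hypothetical stand-ins and not trindex's actual types, and the map plays the role of the SQL table mentioned in the example above.

```go
package main

import "fmt"

// Result mirrors the two columns shown above. The type and its field
// names are assumptions for illustration, not trindex's actual types.
type Result struct {
	ID         uint64
	Similarity float32
}

// resolve looks up the term stored under each result ID and returns the
// terms in the same order as the results. IDs without a stored term are
// skipped.
func resolve(results []Result, terms map[uint64]string) []string {
	names := make([]string, 0, len(results))
	for _, r := range results {
		if name, ok := terms[r.ID]; ok {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	// IDs as recorded at insertion time (stand-in for a database table).
	terms := map[uint64]string{1: "Mallorca", 2: "Ibiza", 3: "Menorca"}

	// The result set from the example above.
	results := []Result{{ID: 1, Similarity: 0.7012895}, {ID: 3, Similarity: 0.4159252}}

	for _, name := range resolve(results, terms) {
		fmt.Println(name)
	}
}
```

Because the result set is already sorted by similarity, keeping the order intact during the lookup is enough to present the best match first.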

The Wikidata example is available in the repository as well, split into two programs: one that builds the index and one that queries it.

I hope you’ll find trindex useful. As always, any feedback and/or pull request is highly appreciated.

