Optimized Word Vector data
From version 2.9.0, OSCOVA provides a new extension method to save optimized word vector data after the first training. This helps developers reduce the size of the word vector data file that needs to be deployed along with their application.
Optimizing Word data
An optimized version of the loaded word vector data is generated after the first training. During the training process, OSCOVA looks up words, entities and their related word representations within the loaded word vector data file. After the lookup has completed, OSCOVA automatically stores the relatives of those words that were found within the initially loaded word vector data.
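The idea behind this lookup can be illustrated with a small, self-contained sketch. It is not OSCOVA's internal code: the class VectorPruningSketch, the method Prune and the relativesPerWord parameter are hypothetical names, and ranking relatives by cosine similarity is an assumption made only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: illustrates the pruning idea only, not OSCOVA's implementation.
public static class VectorPruningSketch
{
    // Cosine similarity between two vectors of equal length.
    // Assumes neither vector is all zeros.
    public static double Cosine(float[] a, float[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Keeps every domain word found in the source data,
    // plus the closest relatives of each of those words.
    public static Dictionary<string, float[]> Prune(
        IReadOnlyDictionary<string, float[]> source,
        IEnumerable<string> domainWords,
        int relativesPerWord)
    {
        var kept = new Dictionary<string, float[]>();
        foreach (var word in domainWords)
        {
            // Skip words that the source data does not contain.
            if (!source.TryGetValue(word, out var vector)) continue;
            kept[word] = vector;

            // Assumption: "relatives" are the nearest words by cosine similarity.
            var relatives = source
                .Where(pair => pair.Key != word)
                .OrderByDescending(pair => Cosine(vector, pair.Value))
                .Take(relativesPerWord);

            foreach (var relative in relatives)
                kept[relative.Key] = relative.Value;
        }
        return kept;
    }
}
```

With a source dictionary holding vectors for "pizza", "pasta" and "engine", calling Prune with the single domain word "pizza" and one relative per word would keep "pizza" and whichever of the other two words lies closest to it, and drop the rest. The smaller result is what makes the deployed file lighter.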
Note
If you are using a Unified Language Model file, you are not required to save or load an optimized version of word vectors from a separate .vec file.
Loading source word vectors
To optimize the loaded word vectors, the developer must first load the source word vector data and initiate the training process. This is essential because only during training does OSCOVA extract domain-specific word vector data from the source word vector data file and store only the relatives of the required words.
var bot = new OscovaBot();
bot.Language.WordVectors.Load(@"D:\DownloadWordVectorData\wiki-en.vec", VectorDataFormat.Text);
//Do other configuration and intent additions
bot.Trainer.StartTraining();
Once the training has finished, OSCOVA will disregard unrelated words and only store and work with the domain-optimized word vector data.
Saving domain-optimized word vector data
From version 2.9.0 and above, OSCOVA provides extension methods to save this optimized version of the related word vector data. To access these methods, please ensure that you have imported the Syn.Bot.Oscova.Extensions and Syn.Oryzer.TextRepresentation namespaces. Once you have imported these namespaces, you may save an optimized version of the word vector data as shown below.
var optimizedWordVector = Saga.Language.WordVectors.Optimize(bot);
optimizedWordVector.Save(@"D:\domain-optimized.vec", VectorDataFormat.Text);
Once you have saved an optimized version of the word vector data, you can use this saved data during deployment.
var bot = new OscovaBot();
//During deployment load optimized word vector data.
bot.Language.WordVectors.Load(@"D:\domain-optimized.vec", VectorDataFormat.Text);
//Do other configuration and intent additions
bot.Trainer.StartTraining();
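During development it can be convenient to fall back to the full source file until an optimized file has been generated. The following is a minimal sketch of that choice; VectorFileSelector and Choose are hypothetical names and are not part of OSCOVA.

```csharp
using System.IO;

// Hypothetical helper, not part of OSCOVA.
public static class VectorFileSelector
{
    // Returns the optimized file when it exists on disk,
    // otherwise the full source word vector file.
    public static string Choose(string optimizedPath, string sourcePath)
    {
        return File.Exists(optimizedPath) ? optimizedPath : sourcePath;
    }
}

// Possible usage with the Load call shown in this article:
// var path = VectorFileSelector.Choose(@"D:\domain-optimized.vec",
//                                      @"D:\DownloadWordVectorData\wiki-en.vec");
// bot.Language.WordVectors.Load(path, VectorDataFormat.Text);
```

Keeping the selection in one place means the deployment build and the development build can share the same startup code.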
Note
If any change is made to an expression, entity value, dialog or intent structure, developers are strongly advised to regenerate the optimized word vector data.
Tip
Due to OSCOVA's modular architecture, developers can load both word vectors and lexical databases side by side without any conflict.