To reduce token usage while preserving conversation quality, we can apply machine learning techniques to improve a WeChat Chinese chatbot.
First, through large-scale data analysis, we can build a feature vector for each user from their usage history, behavioral patterns, and personality traits, and use it to measure similarity between users. A clustering method such as K-means can then partition the users and build a cluster tree. This both helps reduce the number of tokens generated and lets the system tailor its replies to the characteristics of each user cluster.
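As a minimal sketch of the clustering step, the following pure-Python K-means groups toy user feature vectors (the two-dimensional features and the data are invented for illustration; a real system would use richer features and a library such as scikit-learn):

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means: returns (centroids, labels) for a list of 2-D points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids, labels

# Toy "user feature vectors": (messages per day, average message length).
users = [(2, 5), (3, 6), (2, 4), (40, 80), (42, 85), (38, 78)]
centroids, labels = kmeans(users, k=2)
print(labels)  # the two dense groups land in different clusters
```

Once users are grouped, each cluster can carry its own reply templates or decoding settings, which is how the clustering supports both token reduction and cluster-specific responses.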
Second, for the chatbot system itself, consider the seq2seq technique. Seq2seq is an encoder-decoder framework that maps an input sequence to an output sequence, and it is widely used in NLP. The underlying networks typically use LSTM (Long Short-Term Memory) cells, which extend RNNs (Recurrent Neural Networks) with a memory mechanism. An LSTM cell has three gates, each implemented as a sigmoid layer: an input gate, a forget gate, and an output gate; a separate tanh layer produces the candidate values used to update the cell state. By adjusting these gates (for example, to forget or remember certain information), LSTM cells can learn long-term dependencies in the data without being affected by the vanishing-gradient problem common in traditional RNNs. A seq2seq model with an encoder-decoder architecture can help the chatbot generate more accurate responses to user queries while reducing the token count.
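The gate mechanics can be sketched as a single scalar LSTM step; the weights here are arbitrary toy values, not trained parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One forward step of a scalar LSTM cell.

    w maps each gate name to a (w_x, w_h, b) triple; values are illustrative.
    """
    # Three sigmoid gates: forget, input, output.
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])
    # tanh layer: candidate values for updating the cell state.
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])
    # New cell state mixes retained memory (f * c_prev) with new input (i * g).
    c = f * c_prev + i * g
    # Hidden state exposes a gated view of the cell state.
    h = o * math.tanh(c)
    return h, c

weights = {k: (0.5, 0.5, 0.0) for k in ("f", "i", "o", "g")}  # toy values
h, c = 0.0, 0.0
for x in (1.0, -1.0, 0.5):  # tiny input sequence
    h, c = lstm_step(x, h, c, weights)
print(round(h, 4))
```

The additive update `c = f * c_prev + i * g` is what lets gradients flow across many steps: when the forget gate stays near 1, the cell state passes through largely unchanged, avoiding the repeated squashing that causes vanishing gradients in plain RNNs.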
In addition, emotional understanding and multi-task learning are also important parts of improving chatbot quality. Emotional understanding includes sentiment analysis and emotion detection, which let the chatbot understand not only the user's words but also the feelings behind them. Multi-task learning lets a model learn several tasks simultaneously rather than training one task at a time; sharing weights across tasks during training can significantly improve accuracy and reduce the overfitting risk associated with single-task learning.
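As a minimal sketch of the sentiment-analysis component, the lexicon-based scorer below classifies a message by counting positive and negative words; the word lists are hypothetical examples, not a real sentiment resource, and a production system would use a trained model instead:

```python
# Hypothetical toy lexicons for illustration only.
POSITIVE = {"great", "love", "thanks", "happy"}
NEGATIVE = {"bad", "hate", "angry", "broken"}

def sentiment(message):
    """Return 'positive', 'negative', or 'neutral' from lexicon word counts."""
    words = message.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this chatbot, thanks"))  # → positive
```

Even this crude signal could let the chatbot adjust its tone, for example prepending an apology when the detected sentiment is negative.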