$6.00

Hate Speech Classification of Codeswitched Data: Leveraging Psycho-social Features to classify Hate Speech: Case of Kenyan Tweets during 2017 Elections, Hannah O Schwefel, 9781952751899

Description

Identifying short text messages that contain hate speech within the vast volume of user-generated content on social media is a challenging classification task. Social media data poses unprecedented challenges to conventional natural language processing techniques, which struggle to extract high-quality features from noisy, high-dimensional, codeswitched, and large unstructured data. In addition, a systematic review of previous studies indicated a lack of publicly available annotated datasets for comparative studies, little evidence of theoretical underpinning for the annotation schemes used, and hardly any work on codeswitched data.

To address these gaps, this book explores a data-driven approach to identifying high-quality, discriminative features in hate messages from social media. The goal is to use these features to train a better-performing classification model that captures subtle hate speech in social media messages. Approximately 400k messages were crawled from social media over a one-year period covering the 2017 general election in Kenya, using a combination of problematic hashtags, ethnic slurs, hate patterns, and messages from pro-hate user accounts. A random sample of 50k messages was manually labeled into three classes (Hate Speech, Offensive, or Neither) by a team of 27 human annotators. This dataset was then further reduced by extracting a psychosocial feature subset (PDC), informed by the conceptual framework, using a hierarchical probability modeling technique. To evaluate and select the best model, a grid search was performed over all combinations of features using 5-fold cross-validation, with a tenth of the data held out for evaluation to guard against overfitting.

Based on the experimental results, the novel psychosocial feature set (PDC) was effective in identifying hate speech and outperformed conventional features. The best classifier, a linear SVM, reached an accuracy of 82.8%. The Passion (P) and Distance (D) components proved the most salient, with accuracies of 74.3% and 74.2%, respectively. The psychosocial feature framework also generalized better to other types of hate speech.
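
The evaluation protocol described above (hold out a tenth of the data, then run a grid search over feature combinations with 5-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the book's code: the function names, the one-score-per-component data layout, and the toy nearest-centroid classifier (a stand-in for the linear SVM, chosen only to keep the example dependency-free) are all assumptions. The labels "P", "D", and "C" are used purely as component names.

```python
import itertools
import random


def holdout_split(n_samples, holdout_fraction=0.1, seed=0):
    """Shuffle sample indices and reserve a fraction for final evaluation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = int(n_samples * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]


def k_folds(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def feature_combinations(groups):
    """Yield every non-empty combination of feature-component names."""
    for size in range(1, len(groups) + 1):
        yield from itertools.combinations(groups, size)


def centroid_accuracy(rows, labels, train, valid, columns):
    """Accuracy of a toy nearest-centroid classifier (NOT a linear SVM)."""
    sums, counts = {}, {}
    for i in train:
        acc = sums.setdefault(labels[i], [0.0] * len(columns))
        for d, col in enumerate(columns):
            acc[d] += rows[i][col]
        counts[labels[i]] = counts.get(labels[i], 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    correct = 0
    for i in valid:
        vec = [rows[i][col] for col in columns]
        pred = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
        )
        correct += pred == labels[i]
    return correct / len(valid)


def grid_search(rows, labels, groups, k=5, seed=0):
    """Pick the feature combination with the best mean cross-validated accuracy."""
    dev, holdout = holdout_split(len(rows), 0.1, seed)
    scores = {}
    for combo in feature_combinations(groups):
        fold_scores = [
            centroid_accuracy(rows, labels, train, valid, combo)
            for train, valid in k_folds(dev, k)
        ]
        scores[combo] = sum(fold_scores) / k
    best = max(scores, key=scores.get)
    # The held-out tenth is touched only once, after model selection.
    final = centroid_accuracy(rows, labels, dev, holdout, best)
    return best, scores, final
```

Here `rows` is assumed to be a list of dictionaries holding one numeric score per component, and `labels` a parallel list of class labels. Scoring the held-out tenth only after the best combination has been chosen keeps the final accuracy estimate independent of the selection step.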
