Wednesday, December 10, 2014

Competency 8.2

Competency 8.2: Build and evaluate models using alternative feature spaces.

I used the different feature spaces that I saved in the previous exercise to build models. My data set was very small and intended only for testing. Comparing the models, the metrics improved significantly when moving from POS features to unigram and bigram features; in my data, the word n-grams were the most predictive of the categories.
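To make this concrete, here is a minimal sketch of how the two kinds of feature spaces could be extracted, assuming scikit-learn and NLTK rather than the course tooling; the two example sentences are made up purely for illustration.

```python
# A minimal sketch, assuming scikit-learn and NLTK (not the course's own workflow);
# the two example sentences below are invented purely for illustration.
# NLTK may need: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
import nltk
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I really liked this tutorial",
        "the examples were confusing and too short"]

# Word unigrams + bigrams ("12 grams")
word_vec = CountVectorizer(ngram_range=(1, 2))
X_words = word_vec.fit_transform(docs)

# POS grams: replace each token by its part-of-speech tag, then count tag n-grams
def pos_string(text):
    return " ".join(tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text)))

pos_vec = CountVectorizer(ngram_range=(1, 2))
X_pos = pos_vec.fit_transform([pos_string(d) for d in docs])

print(X_words.shape, X_pos.shape)  # the two feature spaces differ in size and content
```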



Many of the basic features did not give significant improvements in the model metrics. I used Naive Bayes as the classification algorithm; I also tried other algorithms, but the metrics did not differ much. A few of the feature spaces I tried, along with the metrics of their models, are listed below (a small sketch of this comparison follows the table):
                             
Feature Space       Accuracy   Kappa
POS grams           42%        0.12
12 grams_count      58%        0.36
1 grams_pairs       61%        0.41
12 grams_length     61%        0.41
12 POS grams        65%        0.47
12 grams_no stop    69%        0.52
12 grams            73%        0.59
123 grams           73%        0.59
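As a rough illustration of the workflow behind the table, the sketch below trains Naive Bayes on a few alternative n-gram spaces and reports cross-validated accuracy and Cohen's kappa using scikit-learn; the ten toy sentences and their labels are invented and have nothing to do with the numbers above.

```python
# A sketch of the evaluation step, assuming scikit-learn; toy data only,
# not the actual data behind the table above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, cohen_kappa_score

docs = ["great clear lesson", "loved the examples", "very helpful notes",
        "nice and concise", "well explained steps",
        "confusing and vague", "too short to follow", "poorly organised",
        "hard to understand", "examples were missing"]
labels = [1] * 5 + [0] * 5

for name, rng in [("1 grams", (1, 1)), ("12 grams", (1, 2)), ("123 grams", (1, 3))]:
    X = CountVectorizer(ngram_range=rng).fit_transform(docs)
    pred = cross_val_predict(MultinomialNB(), X, labels, cv=5)
    print(name, accuracy_score(labels, pred), cohen_kappa_score(labels, pred))
```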

To test with a real data set, I tried the hands-on text feature extraction activity given in Prosolo, using the sentiment_sentences data set. I extracted different feature spaces from the basic feature set and used logistic regression; the metrics improved significantly as the feature set was expanded.
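A similar sketch for that experiment, again using scikit-learn's LogisticRegression rather than the course tooling; the file path and the tab-separated label/sentence format are assumptions about how the sentiment_sentences data might be stored, not a description of the actual activity files.

```python
# A sketch only: the path and line format below are placeholders, not the
# actual Prosolo files.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def load_sentences(path):
    """Assumed format: one '<label>\t<sentence>' pair per line."""
    labels, texts = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            label, text = line.rstrip("\n").split("\t", 1)
            labels.append(label)
            texts.append(text)
    return texts, labels

texts, labels = load_sentences("sentiment_sentences.txt")  # placeholder path

# Expand the feature set from unigrams to unigrams+bigrams+trigrams
for name, rng in [("1 grams", (1, 1)), ("12 grams", (1, 2)), ("123 grams", (1, 3))]:
    X = CountVectorizer(ngram_range=rng).fit_transform(texts)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5)
    print(name, scores.mean())
```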



All materials are based on the EdX course - Data, Analytics and Learning
This work is licensed under a Creative Commons Attribution 4.0 International License.