Basics of Natural Language Processing & Part-of-Speech Tagging

Natural language processing (NLP) is an area of artificial intelligence in which computers analyze, understand, and derive meaning from human language. Developers can use NLP to organize and structure knowledge for tasks such as automatic summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.

Part-of-speech (POS) tagging is one of the more powerful features of the NLTK module. It labels each word in a sentence as a noun, adjective, verb, and so on. More impressively, its tags also capture finer distinctions such as verb tense.

We will look at a few functionalities of the NLTK package.
Note: The first time you run this code outside of cogxta, in a local Python environment (IDLE, Spyder, Jupyter, etc.), you need to call nltk.download("all") once.

Step 1: Run the remove_stopwords function on input_list to remove general stopwords ("i", "to", etc.) from it.
Step 2: Apply the greedy averaged perceptron tagger as the tagger in postag.
Step 3: Get the POS tag pattern for your input.
Step 4: Compare the word tokenizer with the split function.
Step 5: Find unusual words in the input data.
Steps 6 & 7: Stemming and lemmatization. Search engines and chatbots use both to reduce a word to a base form so that its variants can be matched. Stemming strips affixes by rule and may produce a stem that is not a real word, while lemmatization uses a vocabulary (and, where supplied, the word's part of speech) to return the dictionary form.
Step 8: Finally, we will look at the WordNet and Synset functionalities.
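The difference between stemming and lemmatization in steps 6 & 7 can be sketched without NLTK. This is a toy illustration only: the suffix list, the lemma table, and the names toy_stem and toy_lemmatize are all made up for this example and are not the Porter algorithm or WordNet.

```python
# Toy illustration only: NOT the Porter algorithm and NOT WordNet.
# The suffix rules and the lemma table below are made-up assumptions.
SUFFIXES = ("ing", "ies", "es", "s")
LEMMA_TABLE = {"cacti": "cactus", "studies": "study"}


def toy_stem(word):
    """Strip the first matching suffix without checking the result is a real word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in a dictionary; return it unchanged if it is unknown."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["studies", "cacti", "playing"]:
    print(w, "-> stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

In this sketch the rule-based stemmer turns "studies" into the non-word "stud", while the dictionary lookup returns "study". The lookup leaves "playing" untouched because it is not in the table, which shows that a lemmatizer is only as good as its vocabulary.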

import nltk
#nltk.download("all") 
from nltk.corpus import stopwords

def remove_stopwords(input_list):
    # Load the stopword list once, as a set, instead of once per word.
    stop_words = set(stopwords.words("english"))
    return [w for w in input_list if w.lower() not in stop_words]

input_list = "i want to install cogxta application in my mobile".split()
print("step 1:")
print("After removing stopwords:", "\n", remove_stopwords(input_list))
print("")

from nltk.tag.perceptron import PerceptronTagger

def postag(txt):
    # PerceptronTagger is NLTK's greedy averaged perceptron tagger.
    precptag = PerceptronTagger()
    input_list = txt.split()
    # Use the tagger's public tag() method instead of the private nltk.tag._pos_tag helper.
    return precptag.tag(input_list)

print("step 2:")
print("Applied precptag tagger in Postag:", "\n", postag("i am going to play football"))
print("")

def postag_pattern(txt):
    words = nltk.word_tokenize(txt)
    tagged = nltk.pos_tag(words)
    # Keep only the tag of each (word, tag) pair and join the tags with spaces.
    return " ".join(tag for _, tag in tagged)

print("step 3:")
print("Postag pattern of input:", "\n", postag_pattern("i am going to play football"))
print("")
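One possible use of the tag pattern from step 3 is to search it for a particular tag sequence. The sketch below assumes hand-written tags in the Penn Treebank style rather than real NLTK output, so that it runs without any downloads. The helper names tag_pattern and has_going_to_verb are hypothetical.

```python
import re

# Hand-written (word, tag) pairs in Penn Treebank style, assumed for
# illustration; they are not copied from actual NLTK output.
tagged = [("i", "PRP"), ("am", "VBP"), ("going", "VBG"),
          ("to", "TO"), ("play", "VB"), ("football", "NN")]


def tag_pattern(tagged_tokens):
    """Join the tags of (word, tag) pairs into one space-separated string."""
    return " ".join(tag for _, tag in tagged_tokens)


def has_going_to_verb(pattern):
    """Return True if the pattern contains the exact tag sequence VBG TO VB."""
    return re.search(r"\bVBG TO VB\b", pattern) is not None


pattern = tag_pattern(tagged)
print(pattern)
print(has_going_to_verb(pattern))
```

The word boundaries in the regular expression keep "VB" from matching inside longer tags such as "VBZ".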

from nltk.tokenize import word_tokenize

def word_tokenizer(txt):
    return word_tokenize(txt)

print("step 4:")
print("Word Tokenizer:", "\n", word_tokenizer("i won't going to play football"))
print("")
print("General split:", "\n", "i won't going to play football".split())
print("")
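To see why a tokenizer and split() can disagree, here is a minimal sketch that uses a simple regular expression as a stand-in tokenizer. The function simple_tokenize and its pattern are assumptions made for illustration and do not reproduce the rules of NLTK's word_tokenize.

```python
import re


def simple_tokenize(text):
    """Split text into words (keeping internal apostrophes) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


sentence = "I won't play football, ok?"
print(sentence.split())           # punctuation stays glued to the words
print(simple_tokenize(sentence))  # punctuation becomes separate tokens
```

Whitespace splitting leaves "football," and "ok?" as single tokens, while the regex separates the punctuation. NLTK's word_tokenize follows Penn Treebank conventions and goes a step further with contractions: it is expected to split "won't" into "wo" and "n't", which this simplified pattern does not do.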

from nltk.corpus import words

def unwords(text):
    # Lowercased alphabetic tokens from the input text.
    text_vocab = set(w.lower() for w in text.split() if w.isalpha())
    # Lowercased reference vocabulary from the NLTK words corpus.
    english_vocab = set(w.lower() for w in words.words())
    # Words in the input that do not appear in the reference vocabulary.
    unusual = text_vocab.difference(english_vocab)
    return sorted(unusual)

t = "i want to install jsjglas application in my mobile"
print("step 5:")
print("Unusual word from our input:", " ", unwords(t))
print("")
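The isalpha() filter in step 5 silently drops any token with punctuation attached, such as "mobile!", so those words are never checked. A minimal sketch of one way to handle this strips punctuation before the check. It uses a tiny hand-made vocabulary, an assumption standing in for words.words(), so that it runs without the corpus; unusual_words is a hypothetical helper name.

```python
import string

# Tiny hand-made reference vocabulary, an assumption standing in for words.words().
REFERENCE_VOCAB = {"i", "want", "to", "install", "application", "in", "my", "mobile"}


def unusual_words(text, vocab):
    """Return the sorted alphabetic tokens of text that are missing from vocab."""
    # Strip leading/trailing punctuation so "mobile!" is checked as "mobile".
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return sorted({t for t in tokens if t.isalpha()} - vocab)


print(unusual_words("I want to install jsjglas application, in my mobile!", REFERENCE_VOCAB))
```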

from nltk.stem import PorterStemmer

def stemming(w):
    # Reduce the word to its stem with the Porter algorithm.
    ps = PorterStemmer()
    return str(ps.stem(w))

txt = "carring"
print("step 6:")
print("Applied stemming: ", stemming(txt))
print("")

from nltk.stem import WordNetLemmatizer

def lemmatizing(txt):
    # Look the word up in WordNet; with no part of speech given, it is treated as a noun.
    wnl = WordNetLemmatizer()
    return wnl.lemmatize(txt)

print("step 7:")
print("Applied Lemma: ", lemmatizing("carring"))
print("")
print("2nd example for Applied Lemma: ", lemmatizing("cacti"))
print("")

from nltk.corpus import wordnet

# Take the first synset (sense) that WordNet lists for "Good".
syn = wordnet.synsets("Good")[0]
print("step 8:")
print("Synset name :  ", syn.name())
print("")
print("Synset meaning : ", syn.definition())
print("")
print("Synset example : ", syn.examples())
