How to get the word count of a particular file in AWS S3 storage using Lambda?

Question

I am trying to get the word count of a particular text file stored in AWS S3, and to detect its language, using an AWS Lambda function written in Python. The code below returns the line count, but I don't know how to get the word count or detect the language. Any ideas on how to do both would be appreciated.

Here is what I tried for the line count:

import boto3

def lambda_handler(event, context):
    # Create the S3 resource
    s3 = boto3.resource('s3')

    # Get the file object
    obj = s3.Object('bucket name', 'sample.txt')

    # Read the file contents into memory. read() returns bytes, so decode
    # to text first (this assumes the file is UTF-8 encoded)
    file_contents = obj.get()["Body"].read().decode('utf-8')

    # Count occurrences of the newline character to get the number of lines
    return {
        'Line Count': file_contents.count('\n')
    }

Current Response:

{ "Line Count": 48 }

Expected Response:

{
    "Line Count": 48,
    "Word Count": ?,   // Here I want to show the word count
    "Language": ?      // Here the language name
}

You say it's not working, could you perhaps give more details about what's not working? Could you also provide a sample file and what you expect to get back from that file? — Nick Chapman, Jan 09 '19 at 17:03
Hi @NickChapman I updated my question could you please check it? — sai, Jan 09 '19 at 17:10

score 0 · Answer 1 · answered Jan 09 '19 at 20:44

To get the number of words, you can try any of the approaches listed in "How to count the number of words in a sentence, ignoring numbers, punctuation and whitespace?"
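As a rough illustration of the word-count half, here is a minimal sketch. It assumes a "word" is a run of letters or digits, optionally joined by apostrophes (so "It's" counts once). The helper names count_words and count_words_whitespace are made up for this example, and other definitions of a word are just as valid.

```python
import re

# Assumption: a word is a run of letters/digits, optionally joined by
# apostrophes. The pattern is Unicode-aware, so accented letters count too.
WORD_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def count_words(text):
    """Count words using the regex definition above (hypothetical helper)."""
    return len(WORD_RE.findall(text))


def count_words_whitespace(text):
    """Simpler alternative: anything separated by whitespace is a word."""
    return len(text.split())


if __name__ == "__main__":
    sample = "A well-known fact."
    # The two definitions disagree on hyphenated words.
    print(count_words(sample), count_words_whitespace(sample))
```

In the handler from the question, the decoded file contents could be passed to either function and the result returned under a "Word Count" key next to "Line Count".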

To detect the language, you can try one of the approaches listed in "NLTK and language detection".

Unfortunately, your question is rather broad. Additionally, detecting a text's language is difficult to get right. Getting the word count is easy, but the result depends heavily on how you define a word.
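For the language half, one option that is not among the linked NLTK approaches is Amazon Comprehend, which exposes a detect_dominant_language call through boto3. The sketch below is only a starting point under a few assumptions: the Lambda role is allowed to call Comprehend, the text fits within the service's per-request size limit, and a language code such as "en" is acceptable in place of a full language name. The helper names top_language and detect_language are hypothetical.

```python
def top_language(response):
    """Return the highest-scoring language code from a
    detect_dominant_language-style response, or None if there is none."""
    languages = response.get("Languages", [])
    if not languages:
        return None
    best = max(languages, key=lambda lang: lang["Score"])
    return best["LanguageCode"]


def detect_language(text, client=None):
    """Ask Amazon Comprehend for the dominant language of `text`.

    `client` is injectable so the function can be exercised without AWS.
    """
    if client is None:
        # Imported lazily so top_language() stays usable without boto3.
        import boto3
        client = boto3.client("comprehend")
    response = client.detect_dominant_language(Text=text)
    return top_language(response)
```

Inside the handler, something like detect_language(file_contents) could fill the "Language" key. Mapping the returned code to a display name would need a separate lookup table, and large files would probably need to be truncated or sampled before the call.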
