
Fake Data:

[image: example data]

How do I create the following columns in Python?

UpperCaseWords: The number of uppercase words in each row

%of Upper Case Words: Percentage of text that is in all uppercase

Jacob3454
  • Does this answer your question? [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – Mateo Torres Nov 13 '20 at 17:34
  • So you want to know how to calculate these values? – xjcl Nov 13 '20 at 17:36
  • Please do not post images of code/errors/data. Instead post the code/errors/data as text in a code block. See [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) – Scratte Dec 01 '20 at 01:32

3 Answers


Use .str.split() to split each sentence into a list of words, then apply Python's built-in str.isupper() to each word to count the words that are completely uppercase:

import pandas as pd

# example data
df = pd.DataFrame(
    {'A': ['how are you DOING Or', 'WE want NOW ansWers']}
)

# split each string; the default split is on whitespace
# you get a list of words
df['split_words'] = df['A'].str.split()

# iterate over the list of words and count how many are uppercase
df['count_upper_case_words'] = df['split_words'].apply(
    lambda list_: sum(1 for word in list_ if word.isupper())
)

# count total number of words
df['count_total_words'] = df['split_words'].str.len()

# calculate percentage of uppercase words
df['perc_upper_case'] = df['count_upper_case_words'] / df['count_total_words'] * 100.

Resulting dataframe:

                      A                 split_words  count_upper_case_words  count_total_words  perc_upper_case
0  how are you DOING Or  [how, are, you, DOING, Or]                       1                  5             20.0
1   WE want NOW ansWers    [WE, want, NOW, ansWers]                       2                  4             50.0
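The steps above can also be folded into one plain-Python helper, which may be convenient if some rows are empty. This is only a sketch: the name uppercase_stats is hypothetical, and returning 0.0 for a row without words is an assumption rather than part of the answer.

```python
def uppercase_stats(text):
    """Return (uppercase_words, total_words, percent_uppercase) for one string."""
    words = text.split()  # split on whitespace, like .str.split()
    n_upper = sum(1 for word in words if word.isupper())
    n_total = len(words)
    # Assumption: a row without any words gets 0.0 instead of a division by zero
    percent = 100 * n_upper / n_total if n_total else 0.0
    return n_upper, n_total, percent


print(uppercase_stats('how are you DOING Or'))  # (1, 5, 20.0)
print(uppercase_stats('WE want NOW ansWers'))   # (2, 4, 50.0)
print(uppercase_stats(''))                      # (0, 0, 0.0)
```

The helper could then be applied per row, for example with df['A'].apply(uppercase_stats), which yields one tuple per row.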
Sander van den Oord

You can use .str.count() with a regular expression to count the uppercase words and the total words separately, then divide the two counts to get the percentage of uppercase words.

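import pandas as pd
# example data, taken from the output below
df = pd.DataFrame({"A": ["My name is JACOB", "Football and BASKETBALL and SOCCER", "North Dakota", "South Dakota"]})
df["n_uppercase_words"] = df["A"].str.count(r"\b[A-Z]+\b")
df["n_words"] = df["A"].str.count(r"\b\w+\b")
df["percent_uppercase_words"] = df["n_uppercase_words"] / df["n_words"] * 100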

print(df)
                                    A  n_uppercase_words  n_words  percent_uppercase_words
0                    My name is JACOB                  1        4                     25.0
1  Football and BASKETBALL and SOCCER                  2        5                     40.0
2                        North Dakota                  0        2                      0.0
3                        South Dakota                  0        2                      0.0

Regular Expressions:

  • \b[A-Z]+\b: matches one or more consecutive uppercase letters with a word boundary on either side.
  • \b\w+\b: same as above, but matches any word character (letters, digits, and underscores), so it counts every word.

The uppercase count ignores numbers and "words" with numbers in them. Because \w also matches digits, those tokens still count toward n_words.
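To see how these patterns treat tokens that contain digits, here is a small sketch using the standard re module, which is what pandas' .str.count() relies on. The sample sentence and the constant names are made up for illustration.

```python
import re

UPPER_WORD = r"\b[A-Z]+\b"      # runs made up only of A-Z
LETTER_WORD = r"\b[A-Za-z]+\b"  # runs made up only of letters
ANY_WORD = r"\b\w+\b"           # runs of letters, digits, or underscores

text = "ROOM42 is on FLOOR 3"   # hypothetical sample

upper_matches = re.findall(UPPER_WORD, text)
letter_matches = re.findall(LETTER_WORD, text)
any_matches = re.findall(ANY_WORD, text)

print(upper_matches)   # ['FLOOR']
print(letter_matches)  # ['is', 'on', 'FLOOR']
print(any_matches)     # ['ROOM42', 'is', 'on', 'FLOOR', '3']
```

"ROOM42" is not counted as uppercase because there is no word boundary between "M" and "4". Whether it counts as a word at all depends on which pattern is used for the total.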

Cameron Riddell

A version that is very easy to understand:

import pandas as pd

data = {'A': ['ABC', 'abc BCD']}
df = pd.DataFrame(data)  # build the DataFrame from the example data

def count(text):
    # split the sentence into words and count the fully uppercase ones
    return sum(word.isupper() for word in text.split())

def percentage(row):
    # uppercase words divided by the total number of words, in percent
    return (int(row['uppercase']) / len(row['A'].split())) * 100.

df['uppercase'] = df['A'].apply(count)
df['percentage'] = df.apply(percentage, axis=1)
df  # final DataFrame

OUTPUT:

    A      uppercase    percentage
0   ABC       1         100.0
1   abc BCD   1         50.0
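One thing to keep in mind with word.isupper(): it only looks at cased characters, so tokens with trailing punctuation or digits can still count as uppercase. The sketch below compares it with a stricter check. The helper is_upper_word is hypothetical and not part of the answer.

```python
def is_upper_word(word):
    # Hypothetical stricter check: letters only, and all of them uppercase
    return word.isalpha() and word.isupper()


samples = ['DOING', 'DOING,', 'A1', '123', 'Or']
for word in samples:
    # columns: token, built-in isupper(), stricter check
    print(word, word.isupper(), is_upper_word(word))
```

With these samples, 'DOING,' and 'A1' pass isupper() but fail the stricter check, while '123' fails both because it contains no cased characters.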
ssp4all