Python DataFrame Row Iteration

Question

Fake Data:

How do I create the following columns in Python?

UpperCaseWords: The number of upper-case words in each row

%of Upper Case Words: Percentage of text that is in all uppercase

Does this answer your question? [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) — Mateo Torres, Nov 13 '20 at 17:34
Please do not post images of code/errors/data. Instead post the code/errors/data as text in a code block. See [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) — Scratte, Dec 01 '20 at 01:32

Sander van den Oord · Answer 1 · 2020-11-13T17:55:17.323

Use .str.split() to split each sentence into a list of words, then apply Python's str.isupper() to each word to count the words that are completely uppercase:

import pandas as pd

# example data
df = pd.DataFrame(
    {'A': ['how are you DOING Or', 'WE want NOW ansWers']}
)

# split your string, default split is on whitespace
# you get a list of words
df['split_words'] = df['A'].str.split()

# iterate over list of words and count how many are uppercase
df['count_upper_case_words'] = df['split_words'].apply(
    lambda list_: sum(1 for word in list_ if word.isupper())
)

# count total number of words
df['count_total_words'] = df['split_words'].str.len()

# calculate percentage of uppercase words
df['perc_upper_case'] = df['count_upper_case_words'] / df['count_total_words'] * 100.

Resulting dataframe:

                      A                 split_words  count_upper_case_words  count_total_words  perc_upper_case
0  how are you DOING Or  [how, are, you, DOING, Or]                       1                  5             20.0
1   WE want NOW ansWers    [WE, want, NOW, ansWers]                       2                  4             50.0
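
Since the count above relies on Python's str.isupper(), it may help to see how that method treats tokens that contain digits or punctuation. The following is only a small sketch; the tokens are made up for illustration and are not taken from the question's data.

```python
# Illustrative tokens (hypothetical) showing how str.isupper() classifies
# words once a sentence has been split on whitespace.
tokens = ["DOING", "Or", "ansWers", "NOW!", "A1", "123", "!!"]

for token in tokens:
    # isupper() is True only if the token has at least one cased character
    # and every cased character in it is uppercase.
    print(f"{token!r}: {token.isupper()}")

# Number of tokens that would be counted as uppercase words
upper_count = sum(token.isupper() for token in tokens)
```

So tokens such as "NOW!" or "A1" are counted as uppercase, while purely numeric or punctuation-only tokens are not.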

score 0 · Answer 2 · answered Nov 13 '20 at 17:51

You can use .str.count() to count the occurrences of upper case and total words separately. From there you can use division to calculate the percentage of uppercase words.

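import pandas as pd

df = pd.DataFrame({"A": ["My name is JACOB", "Football and BASKETBALL and SOCCER",
                         "North Dakota", "South Dakota"]})

df["n_uppercase_words"] = df["A"].str.count(r"\b[A-Z]+\b")
df["n_words"] = df["A"].str.count(r"\b[A-Za-z]+\b")
df["percent_uppercase_words"] = df["n_uppercase_words"] / df["n_words"] * 100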

print(df)
                                    A  n_uppercase_words  n_words  percent_uppercase_words
0                    My name is JACOB                  1        4                     25.0
1  Football and BASKETBALL and SOCCER                  2        5                     40.0
2                        North Dakota                  0        2                      0.0
3                        South Dakota                  0        2                      0.0

Regular Expressions:

\b[A-Z]+\b: matches a run of one or more consecutive upper-case letters with a word boundary on either side.
\b[A-Za-z]+\b: same as above, but also matches lowercase letters.

This solution ignores numbers and "words" that contain digits or underscores. Other non-letter characters, such as hyphens or apostrophes, act as word boundaries, so a token like "don't" is counted as two words.
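
To see how the two patterns listed under "Regular Expressions" behave on text with digits or punctuation, here is a rough sketch using Python's re module, whose regular-expression syntax is what .str.count() accepts. The sample sentences and the names UPPER and WORD are assumptions made for illustration.

```python
import re

# Hypothetical sample sentences, chosen to include digits and punctuation.
samples = ["My name is JACOB", "ROOM 101 is OPEN", "abc123 XYZ", "don't STOP"]

UPPER = re.compile(r"\b[A-Z]+\b")    # runs of uppercase letters only
WORD = re.compile(r"\b[A-Za-z]+\b")  # runs of letters of either case

# (number of uppercase words, number of words) for each sentence
counts = [(len(UPPER.findall(s)), len(WORD.findall(s))) for s in samples]

for s, (n_upper, n_words) in zip(samples, counts):
    print(f"{s!r}: {n_upper} uppercase of {n_words} words")
```

Here "101" and "abc123" are not counted as words at all, and "don't" contributes two words because the apostrophe splits it.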

ssp4all · Answer 3 · 2020-11-13T18:15:00.493


A version that is very easy to understand:

import pandas as pd

data = {'A': ['ABC', 'abc BCD']}
df = pd.DataFrame(data)  # feed data to create the DataFrame

def count(text):
    # split the sentence and count the words that are entirely uppercase
    return sum(word.isupper() for word in text.split())

def percentage(row):
    # divide the uppercase count by the total number of words in the row
    return row['uppercase'] / len(row['A'].split()) * 100.

df['uppercase'] = df['A'].apply(count)
df['percentage'] = df.apply(percentage, axis=1)
df  # final data frame

OUTPUT:

    A      uppercase    percentage
0   ABC       1         100.0
1   abc BCD   1         50.0
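
One edge case worth noting: the percentage divides by the number of words, so a row whose text is empty or only whitespace leads to a division by zero. A possible guard is sketched below; the helper name percent_upper and the choice to report 0.0 for empty text are assumptions, not part of the answer above.

```python
def percent_upper(text):
    """Return the percentage of whitespace-separated words that are all uppercase."""
    words = text.split()
    if not words:
        # Assumption: report 0.0 for empty or whitespace-only text
        # instead of dividing by zero.
        return 0.0
    return sum(word.isupper() for word in words) / len(words) * 100.0

# Quick check on the answer's two sentences plus two empty-ish inputs
results = [percent_upper(t) for t in ["ABC", "abc BCD", "", "   "]]
print(results)
```

With the DataFrame above, this could be applied as df['A'].apply(percent_upper). Missing values (NaN) are not strings and would still need separate handling.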

edited Nov 13 '20 at 18:15

answered Nov 13 '20 at 17:55


Hi @ssp4all, the question asks for the percentage of upper case words, so your code needs to be adjusted – Sander van den Oord Nov 13 '20 at 17:59
@sander-van-den-oord thanks for the comment. I've updated my answer! – ssp4all Nov 13 '20 at 18:04
