Access rows with string in dataframe column, which contain 2 or more spaces between words using Pandas

Question

I am learning Python on what are meant to be real-world scenarios, and I got a task to filter the names of companies that contain three or more words. The names are in a column called "Company Name" and the dataframe is called "data". I managed to collect the matching rows into a list and eventually into a dataframe. However, the resulting dataframe is transposed: the rows appear where the columns should be, and the columns where the rows should be. It feels like I am working around the problem rather than solving it.

a,b = data.shape
required_data = []

for i in range(a):
    if data["Company Name"][i].count(" ") >= 2:
        required_data.append(data.iloc[i])
    else:
        pass

required_data1 = pd.concat(required_data, axis=1, ignore_index = True)

required_data1

I tried the axis=0 argument instead, but it returns a strange, flat list of items from the dataframe. I am not sure this is the right approach, so I decided to ask for help. Many thanks!
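
A minimal sketch of why the two axis values behave this way, assuming a small made-up frame (the names, the "City" and "Employees" columns, and their values are invented for illustration): each data.iloc[i] is a Series, so axis=1 lines the Series up side by side as columns, while axis=0 stacks them end to end into one long Series. Building a DataFrame directly from the list of Series keeps one row per company.

```python
import pandas as pd

# Hypothetical sample data; only the "Company Name" column name comes from the question.
data = pd.DataFrame({
    "Company Name": ["American Telephone and Telegraph",
                     "America Online",
                     "Capsule Computer Inc"],
    "City": ["A", "B", "C"],
    "Employees": [10, 20, 30],
})

# Same selection as the loop in the question: keep rows with 2 or more spaces.
rows = [data.iloc[i] for i in range(len(data))
        if data["Company Name"].iloc[i].count(" ") >= 2]

as_columns = pd.concat(rows, axis=1, ignore_index=True)  # each Series becomes a column
stacked = pd.concat(rows, axis=0)                        # one long Series
as_rows = pd.DataFrame(rows)                             # one row per Series

print(as_columns.shape)  # fields down the side, companies across the top
print(stacked.shape)     # every field of every company in a single Series
print(as_rows.shape)     # companies as rows, fields as columns
```

Transposing the axis=1 result with .T would also restore the expected orientation.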

Can you add an overview of the data dataframe, as well as the expected output? You can use pandas' `apply()`, `map()`, and `str.split()` for your use case. — kelyen, Jan 05 '22 at 22:56

score 1 · Accepted Answer · answered Jan 05 '22 at 23:02

Use str.split to split each company name into words, count the words with str.len, and then select the rows that have at least three:

import pandas as pd

data = pd.DataFrame({'Company Name': ['American Telephone and Telegraph',
                                      'America Online',
                                      'Capsule Computer',
                                      'International Business MachinesHP']})

required_data1 = data[data['Company Name'].str.split(r'\s+').str.len().ge(3)]
print(required_data1)

# Output
                        Company Name
0   American Telephone and Telegraph
3  International Business MachinesHP
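
Counting words and counting spaces can disagree when a name contains doubled, leading, or trailing spaces. The sketch below uses made-up names (not the asker's data) and calls str.split() with no pattern, which splits on runs of whitespace and drops empty strings. That is a slight variation on the r'\s+' pattern above, chosen here because a regex split would count the empty strings produced by leading or trailing spaces.

```python
import pandas as pd

# Hypothetical names with irregular spacing.
names = pd.Series(["America  Online",       # two spaces between two words
                   "Acme Widgets Ltd",      # three words, single spaces
                   " Capsule Computer "])   # leading and trailing space

space_count = names.str.count(" ")        # counts every literal space
word_count = names.str.split().str.len()  # counts whitespace-separated words

print(pd.DataFrame({"name": names, "spaces": space_count, "words": word_count}))

# Keep only names with at least three words.
mask = word_count.ge(3)
print(names[mask])
```

With a space-count filter of 2 or more, all three names would pass; the word-count filter keeps only the one that really has three words.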

score 0 · Answer 2 · answered Jan 05 '22 at 23:23

You can find the general answer here: How do I select rows from a DataFrame based on column values?

In your case, we can use enumerate and .iloc like this:

required_data1 = data["Company Name"].iloc[[i for i, x in enumerate(data["Company Name"]) if x.count(" ") >= 2]]
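
The same kind of selection can also be sketched without an explicit loop, using a boolean mask built from str.count. The sample frame reuses the names from the accepted answer, and the threshold of 2 spaces follows the question's title. Unlike the line above, this keeps whole rows rather than only the "Company Name" column.

```python
import pandas as pd

# Sample names taken from the accepted answer.
data = pd.DataFrame({"Company Name": ["American Telephone and Telegraph",
                                      "America Online",
                                      "Capsule Computer",
                                      "International Business MachinesHP"]})

# True for names that contain 2 or more spaces.
mask = data["Company Name"].str.count(" ") >= 2

# Boolean indexing keeps the matching rows with their original index labels.
required_data1 = data[mask]
print(required_data1)
```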
