
I am learning Python on what are, perhaps, real-case scenarios, and I got a task to filter the names of companies that contain more than 3 words. The names are in the column named "Company Name" and the dataframe is called "data". I managed to get them into a list and eventually into a dataframe as well. However, the resulting dataframe is transposed: the rows are where the columns should be, and the columns are where the rows should be. It feels like I am taking a roundabout route.

a,b = data.shape
required_data = []

for i in range(a):
    if data["Company Name"][i].count(" ") >= 2:
        required_data.append(data.iloc[i])
    else:
        pass

required_data1 = pd.concat(required_data, axis=1, ignore_index = True)

required_data1

I would go for the axis=0 argument, but it returns a rather odd flat list of items from the dataframe. I am not sure if this is the right approach, so I decided to reach out for help. Many thanks!

  • Can you add an overview of the data dataframe, as well as the expected output? You can leverage pandas with `apply()`, `map()` and `str.split()` for your use case. – kelyen Jan 05 '22 at 22:56

2 Answers


Use str.split to split each company name into a list of words, count the length of each list, then select the rows where that count is at least 3:

import pandas as pd

data = pd.DataFrame({'Company Name': ['American Telephone and Telegraph',
                                      'America Online',
                                      'Capsule Computer',
                                      'International Business MachinesHP']})

# Split on whitespace, count the words, keep rows with 3 or more words
required_data1 = data[data['Company Name'].str.split(r'\s+').str.len().ge(3)]
print(required_data1)

# Output
                        Company Name
0   American Telephone and Telegraph
3  International Business MachinesHP
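
To see what the chained expression is doing, the steps can be pulled apart. This is only a sketch on the same sample data; the intermediate names words, word_counts and mask are mine and are not part of the one-liner above.

```python
import pandas as pd

data = pd.DataFrame({'Company Name': ['American Telephone and Telegraph',
                                      'America Online',
                                      'Capsule Computer',
                                      'International Business MachinesHP']})

# Step 1: split each name on runs of whitespace, giving a Series of lists
words = data['Company Name'].str.split(r'\s+')

# Step 2: the length of each list is the number of words in the name
word_counts = words.str.len()

# Step 3: boolean mask, True where the name has at least 3 words
mask = word_counts.ge(3)

# Step 4: boolean indexing keeps whole rows, so nothing gets transposed
required_data1 = data[mask]

print(word_counts.tolist())           # [4, 2, 2, 3]
print(required_data1.index.tolist())  # [0, 3]
```

If the task really means strictly more than 3 words, swapping ge(3) for gt(3) should be the only change needed.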
Corralien

You can find the answer here: How do I select rows from a DataFrame based on column values?

In your case, we can use enumerate and .iloc like this:

required_data1 = data["Company Name"].iloc[[i for i, x in enumerate(data["Company Name"]) if x.count(" ") >= 2]]
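
As for why the loop in the question came out with rows and columns swapped: data.iloc[i] is a Series, and pd.concat with axis=1 places each Series side by side as a column. Below is a minimal sketch of the difference, assuming a made-up frame (the "Id" and "Region" columns and their values are invented purely for illustration). Passing the list of positions straight to data.iloc keeps whole rows in their original orientation.

```python
import pandas as pd

# Hypothetical sample data; 'Id' and 'Region' are invented for illustration
data = pd.DataFrame({
    'Company Name': ['American Telephone and Telegraph',
                     'America Online',
                     'International Business Machines'],
    'Id': [1, 2, 3],
    'Region': ['A', 'B', 'C'],
})

# Positions of names with at least two spaces (three or more words)
positions = [i for i, name in enumerate(data['Company Name'])
             if name.count(' ') >= 2]

# Concatenating row Series along axis=1 turns each row into a column
transposed = pd.concat([data.iloc[i] for i in positions], axis=1)

# Selecting the positions directly keeps rows as rows
required_data1 = data.iloc[positions]

print(transposed.shape)      # (3, 2): original columns became the index
print(required_data1.shape)  # (2, 3): two matching rows, three columns
```

If the loop from the question is kept as is, transposing the concatenated result with .T should give the same orientation, though the column dtypes may end up as object.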
Wahyu Hadinoto