Hope everyone is doing well. I am using pandas and numpy, and I would like to extract the values of a DataFrame column that begin with the three characters ap. (in any letter case). Below is an example of my DataFrame.

Name      Number
Orange    2
APple     6
Ap.ricot  1
AP.19     1
Juap.rte  3

I've tried df[df['Name'].str.lower().str.contains('ap.', na=False)].Name.unique() but it does not fully do the trick.

Output:

['AP.19','Ap.ricot']

The output should ideally be a list that I can then assign to a variable. Additionally, the three characters must appear at the start of the name, in this order.

I am very new to Python so please explain as clearly as possible. Thank you.

Ammar Kamran

3 Answers

Given the comments in the post, I believe you can get it done with:

ap = [x for x in df['Name'] if x.lower().startswith('ap.')]

If you do not want duplicates, you can use:

ap = [x for x in df['Name'].unique() if x.lower().startswith('ap.')]
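
For larger frames, the same prefix test can also be written with pandas' vectorized string methods. The following is only a minimal sketch under a few assumptions: the DataFrame is rebuilt from the sample in the question, `mask` is an illustrative name, and `na=False` is added so that missing names count as non-matches.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Orange", "APple", "Ap.ricot", "AP.19", "Juap.rte"],
    "Number": [2, 6, 1, 1, 3],
})

# Lowercase every name, then test the prefix literally (no regex involved).
# na=False treats missing names as non-matches.
mask = df["Name"].str.lower().str.startswith("ap.", na=False)

# drop_duplicates() keeps the first occurrence of each name;
# tolist() turns the result into a plain Python list.
ap = df.loc[mask, "Name"].drop_duplicates().tolist()
print(ap)  # ['Ap.ricot', 'AP.19']
```

Because `str.startswith` compares plain text, the dot in `'ap.'` is taken literally, so `APple` is not picked up.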
Celius Stingher

This may help you:

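final = []

# Lowercase copy of the names so the comparison is case-insensitive
df['NameCopy'] = df['Name'].str.lower()

for index, row in df.iterrows():
    # startswith() keeps only names that begin with 'ap.';
    # find() != -1 would also match 'ap.' in the middle (e.g. 'Juap.rte')
    if row['NameCopy'].startswith('ap.'):
        final.append(row['Name'])

print(final)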
Prateek Jain
  • Thank you! This also works. I will accept and approve the answer as soon as StackOverflow allows me. – Ammar Kamran Dec 22 '20 at 18:04
  • This is very inefficient because it loops over the DataFrame row by row; this pattern should be discouraged. – Celius Stingher Dec 22 '20 at 18:06
  • Yes, I agree with @CeliusStingher: one should avoid iterating. https://stackoverflow.com/a/55557758/6660373 By the way, merry Christmas to all of you in advance :) – Pygirl Dec 22 '20 at 18:07
  • @CeliusStingher: Good point, but such loops are better when there are more conditions you need to check within the same iteration, which is why I prefer using them on my datasets. – Prateek Jain Dec 22 '20 at 18:08
  • You should avoid that. Visit the link that I mentioned; it will help you in the future :) – Pygirl Dec 22 '20 at 18:09
  • There are no extra conditions to meet here, and it has been shown many times that there are vectorized ways of handling conditions, such as `np.where()`. For loops should be avoided and discouraged when working with pandas DataFrames. – Celius Stingher Dec 22 '20 at 18:09
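
To illustrate the vectorized style discussed in these comments, here is a rough sketch of combining several conditions without a row loop. The second condition (`Number < 2`) and the `Label` column are hypothetical, invented purely for the example; only the sample data comes from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Orange", "APple", "Ap.ricot", "AP.19", "Juap.rte"],
    "Number": [2, 6, 1, 1, 3],
})

# Each condition is a boolean Series computed for the whole column at once
starts_with_ap = df["Name"].str.lower().str.startswith("ap.", na=False)
small_number = df["Number"] < 2  # hypothetical extra condition

# & combines the masks element-wise; np.where picks a value per row
df["Label"] = np.where(starts_with_ap & small_number, "match", "other")
print(df)
```

Further conditions can be added as more boolean masks joined with `&` or `|`, which keeps the work inside pandas and NumPy instead of a Python-level loop.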

Try `str.match`, which only matches at the start of each string:

df[df['Name'].str.match('^(ap[.])', case=False)].Name.unique() 

array(['Ap.ricot', 'AP.19'], dtype=object)
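
As a hedged illustration of why the pattern matters, the sketch below compares the question's `str.contains('ap.')` attempt with `str.match('ap[.]')` on the sample names. The variable names `loose` and `strict` are made up for the example, and `.tolist()` is added to turn the result into a plain list.

```python
import pandas as pd

# Sample names copied from the question
names = pd.Series(["Orange", "APple", "Ap.ricot", "AP.19", "Juap.rte"])

# An unescaped '.' is a regex wildcard, and contains() searches anywhere in
# the string, so 'APple' (ap + any character) and 'Juap.rte' (match in the
# middle) both slip through.
loose = names[names.str.lower().str.contains("ap.", na=False)].tolist()

# match() only matches at the start of the string; '[.]' makes the dot literal.
strict = names[names.str.match("ap[.]", case=False, na=False)].tolist()

print(loose)   # ['APple', 'Ap.ricot', 'AP.19', 'Juap.rte']
print(strict)  # ['Ap.ricot', 'AP.19']
```

Since `str.match` already anchors at the beginning of each string, the leading `^` in the pattern above is harmless but not strictly needed.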
Pygirl