How to slice each individual element of a python list or array

Question

I have a python list which is derived from a pandas series as follows:

dsa = pd.Series(crew_data['Work Type'])
disc = [dsa]
print(disc)

The output is as follows:

[0      Disc - Standard Removal & Herbicide 
 1      Disc - Standard Removal & Herbicide  
 2                            Standard Trim  
 3                       Disc - Hazard Tree  
 4                       Disc - Hazard Tree  
                  ...                   
 134                     Disc - Hazard Tree  
 135                     Disc - Hazard Tree  
 136                     Disc - Hazard Tree  
 137                     Disc - Hazard Tree  
 138                     Disc - Hazard Tree  
 Name: Work Type, Length: 139, dtype: object]

The next step is to slice the first 4 characters of each element, so that the value returned is Disc.

This seems simple for a single string, but doing it for every element of a list has turned out to be almost impossible. In Excel it is just the formula =LEFT(A1,4), so surely it can be done as simply in Python?

If anyone has a solution that would be great.

Is this list one big string, or are there multiple objects in the list? Could you provide a better example? — PacketLoss, Jan 29 '20 at 00:51
No these are individual objects. They represent a category code for each individual task in the system database — jasw, Jan 29 '20 at 00:54
Is there a reason you call `pd.Series()` on `crew_data['column']`? Typically, if `crew_data` is a `DataFrame`, getting a single column will already give you a `Series`. — Grismar, Jan 29 '20 at 00:54
Depending on some of the details that aren't clear in your question, your question may have already been answered here https://stackoverflow.com/questions/36505847/substring-of-an-entire-column-in-pandas-dataframe — Grismar, Jan 29 '20 at 00:55
Thanks for the link. That worked perfectly. Everything that I searched on this topic provided a function with a for loop or something far more convoluted that didn't work... — jasw, Jan 29 '20 at 01:03
Does this answer your question? [substring of an entire column in pandas dataframe](https://stackoverflow.com/questions/36505847/substring-of-an-entire-column-in-pandas-dataframe) — AMC, Jan 29 '20 at 05:38

hpaulj · Accepted Answer · 2020-01-29T01:54:00.157

With a sample dataframe

In [138]: df                                                                                     
Out[138]: 
  col1  col2 col3 newcol
0    a     1    x    Wow
1    b     2    y    Dud
2    c     1    z    Wow
In [139]: df['newcol']                                                                           
Out[139]: 
0    Wow
1    Dud
2    Wow
Name: newcol, dtype: object
In [140]: type(_)                                                                                
Out[140]: pandas.core.series.Series

Selecting a column already gives a Series, so there is no need to wrap it in another pd.Series call:

In [141]: pd.Series(df['newcol'])                                                                
Out[141]: 
0    Wow
1    Dud
2    Wow
Name: newcol, dtype: object

We can put it in a list, but that doesn't help: the result is a one-element list whose only item is the whole Series:

In [142]: [pd.Series(df['newcol'])]                                                              
Out[142]: 
[0    Wow
 1    Dud
 2    Wow
 Name: newcol, dtype: object]
In [143]: len(_)                                                                                 
Out[143]: 1

We can extract the values as a numpy array:

In [144]: pd.Series(df['newcol']).values                                                         
Out[144]: array(['Wow', 'Dud', 'Wow'], dtype=object)
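If what is wanted is a plain Python list with one string per row, rather than a one-element list holding the whole Series, `Series.tolist()` is one option. A minimal sketch, assuming a small stand-in Series named `work_type` (hypothetical, not the asker's real data):

```python
import pandas as pd

# Hypothetical stand-in for crew_data['Work Type']
work_type = pd.Series(
    ["Disc - Standard Removal & Herbicide", "Standard Trim", "Disc - Hazard Tree"],
    name="Work Type",
)

wrapped = [work_type]         # a list holding ONE object: the whole Series
as_list = work_type.tolist()  # a list holding one Python string per row

print(len(wrapped))    # 1
print(len(as_list))    # 3
print(as_list[0][:4])  # Disc
```

Once each element is an ordinary string, ordinary string slicing applies to it.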

We can apply string slicing to each element of either the array or the series with a list comprehension:

In [145]: [astr[:2] for astr in _144]                                                            
Out[145]: ['Wo', 'Du', 'Wo']
In [146]: [astr[:2] for astr in _141]                                                            
Out[146]: ['Wo', 'Du', 'Wo']

The list comprehension isn't necessarily the most 'advanced' way, but it's a good start. In fact it is close to the best option: the elements here are ordinary Python strings (object dtype), so each slice has to go through Python's own string slicing, one element at a time.
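Applied to the question's case (the first 4 characters), the list comprehension might look like the sketch below. The Series `work_type` is a hypothetical stand-in, and the `isinstance` guard is an added assumption for columns that may contain missing values, which cannot be sliced.

```python
import pandas as pd

# Hypothetical stand-in for the asker's column, with one missing value.
# dtype=object keeps the elements as plain Python objects.
work_type = pd.Series(
    ["Disc - Hazard Tree", "Standard Trim", None],
    name="Work Type",
    dtype=object,
)

# Slice real strings; pass anything else (e.g. a missing value) through unchanged.
first_four = [s[:4] if isinstance(s, str) else s for s in work_type]

print(first_four[:2])  # ['Disc', 'Stan']
```

Without the guard, a missing value would raise a `TypeError` as soon as the comprehension tried to slice it.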

pandas has a .str accessor for applying string methods to every element of a Series:

In [147]: ds = df['newcol']  
In [151]: ds.str.slice(0,2)        # or ds.str[:2]                                                               
Out[151]: 
0    Wo
1    Du
2    Wo
Name: newcol, dtype: object

This is cleaner and prettier than the list comprehensions, but actually slower.
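How the two approaches compare in speed likely depends on the pandas version, the dtype, and the size of the data, so it may be worth measuring on your own machine. A rough sketch using `timeit`, with a made-up Series `ser` (the data and repetition counts are assumptions chosen only to make the timing visible):

```python
import timeit

import pandas as pd

# Made-up data: two of the question's labels repeated many times.
ser = pd.Series(["Disc - Hazard Tree", "Standard Trim"] * 5000)

via_accessor = ser.str[:4]
via_listcomp = pd.Series([s[:4] for s in ser], index=ser.index)

# Both routes should produce the same values.
assert via_accessor.tolist() == via_listcomp.tolist()

t_accessor = timeit.timeit(lambda: ser.str[:4], number=20)
t_listcomp = timeit.timeit(lambda: [s[:4] for s in ser], number=20)
print(f".str[:4]: {t_accessor:.4f}s   list comprehension: {t_listcomp:.4f}s")
```

The timings are printed rather than asserted, since they will differ between environments.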

Very nice +1. Is the last code block, ds.str.slice(0,2), supposed to be df.str.slice(0,2)? — merit_2, Jan 29 '20 at 01:37
I missed a copy line, assigning the `Out[141]` series to `ds`. @merit_2 — hpaulj, Jan 29 '20 at 01:55

score 0 · Answer 2 · answered Jan 29 '20 at 01:08

I might be missing the gist of the question, but here's a regular expression implementation.

import re

# Sample data
disc = ['                       Disc - Standard Removal & Herbicide ',
 '      Disc - Standard Removal & Herbicide  ',
'                           Standard Trim  ',
'                       Disc - Hazard Tree',
'                      Disc - Hazard Tree ',]

# Regular Expression pattern
# Disc is in parentheses because that is the part we want to capture.
# re.search(<pattern>, <string>).group(1) returns the first captured group. Using just
# re.search(<pattern>, <string>).group() would return the whole match, including the surrounding whitespace.
disc_pattern = r"\s+?(Disc)\s+?"

# List comprehension that skips rows without 'Disc'
[re.search(disc_pattern, i).group(1) for i in disc if re.match(disc_pattern, i)]

Output:

['Disc', 'Disc', 'Disc', 'Disc']
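If the data is still in a pandas Series rather than a list of padded strings, a similar capture can be sketched with `Series.str.extract`. This is an alternative, not part of the answer above: the Series `ser` is a hypothetical stand-in, and the pattern is adjusted (an assumption) to anchor at the start of the string, because real Series values do not carry the leading padding seen in the printed output.

```python
import pandas as pd

# Hypothetical stand-in for the asker's column.
ser = pd.Series(
    ["Disc - Standard Removal & Herbicide", "Standard Trim", "Disc - Hazard Tree"],
    name="Work Type",
)

# With one capture group and expand=False the result is a Series:
# the captured text where the pattern matches, NaN elsewhere.
captured = ser.str.extract(r"^\s*(Disc)\b", expand=False)

# Drop the non-matching rows to mirror the list comprehension above.
print(captured.dropna().tolist())  # ['Disc', 'Disc']
```

Keeping the NaN rows instead of dropping them preserves alignment with the original index, which is usually what is wanted when the result goes back into a DataFrame column.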
