
I want to add an additional column to an existing dataframe that has the length of the 'seller_name' column as its value.

The output should be like so:

seller_name  | name_length
-------------|-------------
Rick         |      4
Hannah       |      6

However, I'm having difficulty getting the code right.

df['name_length']  = len(df['seller_name'])

just gives me the length of the entire column (6845), and

df['nl']  = df[len('seller_name')]

throws a KeyError.

Does anyone know the correct command to achieve my goal?

Many thanks!

root
Jasper

2 Answers


Use the .str string accessor to perform string operations on a DataFrame column (a Series). In particular, you want .str.len:

df['name_length'] = df['seller_name'].str.len()

The resulting output:

  seller_name  name_length
0        Rick            4
1      Hannah            6
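
To make the difference from the first attempt in the question concrete, here is a minimal, self-contained sketch. The two-row DataFrame is made-up sample data standing in for the real one, and `row_count` is a name introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({"seller_name": ["Rick", "Hannah"]})

# len() on a Series counts its rows, so assigning it to a column
# would broadcast that single number to every row.
row_count = len(df["seller_name"])

# .str.len() works element-wise and returns one length per row.
df["name_length"] = df["seller_name"].str.len()

print(row_count)
print(df)
```

The second attempt in the question fails for a different reason: `len('seller_name')` evaluates to the integer 11, so `df[11]` looks for a column labelled 11, which does not exist.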
root

Say you have this data:

import pandas as pd
y_1980 = pd.read_csv('y_1980.csv', sep='\t')

     country  y_1980
0     afg     196
1     ago     125
2     al      23

If you want to calculate the length of each value in a column, you can use:

y_1980['length'] = y_1980['country'].apply(len)
print(y_1980)

  country  y_1980  length
0     afg     196       3
1     ago     125       3
2      al      23       2

This way you can calculate the per-row length for any column you like.

everestial007
  • 4
    The pandas built-in methods are more robust than using `apply`. For example, this method will raise a TypeError if NaN is present in the string column, but the built-in `.str.len` will handle NaN. – root Mar 15 '17 at 17:55
  • 4
    this helped me when the column was a list – Asif Mohammed Mar 23 '19 at 04:55
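
The two comments above can be checked with a short sketch. The Series below are made-up sample data, and the variable names are hypothetical. The first Series is forced to object dtype so the missing value is a plain float NaN, which is the case the first comment describes.

```python
import numpy as np
import pandas as pd

# Hypothetical string column with a missing value, kept as object dtype.
names = pd.Series(["Rick", np.nan, "Hannah"], dtype=object)

# .str.len() skips the missing value and leaves it as NaN.
str_lengths = names.str.len()

# apply(len) calls len() on the float NaN, which raises TypeError.
try:
    names.apply(len)
    apply_failed = False
except TypeError:
    apply_failed = True

# Hypothetical column whose values are lists rather than strings.
lists = pd.Series([[1, 2, 3], [], ["a"]])

# Both approaches count the elements of each list.
list_lengths_apply = lists.apply(len)
list_lengths_str = lists.str.len()

print(str_lengths)
print(apply_failed)
print(list_lengths_str)
```

For list-valued columns either approach should do. The difference only shows up when missing values are present.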