19

So I know Python strings are immutable, but I have a string:

c['date'] = "20110104"

Which I would like to convert to

c['date'] = "2011-01-04"

My code:

c['date'] = c['date'][0:4] + "-" + c['date'][4:6] + "-" + c['date'][6:]

Seems a bit convoluted, no? Would it be best to save it as a separate variable and then do the same? Or would there basically be no difference?

LittleBobbyTables

8 Answers

25

You could use .join() to clean it up a little bit:

d = c['date']
'-'.join([d[:4], d[4:6], d[6:]])
Blender
10

Dates are first-class objects in Python, with a rich interface for manipulating them. The relevant library is datetime.

>>> import datetime
>>> datetime.datetime.strptime('20110503','%Y%m%d').date().isoformat()
'2011-05-03'

Don't reinvent the wheel!
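One practical upside of going through datetime, sketched below: strptime rejects strings that are not real dates, whereas slicing would pass them through unchanged. The helper name `to_iso` is made up for illustration.

```python
from datetime import datetime

def to_iso(compact):
    """Convert 'YYYYMMDD' to 'YYYY-MM-DD'; raises ValueError for invalid dates."""
    return datetime.strptime(compact, '%Y%m%d').date().isoformat()

print(to_iso('20110104'))  # 2011-01-04

try:
    to_iso('20110230')  # February 30th does not exist
except ValueError as exc:
    print('rejected:', exc)
```

Whether that validation is worth the extra parsing step depends on how much you trust the input.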

Colonel Panic
  • 2
    thus: `c['date'] = pd.to_datetime(c['date'], format = '%Y%m%d')` . See: https://stackoverflow.com/questions/26763344/convert-pandas-column-to-datetime – Richard Oct 03 '20 at 02:57
6

You are better off using string formatting than string concatenation:

c['date'] = '{}-{}-{}'.format(c['date'][0:4], c['date'][4:6], c['date'][6:])

String concatenation is generally slower because, as you said, strings are immutable, so each + builds a new intermediate string.
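On Python 3.6 or later, an f-string expresses the same formatting a little more compactly. A small sketch, assuming `c` is a plain dict holding the date string as in the question:

```python
c = {'date': '20110104'}  # stand-in for the asker's data

d = c['date']  # bind once so the slices stay short
c['date'] = f'{d[:4]}-{d[4:6]}-{d[6:]}'
print(c['date'])  # 2011-01-04
```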

GWW
4
s = '20110104'


def option_1():
    return '-'.join([s[:4], s[4:6], s[6:]])

def option_1a():
    return '-'.join((s[:4], s[4:6], s[6:]))

def option_2():
    return '{}-{}-{}'.format(s[:4], s[4:6], s[6:])

def option_3():
    return '%s-%s-%s' % (s[:4], s[4:6], s[6:])

def option_original():
    return s[:4] + "-" + s[4:6] + "-" + s[6:]

Running %timeit on each yields these results:

  • option_1: 35.9 ns per loop
  • option_1a: 35.8 ns per loop
  • option_2: 36 ns per loop
  • option_3: 35.8 ns per loop
  • option_original: 36 ns per loop

So... pick the most readable one, because the performance differences are marginal.
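For anyone wanting to reproduce this outside IPython, here is a rough sketch using the standard `timeit` module. It times only two of the options, under names chosen for this example. Absolute numbers depend on the machine and Python version, so none are quoted here.

```python
import timeit

s = '20110104'

def option_join():
    return '-'.join((s[:4], s[4:6], s[6:]))

def option_concat():
    return s[:4] + '-' + s[4:6] + '-' + s[6:]

CALLS = 100_000

# Time each callable; keep the best of several repeats to reduce noise.
results = {}
for func in (option_join, option_concat):
    best = min(timeit.repeat(func, number=CALLS, repeat=3))
    results[func.__name__] = best / CALLS  # seconds per call

for name, seconds in results.items():
    print(f'{name}: {seconds * 1e9:.1f} ns per call')
```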

Rob Cowie
1

I'd probably do it this way, not that there's a great deal of gain:

d = c['date']
c['date'] = '%s-%s-%s' % (d[:4], d[4:6], d[6:])

The big improvement (imho) is avoiding string concatenation.

g.d.d.c
1

I'm not usually the guy saying "use regex," but this is a good use-case for it:

import re
# \d limits each group to digits (year, month, day); \w would also accept letters
c['date'] = re.sub(r'(\d{4})(\d{2})(\d{2})', r'\1-\2-\3', c['date'])
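If the input is supposed to be exactly eight digits, a stricter sketch is to match the whole string and fail loudly otherwise. The function name `hyphenate` is hypothetical:

```python
import re

# Three digit groups: year, month, day.
_COMPACT_DATE = re.compile(r'(\d{4})(\d{2})(\d{2})')

def hyphenate(compact):
    """Return 'YYYY-MM-DD' for an eight-digit string, else raise ValueError."""
    match = _COMPACT_DATE.fullmatch(compact)
    if match is None:
        raise ValueError(f'expected 8 digits, got {compact!r}')
    return '-'.join(match.groups())

print(hyphenate('20110104'))  # 2011-01-04
```

Note that this only checks the shape of the string, not whether the digits form a real calendar date.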
SuperFamousGuy
1

I am not sure whether you want to convert it to a proper datetime object or just hard-code the format. If you want the former, you can do the following:

from datetime import datetime
result = datetime.strptime(c['date'], '%Y%m%d')
print(result.date().isoformat())

Input: '20110104'

Output: '2011-01-04'

user1613017
1

To add hyphens to a whole series of date strings, go through datetime:

import datetime
for i in range(len(c.date)):
    c.date[i] = datetime.datetime.strptime(c.date[i], '%Y%m%d').date().isoformat()