
I am iterating over the rows of the DataFrame, but that doesn't seem to be the best way to do it; it takes forever.

Is there a special way to do this in pandas?

INIT_TIME = datetime.datetime.strptime(date + ' ' + time, "%Y-%B-%d %H:%M:%S")
#NEED TO ADD DATA FROM THAT COLUMN

df = pd.read_csv(dataset_path, delimiter=',',skiprows=range(0,1),names=['TCOUNT','CORE','COUNTER','EMPTY','NAME','TSTAMP','MULT','STAMPME'])
df = df.drop('MULT', axis=1)
df = df.drop('EMPTY', axis=1)
df = df.drop('TSTAMP', axis=1)
for index, row in df.iterrows():
    TMP_TIME = INIT_TIME + datetime.timedelta(seconds=row['TCOUNT'])
    df['STAMPME'] = TMP_TIME.strftime("%s")

In addition, the datetimes I am adding are in the following format (datetime, then the resulting Unix timestamp):

2017-05-11 11:12:37.100192 1494493957
2017-05-11 11:12:37.200541 1494493957

The Unix timestamp is therefore the same for both rows (which is correct at one-second resolution), but is there a better way to represent it?

tandem

2 Answers


I'd rewrite your code like this:

INIT_TIME = datetime.datetime.strptime(date + ' ' + time, "%Y-%B-%d %H:%M:%S")
INIT_TIME = pd.to_datetime(INIT_TIME)

df = pd.read_csv(
    dataset_path, delimiter=',',skiprows=range(0,1),
    names=['TCOUNT','CORE','COUNTER','EMPTY','NAME','TSTAMP','MULT','STAMPME']
)
df = df.drop(['MULT', 'EMPTY', 'TSTAMP'], axis=1)
df['STAMPME'] = pd.to_timedelta(df['TCOUNT'], 's') + INIT_TIME
piRSquared
  • Thanks for that. How about that timestamp issue? Do you have any views on that? – tandem May 11 '17 at 17:23
  • @tandem what will be added are pandas Timestamps. They will look different and be of a date type. Not really sure what you're asking. – piRSquared May 11 '17 at 17:26
  • And the calculation for `df['STAMPME'] = pd.to_timedelta(df['TCOUNT'], 's') + INIT_TIME` doesn't result in a unix timestamp – tandem May 11 '17 at 17:44
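
On the follow-up in the comments: the vectorized sum yields pandas Timestamps, and one way to turn those into Unix timestamps while keeping the sub-second part is to subtract the epoch and divide by a Timedelta. The following is only a sketch under assumptions: the sample TCOUNT values and the fixed INIT_TIME are made up for illustration, the STAMPME_MS column name is hypothetical, and INIT_TIME is treated as timezone-naive UTC. Note that strftime("%s") is a platform-specific extension that, where available, interprets the datetime as local time, so its result can differ from this by the local UTC offset.

```python
import pandas as pd

# Hypothetical stand-ins for the question's INIT_TIME and TCOUNT column.
INIT_TIME = pd.Timestamp("2017-05-11 11:12:37")
df = pd.DataFrame({"TCOUNT": [0.100192, 0.200541]})

# Vectorized: offset in seconds -> Timedelta -> absolute Timestamp.
stamps = INIT_TIME + pd.to_timedelta(df["TCOUNT"], unit="s")

# Assumption: INIT_TIME is timezone-naive and should be read as UTC.
epoch = pd.Timestamp("1970-01-01")

# Float seconds since the epoch; the fractional part is kept.
df["STAMPME"] = (stamps - epoch) / pd.Timedelta(seconds=1)

# Integer milliseconds since the epoch (hypothetical extra column).
df["STAMPME_MS"] = (stamps - epoch) // pd.Timedelta(milliseconds=1)
```

The float column keeps the familiar seconds scale, while the integer column avoids floating-point rounding if the values are later compared or used as keys.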

Assuming the datetimes are correctly reflecting what you're trying to do, with respect to Pandas you should be able to do:

df['STAMPME'] = df['TCOUNT'].apply(lambda x: (datetime.timedelta(seconds=x) + INIT_TIME).strftime("%s"))

As noted here, you should not use iterrows() to modify the DataFrame you are iterating over. If you need to iterate row by row (as opposed to using the apply method), you can use another data object, e.g. a list, to hold the values you're calculating, and then create a new column from that.

Also, for future reference, the itertuples() method is faster than iterrows(). Each row is a namedtuple, so you access values by position or attribute (i.e. row[x] or row.NAME) as opposed to by key (row['NAME']).
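
If row-by-row iteration really is needed, the list-based approach with itertuples() could look roughly like this. It is a minimal sketch: the INIT_TIME value and the three TCOUNT values are invented for illustration, and the result is stored as datetimes rather than strings.

```python
import datetime

import pandas as pd

# Hypothetical stand-ins for the question's INIT_TIME and TCOUNT column.
INIT_TIME = datetime.datetime(2017, 5, 11, 11, 12, 37)
df = pd.DataFrame({"TCOUNT": [0.1, 0.2, 0.3]})

# Collect results in a plain list instead of writing into df while looping.
stamps = []
for row in df.itertuples(index=False):
    # Each row is a namedtuple, so columns are available as attributes.
    stamps.append(INIT_TIME + datetime.timedelta(seconds=row.TCOUNT))

# Assign the whole column once, after the loop has finished.
df["STAMPME"] = stamps
```

This keeps the loop from touching the DataFrame, but a vectorized expression will usually still be much faster on large inputs.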

abe
  • The question still is: how do I get millisecond-level precision with a Unix timestamp in Python? – tandem May 11 '17 at 18:02
  • @tandem does this work for you? http://stackoverflow.com/questions/7588511/format-a-datetime-into-a-string-with-milliseconds – abe May 11 '17 at 19:37
  • http://stackoverflow.com/a/8778548/1059860 This seems more possible, but doesn't help me. I have tried a bit. – tandem May 12 '17 at 06:56
  • Hi @tandem, so with respect to the datetimes, do you mind sharing exactly what you are trying to achieve here? – abe May 12 '17 at 16:22