1

There is an excel file logging a set of data. Its columns are as below, where each column is seperated by comma.

SampleData
year,date,month,location,time,count
2019,20,Jan,Japan,22:33,1
2019,31,Jan,Japan,19:21,1
2019,1,Jan,Japan,8:00,1
2019,4,Jan,Japan,4:28,2
2019,13,Feb,Japan,6:19,1

From this data, I would like to create python pandas dataframe, which looks like below.

DataFrame
u_datetime,location,count
1547991180,Japan,1
1548930060,Japan,1
1546297200,Japan,1
1546543680,Japan,2
1550006340,Japan,1

One of the DataFrame methods can be useful for this operation, but it does not take date with one digit.

pandas.to_datetime(
    DataFrame["year"].astype(str)
    + DataFrame["month"].astype(str) 
    + DataFrame["date"].astype(str)
    + DataFrame["time"].astype(str),
    format="%Y%b%d%-H%M"
)

Could anybody give me a hand?

Thank you.

Erik Cederstrand
  • 9,643
  • 8
  • 39
  • 63
wisy
  • 11
  • 1

3 Answers3

1

try this

from datetime import datetime

data['datetime'] = data[['year','date','month','time']].apply(lambda x: datetime.strptime(str(x['year'])+'-'+str(x['date'])+'-'+str(x['month'])+' '+str(x['time']), "%Y-%d-%b %H:%M").timestamp(), axis=1)

data[['datetime','location','count']]

Output

      datetime  \
0  1548003780.0   
1  1548942660.0   
2  1546309800.0   
3  1546556280.0   
4  1550018940.0   

  location  \
0    Japan   
1    Japan   
2    Japan   
3    Japan   
4    Japan   

   count  
0      1  
1      1  
2      1  
3      2  
4      1 
iamklaus
  • 3,720
  • 2
  • 12
  • 21
  • Yes, your solution working, but OP need pandas solution, not python. Also `apply` are loops under the hood, so better not use if exist vectorized solutions, here function `pd.to_datetime`. [link](https://stackoverflow.com/a/24871316/2901002) – jezrael Mar 26 '19 at 07:22
1

In case you are working with csv file this can be done easily using parse_dates.

dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
df = pd.read_csv('/home/users/user/xxx.csv', parse_dates ={'date_time':[0,1,2,4]})
df['u_datetime'] = df['date_time'].values.astype(np.int64) // 10 ** 9
df_new = df[['u_datetime', 'location', 'count']]
Loochie
  • 2,414
  • 13
  • 20
0

You are close, need %Y%b%d%H:%M format and then convert to unix time by cast to int64 with integer division by 10**9:

s = (DataFrame["year"].astype(str)+
     DataFrame["month"].astype(str)+
     DataFrame["date"].astype(str)+
     DataFrame["time"].astype(str))
DataFrame['u_datetime'] = pd.to_datetime(s, format="%Y%b%d%H:%M").astype(np.int64) // 10**9

DataFrame = DataFrame[['u_datetime','location','count']]
print (DataFrame)
   u_datetime location  count
0  1548023580    Japan      1
1  1548962460    Japan      1
2  1546329600    Japan      1
3  1546576080    Japan      2
4  1550038740    Japan      1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252