I want to expand a numpy.ndarray so that each of its values is paired with the first row of a dataframe. Here is the input.

Here is my dataframe

Id       Dept
100    Healthcare


Here is my numpy.ndarray

array(['2007-01-03', '2007-01-10', '2007-01-17', '2007-01-24'], dtype='datetime64[D]')


The expected output looks like this:

Id       Dept            Date
100    Healthcare        2007-01-03
100    Healthcare        2007-01-10
100    Healthcare        2007-01-17
100    Healthcare        2007-01-24

I need help implementing this logic.

prabuster

3 Answers

You can use pandas.concat to repeat the dataframe once per element of your array, then assign the array as a new column:

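import numpy as np
import pandas as pd

# Single-row dataframe from the question
df = pd.DataFrame({'Id': [100], 'Dept': ['Healthcare']})
x = np.array(['2007-01-03', '2007-01-10', '2007-01-17', '2007-01-24'], dtype='datetime64[D]')

# Repeat the row once per date, then attach the dates by position
df = pd.concat([df]*len(x))
df['Date'] = x

print(df)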
    Id        Dept       Date
0  100  Healthcare 2007-01-03
0  100  Healthcare 2007-01-10
0  100  Healthcare 2007-01-17
0  100  Healthcare 2007-01-24
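
The repeated index label 0 in the output above comes from concatenating copies of the same one-row frame. A minimal sketch, assuming you would rather have a fresh 0..n-1 index, passes `ignore_index=True` to `pd.concat`; the name `expanded` is only illustrative:

```python
import numpy as np
import pandas as pd

# Single-row dataframe and date array from the question
df = pd.DataFrame({'Id': [100], 'Dept': ['Healthcare']})
x = np.array(['2007-01-03', '2007-01-10', '2007-01-17', '2007-01-24'],
             dtype='datetime64[D]')

# Repeat the row once per date and renumber the index from 0
expanded = pd.concat([df] * len(x), ignore_index=True)

# Attach the dates by position
expanded['Date'] = x
print(expanded)
```

A unique index is usually safer if you later select rows by label, since `expanded.loc[0]` then returns a single row instead of all four.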
Erfan

If you want "all-to-all" matching, you can compute a Cartesian product by merging on a temporary constant key:

import numpy as np
import pandas as pd

df = pd.DataFrame([[100, 'Healthcare']], columns=['Id', 'Dept'])
date = np.array(['2007-01-03', '2007-01-10', '2007-01-17', '2007-01-24'],
                dtype='datetime64[D]')

df['_tmp'] = 0
df2 = pd.DataFrame({'Date': date})
df2['_tmp'] = 0
result = pd.merge(df, df2, on='_tmp').drop('_tmp', axis=1)
print(result)
#     Id        Dept       Date
# 0  100  Healthcare 2007-01-03
# 1  100  Healthcare 2007-01-10
# 2  100  Healthcare 2007-01-17
# 3  100  Healthcare 2007-01-24

This approach extends easily to the case where the first data frame has more than one row, if that is relevant for you:

import numpy as np
import pandas as pd

df = pd.DataFrame([[100, 'Healthcare'], [200, 'Security']], columns=['Id', 'Dept'])
date = np.array(['2007-01-03', '2007-01-10', '2007-01-17', '2007-01-24'],
                dtype='datetime64[D]')

df['_tmp'] = 0
df2 = pd.DataFrame({'Date': date})
df2['_tmp'] = 0
result = pd.merge(df, df2, on='_tmp').drop('_tmp', axis=1)
print(result)
#     Id        Dept       Date
# 0  100  Healthcare 2007-01-03
# 1  100  Healthcare 2007-01-10
# 2  100  Healthcare 2007-01-17
# 3  100  Healthcare 2007-01-24
# 4  200    Security 2007-01-03
# 5  200    Security 2007-01-10
# 6  200    Security 2007-01-17
# 7  200    Security 2007-01-24
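
As an alternative sketch, assuming pandas 1.2 or newer, `merge` accepts `how='cross'`, which builds the same Cartesian product without the temporary `_tmp` key. The frame and array are the ones from the example above; `dates_df` is just an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[100, 'Healthcare'], [200, 'Security']], columns=['Id', 'Dept'])
date = np.array(['2007-01-03', '2007-01-10', '2007-01-17', '2007-01-24'],
                dtype='datetime64[D]')

# Wrap the dates in a one-column frame so they can be merged
dates_df = pd.DataFrame({'Date': date})

# how='cross' pairs every row of df with every row of dates_df
# (assumes pandas >= 1.2, where this option was introduced)
result = df.merge(dates_df, how='cross')
print(result)
```

This leaves `df` itself unmodified, whereas the `_tmp` approach adds a helper column to it.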
jdehesa

If column order doesn't matter, you can rely on the fact that a single (scalar) value assigned to a new column is broadcast to the dataframe's length:

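import numpy as np
import pandas as pd

# No dtype is given here, so the dates are stored as strings
arr = np.array(['2007-01-03', '2007-01-10', '2007-01-17', '2007-01-24'])

# Start from the dates, then broadcast the scalar values to every row
df = pd.DataFrame({'Date': arr})
df['Id'] = 100
df['Dept'] = 'Healthcare'
print(df)

#          Date   Id        Dept
# 0  2007-01-03  100  Healthcare
# 1  2007-01-10  100  Healthcare
# 2  2007-01-17  100  Healthcare
# 3  2007-01-24  100  Healthcare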
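
If the column order from the question does matter, a small sketch of one way to restore it is to reindex the columns after the scalar assignments. This variant also keeps the `datetime64[D]` dtype from the question, so `Date` ends up as a datetime column rather than strings; that choice is an assumption about what is wanted:

```python
import numpy as np
import pandas as pd

# Date array as given in the question, with an explicit datetime dtype
arr = np.array(['2007-01-03', '2007-01-10', '2007-01-17', '2007-01-24'],
               dtype='datetime64[D]')

# Build from the dates, then broadcast the scalar values to every row
df = pd.DataFrame({'Date': arr})
df['Id'] = 100
df['Dept'] = 'Healthcare'

# Select the columns in the order shown in the question
df = df[['Id', 'Dept', 'Date']]
print(df)
```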
SpghttCd