How to add a column to a numpy recarray

Question

I want to add an extra column with a reformatted date to my numpy recarray.

I have a CSV:

<DATE>  <TIME>  <OPEN>  <HIGH>  <LOW>   <CLOSE> <TICKVOL>   <VOL>   <SPREAD>
2020.08.17  00:00:00    44.920  44.920  44.900  44.910  4   0   10
2020.08.17  00:01:00    44.910  44.910  44.850  44.860  10  0   10
2020.08.17  00:02:00    44.860  44.870  44.860  44.860  3   0   10
2020.08.17  00:03:00    44.860  44.860  44.850  44.850  2   0   10

My code:

def datetostr(datenp):
    ts = pd.to_datetime(str(datenp)) 
    d = ts.strftime('%Y.%m.%d %H:%M:%S')
    return d

colnames = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Tickvol', 'Vol', 'Spread']
stocks = pd.read_csv(infile, sep='\t', parse_dates=[['Date', 'Time']], header=0, names=colnames).to_records(index=False)
plotly_date = np.array([datetostr(xi) for xi in stocks['Date_Time']])

In stocks array:

('Date_Time', 'Open', 'High', 'Low', 'Close', 'Tickvol', 'Vol', 'Spread')
initial_array :  [('2020-08-14T00:00:00.000000000', 44.96, 45.  , 44.94, 44.97, 14, 0, 10)
 ('2020-08-14T00:01:00.000000000', 44.97, 44.99, 44.92, 44.95, 19, 0, 10)
 ('2020-08-14T00:02:00.000000000', 44.94, 44.94, 44.89, 44.91, 16, 0, 10)

In plotly_date:

plotly_date_array :  ['2020.08.14 00:00:00' '2020.08.14 00:01:00' '2020.08.14 00:02:00' ...
 '2020.08.18 20:57:00' '2020.08.18 20:58:00' '2020.08.18 20:59:00']

I want to add a new column to stocks holding the text-formatted dates stored in plotly_date:

result = np.column_stack((stocks, plotly_date))

It gives me an error:

TypeError: invalid type promotion

What am I doing wrong, and how do I properly add a new column named "Date"?

Trenton McKinney · Accepted Answer · 2020-08-31T18:30:25.750

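numpy.column_stack stacks 1-D arrays as columns into a 2-D array.
- stocks is a 1-D numpy.recarray of shape (4,), where each element is a whole record; as a plain ndarray it would have shape (4, 8). column_stack therefore tries to combine a structured dtype with a string dtype, which fails with the type promotion error.
- See: Using Numpy, Add Column to existing structured array
- See: numpy: Recarray Helper Functions
- See np.hstack to combine recarrays with the same shape and fields.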

# convert plotly_date from an ndarray into a recarray with a single field 'pd'
# (np.rec is the public alias of np.core.records; passing a list avoids hard-coding the length)
plotly_date_rec = np.rec.fromarrays([plotly_date], names='pd', formats='<U19')

# create a new dtype, with stocks dtype + plotly_date_rec dtype
new_dt = np.dtype(stocks.dtype.descr + [('pd', '<U19')])

# create a zero-filled structured array with the new dtype
result = np.zeros(stocks.shape, dtype=new_dt)

# fill the zeros with data from stocks
for name in stocks.dtype.names:
    result[name] = stocks[name]

# add the plotly_date_rec data
result['pd'] = plotly_date_rec['pd']

# print(result)
array([('2020-08-17T00:00:00.000000000', 44.92, 44.92, 44.9 , 44.91,  4, 0, 10, '2020.08.17 00:00:00'),
       ('2020-08-17T00:01:00.000000000', 44.91, 44.91, 44.85, 44.86, 10, 0, 10, '2020.08.17 00:01:00'),
       ('2020-08-17T00:02:00.000000000', 44.86, 44.87, 44.86, 44.86,  3, 0, 10, '2020.08.17 00:02:00'),
       ('2020-08-17T00:03:00.000000000', 44.86, 44.86, 44.85, 44.85,  2, 0, 10, '2020.08.17 00:03:00')],
      dtype=[('Date_Time', '<M8[ns]'), ('Open', '<f8'), ('High', '<f8'), ('Low', '<f8'), ('Close', '<f8'), ('Tickvol', '<i8'), ('Vol', '<i8'), ('Spread', '<i8'), ('pd', '<U19')])
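
The recarray helper functions linked above include numpy.lib.recfunctions.append_fields, which can build the new dtype and copy the fields in one call. A minimal sketch, assuming a two-row stand-in for stocks with only three of the original fields (the stand-in data and the field name "Date" are illustrative and not taken from the code above):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical stand-in for the `stocks` recarray (values copied from the CSV sample)
dt = [("Date_Time", "M8[ns]"), ("Close", "f8"), ("Tickvol", "i8")]
stocks = np.array(
    [
        (np.datetime64("2020-08-17T00:00:00"), 44.91, 4),
        (np.datetime64("2020-08-17T00:01:00"), 44.86, 10),
    ],
    dtype=dt,
).view(np.recarray)

# Text-formatted dates, one per record
plotly_date = np.array(["2020.08.17 00:00:00", "2020.08.17 00:01:00"])

# usemask=False returns a regular (non-masked) array,
# asrecarray=True returns a recarray rather than a plain structured ndarray
result = rfn.append_fields(stocks, "Date", plotly_date, usemask=False, asrecarray=True)

print(result.dtype.names)
print(result.Date)
```

The new field's dtype is taken from plotly_date, so the string width does not need to be spelled out by hand.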

Using pandas

This is much easier

# create dataframe
colnames = ['Date', 'Time', 'Open', 'High', 'Low', 'Close', 'Tickvol', 'Vol', 'Spread']
stocks = pd.read_csv('test.csv', sep='\\s+', parse_dates=[['Date', 'Time']], header=0, names=colnames)

# add plotly_dates column
stocks['plotly_date'] = stocks.Date_Time.dt.strftime('%Y.%m.%d %H:%M:%S')

# create a numpy recarray of the dataframe with all columns
result = stocks.to_records(index=False)

# create a numpy recarray of the dataframe without Date_Time
results = stocks.iloc[:, 1:].to_records(index=False)  # optional depending on your needs


# print(result)  # shown with all columns

rec.array([('2020-08-17T00:00:00.000000000', 44.92, 44.92, 44.9 , 44.91,  4, 0, 10, '2020.08.17 00:00:00'),
           ('2020-08-17T00:01:00.000000000', 44.91, 44.91, 44.85, 44.86, 10, 0, 10, '2020.08.17 00:01:00'),
           ('2020-08-17T00:02:00.000000000', 44.86, 44.87, 44.86, 44.86,  3, 0, 10, '2020.08.17 00:02:00'),
           ('2020-08-17T00:03:00.000000000', 44.86, 44.86, 44.85, 44.85,  2, 0, 10, '2020.08.17 00:03:00')],
          dtype=[('Date_Time', '<M8[ns]'), ('Open', '<f8'), ('High', '<f8'), ('Low', '<f8'), ('Close', '<f8'), ('Tickvol', '<i8'), ('Vol', '<i8'), ('Spread', '<i8'), ('plotly_date', 'O')])
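
Note that plotly_date comes out with object dtype ('O') here, unlike the '<U19' field in the numpy version. If a fixed-width string field is needed, to_records accepts a column_dtypes mapping. A small sketch, assuming a two-row DataFrame built by hand in place of the CSV (the data is illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame standing in for the parsed CSV
stocks = pd.DataFrame(
    {
        "Date_Time": pd.to_datetime(["2020-08-17 00:00:00", "2020-08-17 00:01:00"]),
        "Close": [44.91, 44.86],
    }
)

# Same formatting step as in the answer
stocks["plotly_date"] = stocks["Date_Time"].dt.strftime("%Y.%m.%d %H:%M:%S")

# Ask for a 19-character unicode field instead of the default object dtype
result = stocks.to_records(index=False, column_dtypes={"plotly_date": "<U19"})

print(result.dtype)
```

The width 19 matches the length of the '%Y.%m.%d %H:%M:%S' output; a narrower width would silently truncate the strings.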

Yeah, it looks silly now. I thought that only worked in read_csv. Now I realize I can do everything inside the DataFrame and then just convert the df to a recarray. Thanks, mate! You did in 15 minutes what I had been trying to solve all day. — navy, Aug 31 '20 at 18:20
