
Suppose I have a 2D numpy array like this:

import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])
# array([[1, 2],
#        [3, 4],
#        [5, 6]])

How can I transform this into a "long" structure with one record per value, tagged with its row and column index? For this array, the result would look like:

df = pd.DataFrame({'row': [0, 0, 1, 1, 2, 2],
                  'column': [0, 1, 0, 1, 0, 1],
                  'value': [1, 2, 3, 4, 5, 6]})

melt only assigns the column identifier, not the row:

pd.DataFrame(arr).melt()
#   variable    value
# 0        0        1
# 1        0        3
# 2        0        5
# 3        1        2
# 4        1        4
# 5        1        6

Is there a way to attach the row identifier?

Asked by Max Ghenis
Comment by Divakar (Dec 01 '18 at 07:36): For efficient solutions using NumPy, see https://stackoverflow.com/questions/46135070/generalise-slicing-operation-in-a-numpy-array
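
Following the comment's pointer to NumPy, here is a minimal sketch (not taken from the linked question) that builds the index columns directly with np.indices. It assumes the default row-major (C) order used by ravel(), which keeps the three flattened arrays aligned and yields rows in the same order as the expected df above.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])

# np.indices(arr.shape) returns one array holding each element's row
# index and another holding its column index, both shaped like arr.
rows, cols = np.indices(arr.shape)

# ravel() flattens all three arrays in row-major order, so position i
# in each flattened array refers to the same element of arr.
long_df = pd.DataFrame({'row': rows.ravel(),
                        'column': cols.ravel(),
                        'value': arr.ravel()})
print(long_df)
```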

2 Answers


Reset the index into a column (named index by default), then pass that column to melt as id_vars:

pd.DataFrame(arr).reset_index().melt('index')
#    index variable  value
# 0      0        0      1
# 1      1        0      3
# 2      2        0      5
# 3      0        1      2
# 4      1        1      4
# 5      2        1      6

You can then rename the columns:

df = pd.DataFrame(arr).reset_index().melt('index')
df.columns = ['row', 'column', 'value']
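
The reset and rename steps can also be chained. A minimal sketch using DataFrame.rename_axis: naming the index 'row' first makes reset_index() create a column called 'row' instead of 'index', and melt's var_name labels the former column headers, so no separate rename is needed.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])

# Name the index 'row' so reset_index() turns it into a 'row' column,
# then melt the remaining columns, calling the header column 'column'.
df = (pd.DataFrame(arr)
      .rename_axis('row')
      .reset_index()
      .melt(id_vars='row', var_name='column'))
print(df)
```

As with the answer above, the records come out column by column.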

Answered by BENY; edited by Max Ghenis

melt can use the row index as an identifier once it is copied into a regular column:

arrdf = pd.DataFrame(arr)
arrdf['row'] = arrdf.index
arrdf.melt(id_vars='row', var_name='column')

#    row    column  value
# 0    0         0      1
# 1    1         0      3
# 2    2         0      5
# 3    0         1      2
# 4    1         1      4
# 5    2         1      6
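
Both melt-based answers return the records in column-major order (all of column 0, then all of column 1), whereas the expected df in the question is row-major. If that order matters, one alternative is DataFrame.stack(), which moves the column labels into an inner index level row by row, giving a Series indexed by (row, column). A minimal sketch:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4], [5, 6]])

# stack() produces a Series whose MultiIndex is (row label, column label),
# ordered row by row; rename_axis names both index levels, and
# reset_index(name='value') turns the levels into columns.
long_df = (pd.DataFrame(arr)
           .stack()
           .rename_axis(['row', 'column'])
           .reset_index(name='value'))
print(long_df)
```

Alternatively, calling sort_values(['row', 'column'], ignore_index=True) on either melt result should give the same row-major ordering.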

Answered by Max Ghenis