I'm having trouble transforming a list into a matrix.

My NumPy array is:

     import numpy as np
     tab2D1 = np.array([["2020-06-05", "grep"],["2020-06-06", "mkdir"],["2020-06-06", "rm"],
                        ["2020-06-05", "cat"],["2020-06-06", "grep"],["2020-06-07", "awk"],
                        ["2020-06-07", "rm"],["2020-06-07", "echo"],["2020-06-05", "grep"],
                        ["2020-06-05", "awk"]])

As output, I would like a new matrix like this:

            grep  mkdir  rm  cat  awk  echo
2020-06-05     2      0   0    1    1     0
2020-06-06     1      1   1    0    0     0
2020-06-07     0      0   1    0    1     1

I tried vstack and hstack, but I'm not happy with the result. After this processing, I will plot the result with the matplotlib library.

xurius
  • It looks like you're trying to create what's called a co-occurrence matrix between your two columns. This question isn't an exact match, but it might point you in the right direction: https://stackoverflow.com/questions/42814452/co-occurrence-matrix-from-list-of-words-in-python – a.deshpande012 Jun 23 '20 at 09:28
  • Hey, could you please edit your question to clarify it a little bit? Maybe instead of saying that you want to transform a list in a matrix, just say that you want to quantify the occurrences of elements in an array and display the summarized total aggregated by another corresponding quantity. – jpnadas Jun 23 '20 at 09:28

2 Answers

You can do that with Pandas using pd.crosstab:

     import numpy as np
     import pandas as pd
     tab2D1 = np.array([["2020-06-05", "grep"],["2020-06-06", "mkdir"],["2020-06-06", "rm"],
                        ["2020-06-05", "cat"],["2020-06-06", "grep"],["2020-06-07", "awk"],
                        ["2020-06-07", "rm"],["2020-06-07", "echo"],["2020-06-05", "grep"],
                        ["2020-06-05", "awk"]])
     df = pd.crosstab(tab2D1[:, 0], tab2D1[:, 1])
     print(df)
     # col_0       awk  cat  echo  grep  mkdir  rm
     # row_0
     # 2020-06-05    1    1     0     2      0   0
     # 2020-06-06    0    0     0     1      1   1
     # 2020-06-07    1    0     1     0      0   1
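
If you would rather stay in plain NumPy, the same counts can be built by hand. This is only a sketch of one possible approach, not something the question requires: it uses `np.unique(..., return_inverse=True)` to turn each label into an integer index and `np.add.at` to accumulate the counts. The names `dates`, `cmds` and `counts` are my own.

```python
import numpy as np

tab2D1 = np.array([["2020-06-05", "grep"], ["2020-06-06", "mkdir"], ["2020-06-06", "rm"],
                   ["2020-06-05", "cat"], ["2020-06-06", "grep"], ["2020-06-07", "awk"],
                   ["2020-06-07", "rm"], ["2020-06-07", "echo"], ["2020-06-05", "grep"],
                   ["2020-06-05", "awk"]])

# Sorted unique labels, plus for every input row the position of its label
# in that sorted array.
dates, date_idx = np.unique(tab2D1[:, 0], return_inverse=True)
cmds, cmd_idx = np.unique(tab2D1[:, 1], return_inverse=True)

# One cell per (date, command) pair; np.add.at adds 1 for every occurrence,
# including repeated pairs such as ("2020-06-05", "grep").
counts = np.zeros((len(dates), len(cmds)), dtype=int)
np.add.at(counts, (date_idx, cmd_idx), 1)

print(cmds)
print(counts)
```

Rows follow `dates` and columns follow `cmds`, both in sorted order, so the layout matches the `pd.crosstab` result.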
jdehesa

That is so simple with pandas.

     import numpy as np
     import pandas as pd
     tab2D1 = np.array([["2020-06-05", "grep"],["2020-06-06", "mkdir"],["2020-06-06", "rm"],
                        ["2020-06-05", "cat"],["2020-06-06", "grep"],["2020-06-07", "awk"],
                        ["2020-06-07", "rm"],["2020-06-07", "echo"],["2020-06-05", "grep"],
                        ["2020-06-05", "awk"]])
     df = pd.crosstab(tab2D1[:, 0], tab2D1[:, 1])
     print(df)
     # col_0       awk  cat  echo  grep  mkdir  rm
     # row_0
     # 2020-06-05    1    1     0     2      0   0
     # 2020-06-06    0    0     0     1      1   1
     # 2020-06-07    1    0     1     0      0   1
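
`pd.crosstab` sorts the columns alphabetically, while the question lists them in order of first appearance (grep, mkdir, rm, ...) and mentions plotting with matplotlib. A possible sketch for both steps follows; the axis names `date` and `command`, the stacked bar chart, and the `Agg` backend are my own assumptions, so adjust them to taste.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import numpy as np
import pandas as pd

tab2D1 = np.array([["2020-06-05", "grep"], ["2020-06-06", "mkdir"], ["2020-06-06", "rm"],
                   ["2020-06-05", "cat"], ["2020-06-06", "grep"], ["2020-06-07", "awk"],
                   ["2020-06-07", "rm"], ["2020-06-07", "echo"], ["2020-06-05", "grep"],
                   ["2020-06-05", "awk"]])

# rownames/colnames replace the default "row_0"/"col_0" axis labels.
df = pd.crosstab(tab2D1[:, 0], tab2D1[:, 1], rownames=["date"], colnames=["command"])

# dict.fromkeys keeps the first occurrence of each command, in input order.
order = list(dict.fromkeys(tab2D1[:, 1].tolist()))
df = df.reindex(columns=order)
print(df)

# One stacked bar per date; DataFrame.plot.bar draws through matplotlib.
ax = df.plot.bar(stacked=True)
ax.set_ylabel("count")
```

To see the chart, call `ax.figure.savefig(...)` with a file name of your choice, or remove the `matplotlib.use("Agg")` line and call `matplotlib.pyplot.show()`.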

Thanks a lot.

xurius