
There is a huge matrix whose elements are numbers in the range 1 to 15. I want to transform it into a matrix of letters, such that 1 becomes "a", 2 becomes "b", and so on. Finally, I want to join each row into a single string. As a simple example:

import pandas as pd
import numpy as np, numpy.random
numpy.random.seed(1)
A = pd.DataFrame(np.random.randint(1,16,10).reshape(2,5))
A.iloc[1,4] = np.nan
A
#   0   1   2   3   4
#0  6   12  13  9   10.0
#1  12  6   1   1   NaN

If there were no NaN values in the dataset, I would use this code:

pd.DataFrame(list(map(''.join, A.applymap(lambda n: chr(n + 96)).values)))

With the NaN present, it gives this error:

TypeError: ('integer argument expected, got float', 'occurred at index 4')
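A minimal sketch of why this happens, assuming the same values as the example above (the frame is built directly from lists here instead of from the random seed). The NaN forces column 4 to float64, `chr` rejects the float 10.0, and `int()` cannot convert NaN at all, so missing cells need their own branch:

```python
import numpy as np
import pandas as pd

# Same values as the example above, built directly from lists.
A = pd.DataFrame([[6, 12, 13, 9, 10], [12, 6, 1, 1, np.nan]])

# NaN is a float, so the column that holds it is stored as float64.
print(A.dtypes)

# chr() only accepts integers, so the float code 10.0 is rejected.
try:
    chr(A.iloc[0, 4])
    float_rejected = False
except TypeError:
    float_rejected = True

# Casting to int fixes the real numbers: 10 + 96 is the code point of 'j'.
print(chr(int(A.iloc[0, 4]) + 96))

# NaN has no integer value, so int() fails on it and must be skipped.
try:
    int(A.iloc[1, 4])
    nan_rejected = False
except ValueError:
    nan_rejected = True
```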

The expected output is:

    0
0   flmij
1   lfaa

The first row should have 5 characters and the second one should have 4.

Hadij
  • Are you sure the NaNs are the problem? You also have a float (the only one) at index 4, right next to the NaN. The float appears in the error message, not the NaN. – IMCoins May 30 '18 at 17:55
  • I guess you can replace NaN with a special number, for example 0, and later replace the corresponding character with ''. – anishtain4 May 30 '18 at 17:55
  • @IMCoins No, the problem is the NaN. I added `A.iloc[0,4] = np.nan` to make all of column 4 NaN, and the same error occurred. – Hadij May 30 '18 at 18:03
  • @anishtain4 Good suggestion. Do you know which code stands for a blank, so I can replace NaN with it? – Hadij May 30 '18 at 18:04
  • It doesn't really have to be a blank space, as long as it's not one of the other characters and you know which one it is. It's faster than the selected answer. One of the answers has already implemented it, but in a bad way: just move the `str` and the rest out of `apply`. – anishtain4 May 31 '18 at 18:45
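A minimal sketch of the sentinel idea from the comments, assuming 0 never occurs in the real data (which only holds 1 to 15). NaN becomes 0, which maps to chr(96), the backtick; the rows are joined first and the sentinel is stripped once, outside of any per-cell `apply`:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([[6, 12, 13, 9, 10], [12, 6, 1, 1, np.nan]])

# Replace NaN with the sentinel 0 so every cell can be cast to int.
codes = A.fillna(0).astype(int)

# Map every code to a character; the sentinel 0 becomes chr(96) == '`'.
letters = codes.apply(lambda col: col.map(lambda n: chr(n + 96)))

# Join each row, then remove the sentinel character in one vectorized step.
result = letters.apply(''.join, axis=1).str.replace('`', '', regex=False)
print(result)
```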

3 Answers


Use an if-else condition inside the lambda, then sum each row:

df = pd.DataFrame(A.applymap(lambda n: chr(int(n) + 96) if pd.notnull(n) else '')
                   .values.sum(axis=1))
print (df)
       0
0  flmij
1   lfaa

Details:

print (A.applymap(lambda n: chr(int(n) + 96) if pd.notnull(n) else ''))
   0  1  2  3  4
0  f  l  m  i  j
1  l  f  a  a   

print (A.applymap(lambda n: chr(int(n) + 96) if pd.notnull(n) else '').values)
[['f' 'l' 'm' 'i' 'j']
 ['l' 'f' 'a' 'a' '']]

print (A.applymap(lambda n: chr(int(n) + 96) if pd.notnull(n) else '').values.sum(axis=1))
['flmij' 'lfaa']

Another solution:

print (A.stack().astype(int).add(96).apply(chr).groupby(level=0).sum())
0    flmij
1     lfaa
dtype: object

Details:

Reshape to Series:

print (A.stack())
0  0     6.0
   1    12.0
   2    13.0
   3     9.0
   4    10.0
1  0    12.0
   1     6.0
   2     1.0
   3     1.0
dtype: float64

Convert to integers:

print (A.stack().astype(int))
0  0     6
   1    12
   2    13
   3     9
   4    10
1  0    12
   1     6
   2     1
   3     1
dtype: int32

Add number:

print (A.stack().astype(int).add(96))
0  0    102
   1    108
   2    109
   3    105
   4    106
1  0    108
   1    102
   2     97
   3     97
dtype: int32

Convert to letters:

print (A.stack().astype(int).add(96).apply(chr))
0  0    f
   1    l
   2    m
   3    i
   4    j
1  0    l
   1    f
   2    a
   3    a
dtype: object

Sum by the first level of the MultiIndex (summing strings concatenates them):

print (A.stack().astype(int).add(96).apply(chr).groupby(level=0).sum())
0    flmij
1     lfaa
dtype: object
jezrael
  • Thank you very much for your answer. It works. Could you explain how `sum` works here? I don't want to just copy it; I want to learn more. – Hadij May 30 '18 at 18:09
  • @Hadij - Here `sum` works like `join`: it runs on the NumPy array created from the `DataFrame`, and adding strings concatenates them. – jezrael May 30 '18 at 18:15
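A small sketch of the point in the last comment: `.values` here is an object-dtype array of Python strings, and for object arrays NumPy's `sum` falls back to Python's `+`, which concatenates `str` values. The array is written out by hand to match the printed `.values` above:

```python
import numpy as np

# Object-dtype array of strings, matching the .values output above.
arr = np.array([['f', 'l', 'm', 'i', 'j'],
                ['l', 'f', 'a', 'a', '']], dtype=object)

# Summing along each row applies `+` between the strings, i.e. concatenation.
row_sums = arr.sum(axis=1)

# The same result spelled out explicitly with str.join.
joined = np.array([''.join(row) for row in arr], dtype=object)

print(row_sums)
```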

You could use a categorical. It's useful if you're doing more than just mapping to individual characters.

import pandas as pd
import numpy as np, numpy.random
numpy.random.seed(1)
A_int = pd.DataFrame(np.random.randint(1,16,10).reshape(2,5)) 
A_int.iloc[1,4] = np.nan

int_vals = list(range(1,16))
chr_vals = [chr(n+96) for n in int_vals]
A_chr = A_int.apply(axis=0, func=lambda x: pd.Categorical(x, categories=int_vals, ordered=True).rename_categories(chr_vals))

A_chr.apply(axis=1, func=lambda x: ''.join([str(i) for i in x[pd.notnull(x)]]))
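As a swapped-in alternative to the categorical (not the answerer's code), a plain dictionary lookup also lets each code map to an arbitrary string rather than a single character. `lookup` and `row_to_text` are hypothetical names, and the sketch assumes the codes are whole numbers stored as floats because of the NaN:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([[6, 12, 13, 9, 10], [12, 6, 1, 1, np.nan]])

# Hypothetical lookup table; values could be any strings, not just one letter.
lookup = {n: chr(n + 96) for n in range(1, 16)}

def row_to_text(row):
    # Skip missing cells, cast the float codes back to int, then look them up.
    return ''.join(lookup[int(v)] for v in row.dropna())

result = A.apply(row_to_text, axis=1)
print(result)
```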
jvd10

Try this:

# 0 acts as a sentinel: chr(0 + 96) is '`', which is stripped at the end
A.fillna(0,inplace=True)
A.applymap(lambda x: (chr(int(x) + 96))).sum(axis=1).str.replace('`','')

0    flmij
1     lfaa
dtype: object
Mohamed Thasin ah