In a pandas DataFrame, I want to run applymap(somefunction) separately within each group of a groupby (grouping on the values of an index column).

mcve_01.txt

pos         index    M1     M2     F1_x
16230484    141      G/G    G/G    G
16230491    141      C/C    C/C    C
16230503    141      T/T    T/T    T
16230524    141      T/T    T/T    T
16230535    141      .      .      T
16232072    211      A/A    A/A    A
16232072    211      A/A    A/A    A
16229783    211      C/C    C/C    G
16229992    211      A/A    A/A    G
16230007    211      T/T    T/T    A
16230011    263      G/G    G/G    C
16230049    263      A/A    A/A    T
16230174    263      .      .      T
16230190    263      A/A    A/A    T
16230260    263      A/A    A/A    G

I have written a function to run some analyses on columns A, B, C and D, where every value in those columns is a list.

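import pandas as pd

mcve_data = pd.read_csv('mcve_01.txt', sep='\t')

# keep the default row index and add 'pos' and 'index' as extra index levels
mcve_data.set_index(['pos', 'index'], append=True, inplace=True)
# turn every cell into a one-element list holding the cell's characters
mcve_list = mcve_data.applymap(lambda c: [list(c)])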
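
To make the input that the function receives concrete, here is a minimal sketch of what the list conversion and a later `+ shift(1)` produce on a two-row frame. The frame `toy` and the helper `cellwise` are made up for illustration; `cellwise` only works around the rename of `applymap` to `DataFrame.map` in pandas 2.1. Adding a list-valued frame to its shifted copy concatenates the lists cell by cell, so each surviving cell holds the current row's characters followed by the previous row's.

```python
import pandas as pd

def cellwise(frame, func):
    # Hypothetical helper: DataFrame.map replaced applymap in pandas 2.1.
    return frame.map(func) if hasattr(frame, "map") else frame.applymap(func)

# Toy stand-in for two rows of mcve_01.txt (not the real file).
toy = pd.DataFrame({"M1": ["G/G", "C/C"], "F1_x": ["G", "C"]},
                   index=[16230484, 16230491])

# Each cell becomes a one-element list holding the cell's characters.
toy_list = cellwise(toy, lambda c: [list(c)])

# list + list concatenates, so the second row now holds [current, previous].
# The first row has no predecessor, turns into NaN and is dropped.
paired = (toy_list + toy_list.shift(1)).dropna(how="all")

print(paired.loc[16230491, "M1"])
```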

Say the function is:

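from functools import partial
from itertools import product

def mapfun(c):
    # c is one cell: a list of character lists, one per stacked row
    if any(['.' in l for l in c]):
        return '.'

    # pair values position by position if every entry contains '|',
    # otherwise build all combinations
    if all(['|' in l for l in c]):
        fun = zip
    else:
        fun = product

    # drop the separator characters before combining
    filt_set = set(['|', '/'])
    filt = partial(filter, lambda l: not (l in filt_set))

    return ','.join('g'.join(t) for t in fun(*map(filt, c)))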
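
As a quick check of what mapfun returns for a single cell, the sketch below restates the function so the block runs on its own and calls it on three hand-made cells. The cell values (`slash_cell`, `pipe_cell`, `dot_cell`) are hypothetical examples, not rows taken from the data file; each mimics the `[current, previous]` shape that a cell has after `mcve_list + mcve_list.shift(1)`.

```python
from functools import partial
from itertools import product

def mapfun(c):
    # Restated from the question so this block is self-contained.
    if any('.' in l for l in c):
        return '.'
    fun = zip if all('|' in l for l in c) else product
    filt_set = {'|', '/'}
    filt = partial(filter, lambda l: l not in filt_set)
    return ','.join('g'.join(t) for t in fun(*map(filt, c)))

# Hypothetical cells of the form [current row chars, previous row chars].
slash_cell = [list('C/C'), list('G/G')]  # '/' present: all combinations (product)
pipe_cell = [list('A|T'), list('C|G')]   # '|' in every entry: pairwise (zip)
dot_cell = [list('.'), list('T/T')]      # any '.' short-circuits to '.'

print(mapfun(slash_cell))  # CgG,CgG,CgG,CgG
print(mapfun(pipe_cell))   # AgC,TgG
print(mapfun(dot_cell))    # .
```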

Finally:

mcve_mm = (mcve_list+mcve_list.shift(1)).dropna(how='all').\
    applymap(mapfun)

which gives me (final output):

pos         index    M1                 M2                 F1_x
16230484    141      CgG,CgG,CgG,CgG    CgG,CgG,CgG,CgG    CgG
16230491    141      TgC,TgC,TgC,TgC    TgC,TgC,TgC,TgC    TgC
.....       ...      TgT,TgT,TgT,TgT    TgT,TgT,TgT,TgT    TgT
                     .                  .                  TgT
                     .                  .                  AgT
                     AgA,AgA,AgA,AgA    AgA,AgA,AgA,AgA    AgA
                     CgA,CgA,CgA,CgA    CgA,CgA,CgA,CgA    GgA
                     AgC,AgC,AgC,AgC    AgC,AgC,AgC,AgC    GgG
                     TgA,TgA,TgA,TgA    TgA,TgA,TgA,TgA    AgG
                     GgT,GgT,GgT,GgT    GgT,GgT,GgT,GgT    CgA
                     AgG,AgG,AgG,AgG    AgG,AgG,AgG,AgG    TgC

So this code works when I run the function (mapfun) over the whole DataFrame without grouping. But I want to run the function separately for each group of rows that share an index value.

Unfortunately, I don't see any example of groupby and applymap together.

I then tried resetting the index column and wrapping the function (mapfun) inside apply, which didn't work.

mcve_mm = (mcve_list+mcve_list.shift(1)).dropna(how='all').groupby(['f1_index'], group_keys = False).apply(lambda x: [mapfun])

I didn't get an error, but the function was not applied correctly once I grouped the data and then used apply.

Output I am getting:

f1_index
141.0     [<function mapfun at 0x7fee93550f28>]
211.0     [<function mapfun at 0x7fee93550f28>]
263.0     [<function mapfun at 0x7fee93550f28>]
dtype: object
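
The output above is what `lambda x: [mapfun]` asks for: the lambda returns a list that contains the function object itself and never calls it, so apply records that list once per group. A minimal sketch of the difference follows, using a toy frame and a stand-in cell function; the names `toy`, `grp`, `shout` and `cellwise` are hypothetical and not part of the original code. `cellwise` only hides the rename of `applymap` to `DataFrame.map` in pandas 2.1.

```python
import pandas as pd

def cellwise(frame, func):
    # Hypothetical helper: DataFrame.map replaced applymap in pandas 2.1.
    return frame.map(func) if hasattr(frame, "map") else frame.applymap(func)

def shout(cell):
    # Stand-in for mapfun: any function that takes a single cell.
    return cell.upper()

# Toy frame; "grp" plays the role of the grouping index level.
toy = pd.DataFrame({"M1": ["a", "b", "c"], "M2": ["d", "e", "f"]},
                   index=pd.Index([141, 141, 211], name="grp"))
groups = toy.groupby(level="grp", group_keys=False)

# Returns the function object itself, once per group; it is never called.
not_called = groups.apply(lambda g: [shout])

# Calls the function on every cell of each group's sub-frame.
called = groups.apply(lambda g: cellwise(g, shout))

print(not_called)
print(called)
```

The second form hands each group's sub-frame to the cell-wise function, which is the closest equivalent of "groupby plus applymap" under these assumptions.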

Expected output:

The same as the final output above, but with the function applied separately within each group of rows that share an index value.

Now I want to take this function and applymap it over these columns while grouping the DataFrame by the values in one of its columns or index levels.

data_groupby = (df+df.shift(1)).dropna(how='all').\
    applymap(fnc)  # wanted: the same, but run separately for each groupby group

I tried resetting the index and then grouping by the index name. However, fnc() is specific to the data in columns A, B, C and D. I also cannot find any examples or tutorials that use applymap together with groupby on a pandas DataFrame.

everestial007
  • This feels like the [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) where `applymap` is your attempted solution to a problem you do not explain. You omit quite a bit like function and real data points. Please back up and explain the real problem, namely original input and desired output. – Parfait Feb 11 '17 at 02:36
  • @Parfait: Just give me a minute. – everestial007 Feb 11 '17 at 03:05

1 Answer

A DataFrameGroupBy is not a single DataFrame; it behaves like a collection of (key, sub-DataFrame) pairs, so it has no applymap method of its own. You can iterate over it and use applymap on each subgroup:

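import pandas as pd
from numpy.random import randint

# Dummy data: 32 rows of random 0/1 values in columns A-D
vdata = pd.DataFrame(randint(2, size=(32,4)))
vdata.columns = list('ABCD')

vgb = vdata.groupby(['A','B'])
altered = []
for index, subframe in vgb:
    # index is the (A, B) key of this group; subframe holds its rows
    # (applymap is called DataFrame.map from pandas 2.1 onwards)
    subframe = subframe.applymap(lambda x: x*2)
    altered.append(subframe)
    print(index)
    print(subframe)
    assert(subframe.A.mean() == index[0]*2)
    assert(subframe.B.mean() == index[1]*2)

# Stitch the altered sub-frames back into one DataFrame
vdata = pd.concat(altered)
print(vdata)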
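
Applied to the data in the question, the same loop might look like the sketch below. It is an illustration under stated assumptions, not a tested solution for the full file: the frame holds five rows retyped from mcve_01.txt with the M2 column left out, `mapfun` is restated from the question, and `cellwise` is a hypothetical helper that covers the rename of `applymap` to `DataFrame.map` in pandas 2.1. Because `shift(1)` runs inside each sub-frame, a row is never paired with the last row of the previous group, and `pd.concat` appends the processed sub-frames back together.

```python
from functools import partial
from itertools import product

import pandas as pd

def cellwise(frame, func):
    # Hypothetical helper: DataFrame.map replaced applymap in pandas 2.1.
    return frame.map(func) if hasattr(frame, "map") else frame.applymap(func)

def mapfun(c):
    # Restated from the question so this block is self-contained.
    if any('.' in l for l in c):
        return '.'
    fun = zip if all('|' in l for l in c) else product
    filt = partial(filter, lambda l: l not in {'|', '/'})
    return ','.join('g'.join(t) for t in fun(*map(filt, c)))

# Five rows retyped from mcve_01.txt (M2 omitted for brevity).
data = pd.DataFrame({
    "pos": [16230484, 16230491, 16230503, 16232072, 16229783],
    "index": [141, 141, 141, 211, 211],
    "M1": ["G/G", "C/C", "T/T", "A/A", "C/C"],
    "F1_x": ["G", "C", "T", "A", "G"],
}).set_index(["pos", "index"])

as_lists = cellwise(data, lambda c: [list(c)])

pieces = []
for key, sub in as_lists.groupby(level="index"):
    # shift(1) only sees this group's rows, so pairs never straddle groups.
    paired = (sub + sub.shift(1)).dropna(how="all")
    pieces.append(cellwise(paired, mapfun))

# Append the processed sub-frames back into a single DataFrame.
result = pd.concat(pieces)
print(result)
```

The first row of every group has no predecessor inside its group, so it drops out of the result; without grouping, the first row of group 211 would have been paired with the last row of group 141.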
cphlewis
  • Can you look into the question and data once again. I just updated it. – everestial007 Feb 11 '17 at 03:30
  • Just tried to implement your method to fix the problem but didn't pan out. Looks like the way my function for `applymap` is prepared isn't being read cell by cell when doing it with `groupby`. – everestial007 Feb 11 '17 at 04:05
  • Still not a [MVCE], too tedious to debug. `applymap` *will* apply a function cell by cell, that's what it does. – cphlewis Feb 11 '17 at 04:35
  • The only part that needs to be debugged is the last one, where I want to run **mcve_mm=...** with groupby. It works perfectly without groupby, though. I tried to group the data before this code, but later it says groupby doesn't have an applymap attribute. I read through several tutorials and still haven't found a helpful one. Yes, my analysis needs to be done cell by cell, else apply would have worked. I tried to wrap **mapfun** with apply(lambda x: [mapfun] x in c) but I think that's wrong. Ahh! – everestial007 Feb 11 '17 at 04:42
  • Hi, Why is it not MCVE. I have given the input and output which are short, plus all the code that was successful before applying groupby. I think I can improve more if you could please tell me what needs to be added. Thank you – everestial007 Feb 11 '17 at 04:45
  • You say "groupby doesn't have applymap attribute" -- you do understand that `groupby` is a function that returns a dictionary? You run applymap on the DataFrames that are *elements of* what `groupby` returns. – cphlewis Feb 11 '17 at 04:57
  • Ok, now I know that groupby returns a dictionary. So, it should be trying to run the applymap separately for unique keys like in dict(list) or defaultdict. So what is the approach to run this **mapfun** on each group of unique index values separately. Any ideas. – everestial007 Feb 11 '17 at 05:07
  • In each cell I am reading two lines (by columns) using **mcve_list+mcve_list.shift(1)** so want to break and continue the function when the index value changes. – everestial007 Feb 11 '17 at 05:09
  • do you think you can help me through chat. – everestial007 Feb 11 '17 at 18:03
  • Looks like I have got the hang of the method you described above. It has worked for my dummy data, but I still need to take it to the big data. **Btw, how do you append these sub-dataframes after applymap has taken effect?** I'm not finding an answer anywhere. – everestial007 Feb 14 '17 at 20:59
  • If the answer works, mark it so. Here's a nice summary of ways to [join, merge, and concatenate Pandas dataframes](http://chrisalbon.com/python/pandas_join_merge_dataframe.html). – cphlewis Feb 14 '17 at 23:13
  • The answer isn't complete, though, since I still need to append the sub-frames after the function is run on them. Second, I need to explain the answer in the context of the input (question). I am still working on getting this to run on my big data and on seeing how we can improve the answer for the future (for me or anyone else). But your input did help. Thanks. – everestial007 Feb 15 '17 at 03:19