I have a dataframe:

member_id,device_type,device_id,event_type,event_path,event_duration
603609,url,mail.ru/,0,pc,7d4a095373874b4fb26a2e6d070b6ad3
603609,url,mail.ru/,0,pc,7d4a095373874b4fb26a2e6d070b6ad3
603609,url,mail.ru/,0,pc,7d4a095373874b4fb26a2e6d070b6ad3
603609,url,mail.ru/,3,pc,7d4a095373874b4fb26a2e6d070b6ad3
603609,url,mail.ru/community.livejournal.com/psp_ru,28,pc,7d4a095373874b4fb26a2e6d070b6ad3
603609,url,lady.mail.ru/article/491411-kurban-omarov-otvetil-na-obvinenija-ksenii-borodinoj/?from=mr_news,0,pc,7d4a095373874b4fb26a2e6d070b6ad3
603609,url,mail.ru/,0,pc,7d4a095373874b4fb26a2e6d070b6ad3
603609,url,lady.mail.ru/article/491411-kurban-omarov-otvetil-na-obvinenija-ksenii-borodinoj/?from=mr_news,0,pc,7d4a095373874b4fb26a2e6d070b6ad3
603609,url,lady.mail.ru/article/491411-kurban-omarov-otvetil-na-obvinenija-ksenii-borodinoj/?from=mr_news,0,pc,7d4a095373874b4fb26a2e6d070b6ad3

I need to search it for substrings taken from another file, find.xlsx, and, where a row contains one of these patterns, create a category column from the matching entry:

url category    category2
falloutsite.ru/ Рубрики/Hi-Tech/Программы/Софт/Игры/    Рубрики/Hi-Tech/Программы/Софт/Игры/ 
kmzpub.ru/games.asp Рубрики/Hi-Tech/Программы/Софт/Игры/Универсальное/  Рубрики/Hi-Tech/Программы/Софт/Игры/Универсальное/ 
sigma-team.ru/content/view/15/19    Рубрики/Hi-Tech/Программы/Софт/Игры/Quake и Counter-Strike/     Рубрики/Hi-Tech/Программы/Софт/Игры/Quake и Counter-Strike/ 
community.livejournal.com/psp_ru    Рубрики/Развлечения/Игры/Приставочные игры/     Рубрики/Развлечения/Игры/Приставочные игры/ 

I use

import pandas as pd

df = pd.read_csv('car owners games_category.csv')
find = pd.read_excel('blue.xlsx')
d = find.set_index('url')['category'].to_dict()
df['category'] = df.device_id.apply(lambda x: pd.Series([v for k,v in d.items() if k in x]))

to fill in the category, but it raises ValueError: Wrong number of items passed 2, placement implies 1. I tried map and isin, but they need the whole string to match, not a substring.

Petr Petrov
  • Could you please put some time into making your data [clipboard friendly](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)? – Ivan Oct 24 '16 at 09:27

1 Answer

After testing for a long time with the real data, the problem turned out to be that the Series built from the list comprehension returns 2 categories, not one, for row 13.
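
The effect can be reproduced on a small scale. In this sketch the patterns, categories and URLs are hypothetical stand-ins for the real data; the second URL contains both patterns, so the list comprehension from the question yields two categories for it:

```python
# Hypothetical pattern -> category mapping, standing in for d
d = {"games": "Games", "quake": "Games/Quake"}

# Hypothetical URLs, standing in for df.device_id
urls = ["mail.ru/games", "mail.ru/games/quake"]

# Same list comprehension as in the question, applied to each URL
matches = [[v for k, v in d.items() if k in x] for x in urls]

# Number of categories found per URL
counts = [len(m) for m in matches]
print(counts)
```

Any row with more than one match produces more than one value, and that cannot be placed into a single category column.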

One possible solution is to use iloc[0] to return only the first item of the Series:

# parentheses are needed so the statement can continue on the next line
df['category'] = (df.device_id
                    .apply(lambda x: pd.Series([v for k,v in d.items() if k in x]).iloc[0]))
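
One caveat: iloc[0] raises an IndexError for rows where no pattern matches, because the Series is then empty. A minimal sketch of a variant that returns a missing value instead, assuming that taking the first match is acceptable; the sample data and the helper name first_category are hypothetical:

```python
import pandas as pd

# Hypothetical sample data shaped like the frames in the question
df = pd.DataFrame({
    "device_id": [
        "mail.ru/",
        "mail.ru/community.livejournal.com/psp_ru",
    ]
})
find = pd.DataFrame({
    "url": ["community.livejournal.com/psp_ru", "livejournal.com/psp_ru"],
    "category": ["Games/Console", "Games/Other"],
})
d = find.set_index("url")["category"].to_dict()

def first_category(text, mapping=d):
    """Return the category of the first pattern found in text, else None."""
    return next((v for k, v in mapping.items() if k in text), None)

df["category"] = df["device_id"].apply(first_category)
print(df)
```

The first match follows the order of the rows in find, so if one pattern should take priority over another, it has to come first there.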

Another solution is to remove this row with drop:

find.drop(13, inplace=True)

To test for all problematic rows:

#custom function returning a list of all matching categories for column 'category'
def f(x):
    return [v for k,v in d.items() if k in x]
df['category'] = df.device_id.apply(f)
print (df)

#filter all rows where length of list is not 1
print (df[df.category.apply(len) != 1])

#return the list lengths for the problematic rows
#(.ix was removed in pandas 1.0, so .loc is used instead)
print (df.loc[df.category.apply(len) != 1, 'category'].apply(len))
jezrael