I have three DataFrames: dictionary, SourceDictionary and MappedDictionary. dictionary and SourceDictionary each have only one column, say words, of type String. dictionary has about 1 million records and SourceDictionary has around 10 million, and a record in SourceDictionary can be a substring of a record in dictionary. I need to match dictionary against SourceDictionary and put each matching pair into MappedDictionary. Example:
Records in dictionary: BananaFruit, AppleGreen
Records in SourceDictionary: Banana, grape, orange, lemon, Apple, ...
Records expected in MappedDictionary (two columns: the dictionary word and the matching SourceDictionary word):
BananaFruit Banana
AppleGreen Apple
I planned to use two nested for loops in Java and do a substring check for every pair, but the problem is that 1 million * 10 million = 10 trillion iterations. Also, I can't find the correct way to iterate over a DataFrame as I would in a for loop.

Can someone suggest a way to iterate over a DataFrame and perform the substring operations?

Sorry for my poor English, I am not a native speaker. Thanks in advance to the Stack Overflow community :-)
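
To make the cost concrete, here is a minimal plain-Java sketch of one way the pairwise comparison could be avoided. It is not DataFrame code and not necessarily the best approach: it assumes the SourceDictionary words fit in memory as a HashSet, that matching is case-sensitive, and that any contiguous substring counts as a match. The class name SubstringMapper and the method mapWords are hypothetical names chosen for illustration. Instead of comparing every pair, it enumerates the substrings of each dictionary word and looks each one up in the set, so the work grows with the number of dictionary words and their lengths rather than with 1 million * 10 million.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class SubstringMapper {

    // Maps each dictionary word to the source words that occur inside it.
    // Assumption: sourceWords is small enough to hold in memory as a hash set.
    static Map<String, List<String>> mapWords(List<String> dictionary, Set<String> sourceWords) {
        // No substring longer than the longest source word can match,
        // so this bounds the inner loop.
        int maxLen = 0;
        for (String s : sourceWords) {
            maxLen = Math.max(maxLen, s.length());
        }

        Map<String, List<String>> mapped = new LinkedHashMap<>();
        for (String word : dictionary) {
            // LinkedHashSet drops repeated matches and keeps discovery order.
            Set<String> matches = new LinkedHashSet<>();
            for (int start = 0; start < word.length(); start++) {
                int limit = Math.min(word.length(), start + maxLen);
                for (int end = start + 1; end <= limit; end++) {
                    String candidate = word.substring(start, end);
                    if (sourceWords.contains(candidate)) {
                        matches.add(candidate);
                    }
                }
            }
            // Words with no match are left out of the result.
            if (!matches.isEmpty()) {
                mapped.put(word, new ArrayList<>(matches));
            }
        }
        return mapped;
    }
}
```

Called with the example data above (dictionary words BananaFruit and AppleGreen, and the SourceDictionary words in a HashSet), mapWords pairs BananaFruit with Banana and AppleGreen with Apple. In a distributed setting the same per-word logic would probably have to run per partition with the source set shared across workers; that part is not shown here.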