1

I have two CSV files: "userfeatures" and "itemfeatures".

I need to compare each line of userfeatures with each line of itemfeatures to find the matches (intersections) between them. For example, the first line in the userfeatures file is:

005c2e08","Action","nm0000148","dir_ nm0764316","India"

Now, I need to find the intersection of this line (which belongs to user-1) with every line of the second file, "itemfeatures". The second file has the same structure, so for instance, the first comparison will be with the first line of "itemfeatures", which is:

"tt0306047","Comedy","nm0267506,nm0000221,nm0356021","dir_ nm0001878","USA,Canada"

Here is what I've tried so far:

public class Main {
    public static void main(String[] args) throws Exception {
        BufferedReader userfeatures = new BufferedReader(new FileReader("userfeatures.csv"));
        BufferedReader itemfeatures = new BufferedReader(new FileReader("itemfeatures.csv"));
        ArrayList<String> userlines = new ArrayList<>();
        ArrayList<String> itemlines = new ArrayList<>();
        String Uline = null;
        String Iline = null;

        while ((Uline = userfeatures.readLine()) != null) {
            for (int i = 1; i < userlines.size(); i++) {
                userlines.add(Uline);
                intersect(Uline, Iline).size();
            }
        }
        // System.out.println(Uline);
        userfeatures.close();
        itemfeatures.close();
    }

    static ArrayList<String> intersect(String Uline, String Iline) {
        ArrayList<String> result = new ArrayList<String>();
        result.retainAll(Iline);
        return result;
    }
}

It seems I cannot use retainAll with the type "String", so I was wondering how I could fix this issue. I searched here a lot, but all I found was about finding the intersection of arrays, except this one (but that post was also different from my case, since it compared each character in a string while I need to compare word by word).

mOna
  • Possible duplicate of [Intersection of two strings in Java](https://stackoverflow.com/questions/4448370/intersection-of-two-strings-in-java) – Tot Zam Aug 13 '17 at 04:34

2 Answers

2

Try splitting Uline and Iline into words, and use Set<String> instead of ArrayList<String>:

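// Needs: import java.util.Arrays; import java.util.HashSet; import java.util.Set;
static Set<String> intersect(String Uline, String Iline) {
    // Split each line on commas and collect the pieces into sets
    Set<String> result = new HashSet<String>(Arrays.asList(Uline.split(",")));
    Set<String> IlineSet = new HashSet<String>(Arrays.asList(Iline.split(",")));
    // Keep only the pieces that also occur in the item line
    result.retainAll(IlineSet);
    return result;
}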
Viet
1

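First, split each line into a collection of its fields. Then call retainAll on those non-empty collections (retainAll is a Collection method, so plain arrays have to be wrapped in a List or Set first).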
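As a rough sketch of that idea, and not necessarily what the answerer had in mind: the fields in these files are quoted, and some hold several comma-separated values, so splitting on "," alone leaves stray quote characters attached to the pieces. The helper `tokens` and the class name `IntersectSketch` below are hypothetical, and the second item line is made up for illustration. The sketch assumes that every comma separates a value and that a match in any column counts.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class IntersectSketch {
    // Hypothetical helper: split on commas, drop quote characters and surrounding spaces.
    static Set<String> tokens(String line) {
        Set<String> out = new LinkedHashSet<>();
        for (String piece : line.split(",")) {
            String t = piece.replace("\"", "").trim();
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Values that occur in both lines, in the order they appear in the user line.
    static Set<String> intersect(String uline, String iline) {
        Set<String> result = tokens(uline);
        result.retainAll(tokens(iline));
        return result;
    }

    public static void main(String[] args) {
        // The two sample lines from the question, copied as posted.
        String user = "005c2e08\",\"Action\",\"nm0000148\",\"dir_ nm0764316\",\"India\"";
        String item = "\"tt0306047\",\"Comedy\",\"nm0267506,nm0000221,nm0356021\",\"dir_ nm0001878\",\"USA,Canada\"";
        // A made-up item line that shares some values with the user line.
        String other = "\"tt0000001\",\"Action,Drama\",\"nm0000148\",\"dir_ nm0000002\",\"India,USA\"";

        System.out.println(intersect(user, item));   // prints []
        System.out.println(intersect(user, other));  // prints [Action, nm0000148, India]
    }
}
```

Because all columns are flattened into one set, a value in one column can match the same text in a different column; if that matters, the columns would need to be compared one by one instead.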

gauee
  • Thanks for the answer, but I think the lines are already arrays of strings, since when I print itemlines.get(1), I get this: `"tt0002199","Drama,Biography","nm0376639,nm0245769,nm0310155","dir_ nm0646058","USA" `which is an array of strings. Am I missing something? – mOna Dec 08 '15 at 02:34
  • Sorry, I mean that this single line should also be split by the CSV separator inside the intersect method, the approach that @Jerry06 mentioned. – gauee Dec 08 '15 at 02:45
  • Thanks, I tried Jerry's response, but I still have some errors. I think I should also fix this part, `intersect(Uline, Iline).size();`, because I just noticed that at the moment I want to calculate the intersection, Iline is null. I should read it as well... – mOna Dec 08 '15 at 02:53
  • Maybe it would be worth reading the contents of those files separately, because their numbers of lines can differ. After that, you will have a simple double loop through all lines of both files. – gauee Dec 08 '15 at 03:00
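The suggestion in the last comment could look roughly like the sketch below: read each file fully into its own list, then run a nested loop over both lists. The class name `CompareFiles` and the method `matchCounts` are hypothetical; the file names are the ones used in the question, and `intersect` is the Set-based version from the first answer, so quoted values are compared together with their quote characters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompareFiles {
    // Set-based intersection of the comma-separated pieces of two lines.
    static Set<String> intersect(String uline, String iline) {
        Set<String> result = new HashSet<>(Arrays.asList(uline.split(",")));
        result.retainAll(new HashSet<>(Arrays.asList(iline.split(","))));
        return result;
    }

    // counts[u][i] is the number of pieces shared by user line u and item line i.
    static int[][] matchCounts(List<String> userLines, List<String> itemLines) {
        int[][] counts = new int[userLines.size()][itemLines.size()];
        for (int u = 0; u < userLines.size(); u++) {
            for (int i = 0; i < itemLines.size(); i++) {
                counts[u][i] = intersect(userLines.get(u), itemLines.get(i)).size();
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read both files completely before comparing; they may differ in length.
        List<String> userLines = Files.readAllLines(Paths.get("userfeatures.csv"));
        List<String> itemLines = Files.readAllLines(Paths.get("itemfeatures.csv"));

        int[][] counts = matchCounts(userLines, itemLines);
        for (int u = 0; u < counts.length; u++) {
            for (int i = 0; i < counts[u].length; i++) {
                System.out.println("user " + u + " / item " + i + ": " + counts[u][i]);
            }
        }
    }
}
```

Reading both files up front keeps the loops simple, at the cost of holding every line in memory; for very large files, streaming one file while keeping only the other in memory may be preferable.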