
I have two CSV files: "userfeatures" and "itemfeatures". Each line in the userfeatures file describes one specific user. For example, the first line of the userfeatures file is:

"005c2e08","Action","nm0000148","dir_ nm0764316","USA"

I need to find the intersection of this line with every line of the second file, "itemfeatures". (Actually, I need to repeat this procedure for all users, i.e., for all lines of "userfeatures".)

So, the first comparison will be with the first line of "itemfeatures" that is:

"tt0306047","Comedy,Action","nm0267506,nm0000221,nm0356021","dir_ nm0001878","USA"

The result of the intersection should be ["Action", "USA"], but unfortunately my code only finds ["USA"] as a match. Here is what I've tried so far:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Main {
    public static void main(String[] args) throws Exception {
        List<String> itemlines = new ArrayList<>();
        // Read the item file once up front; a BufferedReader is exhausted
        // after one pass, so it cannot be re-read for each user line.
        try (BufferedReader itemfeatures = new BufferedReader(new FileReader("ItemFeatureVectorsTest.csv"))) {
            String Iline;
            while ((Iline = itemfeatures.readLine()) != null) {
                itemlines.add(Iline);
            }
        }
        try (BufferedReader userfeatures = new BufferedReader(new FileReader("userFeatureVectorsTest.csv"))) {
            String Uline;
            while ((Uline = userfeatures.readLine()) != null) {
                for (String Iline : itemlines) {
                    System.out.println(Uline);
                    System.out.println(Iline);
                    System.out.println(intersect(Uline, Iline));
                    System.out.println(union(Uline, Iline));
                }
            }
        }
    }

    static Set<String> intersect(String Uline, String Iline) {
        Set<String> result = new HashSet<String>(Arrays.asList(Uline.split(",")));
        Set<String> IlineSet = new HashSet<String>(Arrays.asList(Iline.split(",")));
        result.retainAll(IlineSet);
        return result;
    }

    static Set<String> union(String Uline, String Iline) {
        Set<String> result = new HashSet<String>(Arrays.asList(Uline.split(",")));
        Set<String> IlineSet = new HashSet<String>(Arrays.asList(Iline.split(",")));
        result.addAll(IlineSet);
        return result;
    }
}

I think the problem is related to Uline.split(",") and Iline.split(","): they seem to treat "Comedy,Action" as one word, so [Action] is not found as the intersection of "Comedy,Action" and "Action". I would appreciate any ideas on how to fix this issue.

Jonathan Leffler
mOna

2 Answers


Try removing the double quotes from both strings.

Because when you split

"tt0306047","Comedy,Action","nm0267506,nm0000221,nm0356021","dir_ nm0001878","USA"

You will get an

Action"

token, which will never match the

"Action"

token.
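
A minimal sketch of that idea, assuming it is acceptable to drop every double quote from a line before splitting (the class name QuoteStripDemo and the helper tokens are hypothetical, introduced only for illustration):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class QuoteStripDemo {
    // Removes every double quote from the line, then splits on commas.
    // Assumption: no field needs to keep a literal quote or comma.
    static Set<String> tokens(String line) {
        return new LinkedHashSet<>(Arrays.asList(line.replace("\"", "").split(",")));
    }

    // Intersection of the quote-free tokens of both lines.
    static Set<String> intersect(String userLine, String itemLine) {
        Set<String> result = tokens(userLine);
        result.retainAll(tokens(itemLine));
        return result;
    }

    public static void main(String[] args) {
        String user = "\"005c2e08\",\"Action\",\"nm0000148\",\"dir_ nm0764316\",\"USA\"";
        String item = "\"tt0306047\",\"Comedy,Action\",\"nm0267506,nm0000221,nm0356021\",\"dir_ nm0001878\",\"USA\"";
        System.out.println(intersect(user, item)); // prints [Action, USA]
    }
}
```

Note that this flattens everything into one set of tokens, so a value in one column can match the same value in a different column. For the sample lines that does not matter, but it may for other data.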

Arnaud
  • Thanks for your answer. Sorry, I am really new to Java. Should I use `line = line.replace("\"", "");`? – mOna Dec 08 '15 at 15:10

If you print your line, what does it look like? I think your issue is in reading the file, for example:

"005c2e08","Action","nm0000148","dir_ nm0764316","USA"

Splitting it by ',' will result in:

"005c2e08" "Action"

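and so on. For your second line, however, the comma inside the quoted field is also treated as a separator, so it will be:

"tt0306047" "Comedy Action"

that is, "Comedy and Action" end up as separate tokens, each with a stray quote attached. This is why USA intersects but Action does not.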
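
To see the tokens for yourself, a small sketch like the following prints what a plain split(",") produces for the item line from the question (SplitDemo and naiveTokens are hypothetical names used only for this illustration):

```java
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    // Splits on every comma, including commas inside quoted fields.
    static List<String> naiveTokens(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        String item = "\"tt0306047\",\"Comedy,Action\",\"nm0267506,nm0000221,nm0356021\",\"dir_ nm0001878\",\"USA\"";
        // Each token is printed on its own line, quotes included.
        for (String token : naiveTokens(item)) {
            System.out.println(token);
        }
    }
}
```

The quote characters stay attached to the tokens, so the item line yields Action" with only a trailing quote, while the user line yields "Action" with both quotes. The two strings are not equal, and retainAll drops them.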

Use a CSV reader to read in the CSV file, then split each attribute of the CSV line by comma. That way you get rid of the quotes and your code will work.

For example, this library is very handy for reading CSV files:

http://opencsv.sourceforge.net/
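
As a rough illustration of the two-step idea (parse the line into fields first, then split each field by comma), here is a sketch that uses a small hand-written parser as a stand-in for a CSV library. The names CsvFieldDemo, parseFields and features are hypothetical, and the parser is an assumption-laden simplification: it does not handle escaped quotes or fields that span several lines, which a real CSV reader would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CsvFieldDemo {
    // Stand-in for a CSV library: splits only on commas outside double quotes
    // and drops the quote characters themselves.
    static List<String> parseFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;            // entering or leaving a quoted field
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // separator between fields
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());          // last field has no trailing comma
        return fields;
    }

    // Splits every parsed field by comma and collects the values into one set.
    static Set<String> features(String line) {
        Set<String> result = new LinkedHashSet<>();
        for (String field : parseFields(line)) {
            result.addAll(Arrays.asList(field.split(",")));
        }
        return result;
    }

    static Set<String> intersect(String userLine, String itemLine) {
        Set<String> result = features(userLine);
        result.retainAll(features(itemLine));
        return result;
    }

    public static void main(String[] args) {
        String user = "\"005c2e08\",\"Action\",\"nm0000148\",\"dir_ nm0764316\",\"USA\"";
        String item = "\"tt0306047\",\"Comedy,Action\",\"nm0267506,nm0000221,nm0356021\",\"dir_ nm0001878\",\"USA\"";
        System.out.println(parseFields(item).size()); // prints 5
        System.out.println(intersect(user, item));    // prints [Action, USA]
    }
}
```

Because the fields are parsed before they are split, it would also be straightforward to compare column by column (genre with genre, country with country) instead of pooling all values into one set.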

pandaadb
  • No worries. It should really work out of the box for your example. Otherwise, you can tell opencsv what your separator and quote characters are. That way, it will treat commas inside quotes as part of the value and commas outside quotes as separators for your CSV. There's a good example on their website. – pandaadb Dec 08 '15 at 15:16
  • Thanks a lot for your help. I got the answer by simply removing the "" from both strings :) – mOna Dec 08 '15 at 15:22