I am creating a program in Java to perform depth-first search on a graph.

The problem is the input file has 5 million lines. Each line of the file contains two numbers that indicate an undirected edge.

What would be the best way to store this data?

HaveNoDisplayName
js091514

1 Answer


Assuming a modern computer with a reasonable amount of RAM, you should have no problem creating 5 million objects to store your links. The data structure you use will most likely be dictated by how you intend to use it. In your case you want to perform a depth-first search, which means it is convenient to have each link referenced from both of the nodes it touches.

Assuming unweighted links, a very simple example of a data structure might be:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Node {
    private final int id;
    private final List<Node> linkedNodes = new ArrayList<>();

    public Node(int id) {
        this.id = id;
    }

    public void addLink(Node linkNode) {
        linkedNodes.add(linkNode);
    }
}

class Graph {
    private final Map<Integer, Node> nodes = new HashMap<>();

    public void addLink(int id1, int id2) {
        getNode(id1).addLink(getNode(id2));
        getNode(id2).addLink(getNode(id1));
    }

    private Node getNode(int id) {
        // create the node the first time its id is seen
        return nodes.computeIfAbsent(id, Node::new);
    }
}
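Loading your edge file into this structure is then a line-by-line parse. A sketch, assuming each line holds two whitespace-separated integer ids (minimal versions of `Node` and `Graph` are repeated here so the snippet stands alone):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Node {
    final int id;
    final List<Node> linkedNodes = new ArrayList<>();
    Node(int id) { this.id = id; }
}

class Graph {
    final Map<Integer, Node> nodes = new HashMap<>();

    void addLink(int id1, int id2) {
        getNode(id1).linkedNodes.add(getNode(id2));
        getNode(id2).linkedNodes.add(getNode(id1));
    }

    Node getNode(int id) {
        return nodes.computeIfAbsent(id, Node::new);
    }

    // Parse "id1 id2" pairs, one undirected edge per line
    static Graph load(BufferedReader reader) {
        Graph graph = new Graph();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length == 2) {
                    graph.addLink(Integer.parseInt(parts[0]),
                                  Integer.parseInt(parts[1]));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return graph;
    }
}
```

For a file on disk you would pass in `Files.newBufferedReader(Paths.get(path))`; a buffered reader keeps the parse linear in the size of the file, which matters at 5 million lines.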

Your search becomes relatively simple with this data structure:

class Node {
    // ... fields and addLink as above; requires java.util.function.Consumer
    // and java.util.stream.Collectors ...

    public void search(List<Node> visitedList, Consumer<Node> action) {
        visitedList.add(this);
        linkedNodes.stream()
            .filter(n -> !visitedList.contains(n))
            .collect(Collectors.toList())
            .forEach(n -> {
                // re-check: an earlier branch of the search may have
                // reached n since the list was collected
                if (!visitedList.contains(n)) {
                    n.search(visitedList, action);
                }
            });
        action.accept(this);
    }
}

I've used Java 8 streams here, but converting to traditional iteration should not be too hard. Note that I collect the linked nodes into a list before searching them to avoid modifying the list mid-stream.
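For reference, the traditional-iteration version might look like this (a sketch; a minimal `Node` is repeated so the snippet compiles alone — with a plain loop the visited check can simply happen at each step, so no intermediate collection is needed):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class Node {
    final int id;
    final List<Node> linkedNodes = new ArrayList<>();
    Node(int id) { this.id = id; }

    // Same depth-first search with plain loops instead of streams;
    // action is applied to each node after its neighbours (post-order)
    void search(List<Node> visitedList, Consumer<Node> action) {
        visitedList.add(this);
        for (Node n : linkedNodes) {
            if (!visitedList.contains(n)) {
                n.search(visitedList, action);
            }
        }
        action.accept(this);
    }
}
```

On a graph this size, note that `List.contains` is a linear scan; a `HashSet<Node>` for the visited set would make each check O(1).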

If you are running on lightweight hardware or are going to be processing billions of rows, then you might need to look at a lighter-weight data structure.
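One such lighter-weight option, assuming the node ids are small, dense integers, is to drop the `Node` objects entirely and store the adjacency in two plain int arrays (a compressed adjacency layout); this is a sketch, not part of the answer above:

```java
// Compressed adjacency list: the neighbours of node v are
// targets[offsets[v]] .. targets[offsets[v + 1] - 1]
class IntGraph {
    final int[] offsets;   // length nodeCount + 1
    final int[] targets;   // one entry per directed half-edge

    IntGraph(int nodeCount, int[][] edges) {
        offsets = new int[nodeCount + 1];
        targets = new int[edges.length * 2];
        for (int[] e : edges) {            // count the degree of each node
            offsets[e[0] + 1]++;
            offsets[e[1] + 1]++;
        }
        for (int v = 0; v < nodeCount; v++) {
            offsets[v + 1] += offsets[v];  // prefix sums -> start offsets
        }
        int[] cursor = offsets.clone();
        for (int[] e : edges) {            // fill both directions
            targets[cursor[e[0]]++] = e[1];
            targets[cursor[e[1]]++] = e[0];
        }
    }
}
```

Two int arrays sized by the edge count avoid the per-object header and pointer overhead of 5 million `Node`/`ArrayList` instances, at the cost of requiring two passes over the edges and a known node count up front.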

sprinter