Good practice when manipulating data in java

Question

Is it bad practice to directly manipulate data like:

 Sorter.mergeSort(testData); //(testData is now sorted)

Or should I create A copy of the data and then manipulate and return that like:

 sortedData = Sorter.mergeSort(testData); // (sortedData is now sorted and testData remains unsorted)?

I have several sorting methods and I want to be consistent in the way they manipulate data. With my insertionSort method I can directly work on the unsorted data. However, if I want to leave the unsorted data untouched then I would have to create a copy of the unsorted data in the insertionSort method and manipulate and return that (which seems rather unnecessary). On the other hand in my mergeSort method I need to create a copy of the unsorted data one way or another so I ended up doing something that also seems rather unnecessary as a work around to returning a new sortedList:

List <Comparable> sorted = mergeSortHelper(target);
target.clear();
target.addAll(sorted);`

Please let me know which is the better practice, thanks!

I would do what you believe is the simplest and clearest. You could do either AFAIK. — Peter Lawrey, Sep 18 '13 at 15:20
Take a look at major sort function, they certainly had a though about it (Collections.sort, ...) — Michael Laffargue, Sep 18 '13 at 15:21
Look into Google Guava. It has lots of utilities for collections that make your life easier once you get used to them. — allprog, Sep 18 '13 at 15:23

score 5 · Answer 1 · answered Sep 18 '13 at 15:23

It depends whether you're optimising for performance or functional purity. Generally in Java functional purity is not emphasised, for example Collections.Sort sorts the list you give it (even though it's implemented by making an array copy first).

I would optimise for performance here, as that seems more like typical Java, and anyone who wants to can always copy the collection first, like Sorter.mergeSort(new ArrayList(testData));

score 2 · Answer 2 · answered Sep 18 '13 at 15:26

2

The best practice is to be consistent.

Personally I prefer my methods to not modify the input parameters since it might not be appropriate in all situations (you're pushing the responsibility onto the end user to make a copy if they need to preserve the original ordering).

That being said, there are clear performance benefits of modifying the input (especially for large lists). So this might be appropriate for your application.

As long as the functionality is clear to the end user you're covered either way!

answered Sep 18 '13 at 15:26

StuPointerException

7,117
5
29
54

Good point on clarity. If you choose to modify in-place, you can make that clear by making the function `void`. As it doesn't return a new Collection, it clearly modifies the input Collection. Of course javadoc comments are also helpful. – MikeFHay Sep 18 '13 at 15:29

Paul · Answer 3 · 2013-09-19T15:33:43.303

In Java I usually provide both options (when writing re-usable utility methods, anyway):

/** Return a sorted copy of the data from col. */
public List<T> mergeSort(Collection<T extends Comparable<T>> col);

/** Sort the data in col in place. */
public void mergeSortIn(List<T extends Comparable<T>> col);

I'm making some assumptions re the signatures and types here. That said, the Java norm is - or at least, has been* - generally to mutate state in place. This is often a dangerous thing, especially across API boundaries - e.g. changing a collection passed to your library by its 'client' code. Minimizing the overall state-space and mutable state in particular is often the sign of a well designed application/library.

It sounds like you want to re-use the same test data. To do that I would write a method that builds the test data and returns it. That way, if I need the same test data again in a different test (i.e. to test your mergeSort() / insertionSort() implementations on the same data), you just build and return it again. I commonly do exactly this in writing unit tests (in JUnit, for example).

Either way, if your code is a library class/method for other people to use you should document its behaviour clearly.

Aside: in 'real' code there shouldn't really be any reason to specify that merge sort is the implementation used. The caller should care what it does, not how it does it - so the name wouldn't usually be mergeSort(), insertionSort(), etc.

(*) In some of the newer JVM languages there has been a conscious movement away from mutable data. Clojure has NO mutable state at all as it is a pure functional programming language (at least in normal, single-threaded application development). Scala provides a parallel set of collection libraries that do not mutate the state of collections. This has major advantages in multi-threaded, multi-processor applications. This is not as time/space expensive as might be naively expected, due to the clever algorithms the collections use.

score 0 · Answer 4 · edited May 23 '17 at 11:50

0

In your particular case, modifying the "actual" data is more efficient. You are sorting data, it is observed that its more efficient to work on sorted data rather than unsorted data. So, I don't see why you should keep the unsorted data. check out Why is it faster to process a sorted array than an unsorted array?

edited May 23 '17 at 11:50

Community

1
1

answered Sep 18 '13 at 15:36

TheLostMind

35,966
12
68
104

score 0 · Answer 5 · answered Sep 18 '13 at 15:42

0

Mutable object should be manipulated in the functions. Like Arrays#sort

But immutable objects (like String), can only return the "new" objects. Like String#replace

answered Sep 18 '13 at 15:42

Philipp Sander

10,139
6
45
78

Good practice when manipulating data in java

5 Answers5