-1

I'm trying to get an ArrayList (or a Set, or anything similar) of the artists of a List of Songs. Each song has the function getArtists which returns an Array of every artist who is participating in the song.
The goal is to have a List of Artists and every Artist should have a List (or Set, whichever is faster) which contains all the Songs where he participates.

My Code works, but it's rather slow (it needs 5 sec for 1600 songs). How can i speed it up?

My Code

private ArrayList<Artist> getArtistsFromSongs(List<Song> songs)
{
    long start = System.currentTimeMillis();

    ArrayList<Artist> artists = new ArrayList<>();
    for (Song song : songs)
    {
        String[] artistsStringArray = song.getArtists();
        for (String artistString : artistsStringArray)
        {
            boolean artistAlreadyExists = false;
            int heExistsAt = -1;

            for (int i = 0; i < artists.size(); i++)
            {
                if (artists.get(i).name.equals(artistString))
                {
                    artistAlreadyExists = true;
                    heExistsAt = i;
                }
            }
            if (artistAlreadyExists)
            {
                artists.get(heExistsAt).songs.add(song);
            } else
            {
                Artist newArtist = new Artist(artistString, new ArrayList<>());
                newArtist.songs.add(song);
                artists.add(newArtist);
            }
        }
    }
    long test = System.currentTimeMillis() - start; //~5500 milliseconds
    return artists;
}

The Class

class Artist
{
    public final String name;
    public final ArrayList<Song> songs;

    Artist(String name, ArrayList<Song> songs)
    {
        this.name = name;
        this.songs = songs;
    }
}

Thanks in advance.

  • Possible duplicate of [Is there a performance difference between a for loop and a for-each loop?](https://stackoverflow.com/questions/256859/is-there-a-performance-difference-between-a-for-loop-and-a-for-each-loop) – Logan Jun 06 '18 at 22:00
  • Perfomance optimization **must always** come with a detailed analysis. So, measure it! Also, provide a full example. I don't think that your iteration itself is slow. So there's probably something inside the code which is *slow*. We can't tell by just looking at the code snippet. Get a framework and analyze in which code section you spend the most time. – Zabuzard Jun 06 '18 at 22:00
  • Note that a *regular for*-loop is slightly faster for arrays than an *enhanced for*-loop. The generated bytecode is very similar though, the difference is super small. – Zabuzard Jun 06 '18 at 22:01
  • @Zabuza This iteration is definelty the source of the "slowness". I don't know how to give you more details.. everything else is tested and runs in under 1 sec, whereas this specific part takes around 5.5 seconds. Look updated question. – Daniel Kargl Jun 06 '18 at 22:12
  • are objects song and artists simple objects or are they something like jpa/hibernate entities – gagan singh Jun 06 '18 at 22:22
  • @gagansingh Artist is exactly like in the question, Song is pretty similar but with a bunch of methods static functions. Nothing special at all. – Daniel Kargl Jun 06 '18 at 22:26
  • I mean, it's not the loop itself that is slow. It's something inside the loop which runs slow. Get a framework to analyze in which line you spend how many time. We can't analyze it for you since you didn't provide a [mcve] and it's not that you have an obvious slow call in your code. Everything looks fine at first glance. We can't tell with more input. – Zabuzard Jun 06 '18 at 22:39
  • @Zabuza Just look at the code. It's an obvious sequential search by name in the innermost loop. No tool needed to analyze that!!! – Andreas Jun 06 '18 at 22:44
  • @Andreas Still, wheres the problem? What's the input size? The time was probably to be expected and doesn't indicate a bug in the code or an obvious bad call. I'm not saying that the question can't be answered. I'm saying that it's not asked well. – Zabuzard Jun 06 '18 at 22:59
  • @Zabuza Even if the input was extreamly huge and the time was to be expected, that is not what i asked for. I asked if this code could genereally be improved, which, as shown by Andreas's answer, it could. – Daniel Kargl Jun 06 '18 at 23:02
  • @Zabuza It's asked perfectly well. *"It's rather slow, how can i speed it up?"* And I disagree. A sequence search *is* an obvious "bad call". – Andreas Jun 06 '18 at 23:42

1 Answers1

0

To improve performance, change artists from ArrayList<Artist> to HashMap<String, Artist>.

That way you can replace the innermost sequential search loop with a fast and simple map lookup.

Map<String, Artist> artists = new HashMap<>();
for (Song song : songs) {
    for (String artistName : song.getArtists()) {
        Artist artist = artists.get(artistName);
        if (artist == null) {
            artist = new Artist(artistName, new ArrayList<>());
            artists.put(artistName, artist);
        }
        artist.Songs.add(song);
    }
}

In Java 8+ that can be improved, especially if you change the Artist class a little.

Map<String, Artist> artists = new HashMap<>();
for (Song song : songs) {
    for (String artistName : song.getArtists()) {
        artists.computeIfAbsent(artistName, Artist::new).addSong(song);
    }
}
class Artist {
    public final String name;
    public final ArrayList<Song> songs = new ArrayList<>();

    Artist(String name) {
        this.name = name;
    }

    void addSong(Song song) {
        this.songs.add(song);
    }
}

Note: Java naming convention is for field names to start with lowercase letter, so second solution above was changed to reflect that.

Andreas
  • 154,647
  • 11
  • 152
  • 247
  • much cleaner code and definitely faster but I still do not understand what was taking 5.5 seconds. How big was the list of songs? millions, billions – gagan singh Jun 06 '18 at 22:34
  • @gagansingh I also don't understeand what was taking so long, but i DO know that this code does the exact same thing in below 50ms, while also looking a lot cleaner. By the way: the list contains about 1600 songs. – Daniel Kargl Jun 06 '18 at 22:52
  • it is very clear that in the modified code you are searching a map which is already indexed where as in the other case you were searching a full list every time, hence it was an order of magnitude costlier. I did not expect that to be this costly but you always learn new stuff :) – gagan singh Jun 06 '18 at 22:55
  • I'm relatively new to programming and never even used a map before.. so yep, i learned a lot from this answer aswell :) – Daniel Kargl Jun 06 '18 at 23:04