tl;dr
DataSource is a way of externalizing the info needed to connect to a database server: server name or address, user name, user password, settings specific to your particular database engine, etc.
DriverManager is fine during your initial learning. But when deploying to production, you’ll not want to hard-code connection info within your codebase. In real work, use DataSource instead of DriverManager to access the externalized configuration info (address, name, password, etc.).
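Connection is your live connection to the database. A DataSource object hands you a Connection object to use in making queries to the database. How it obtains that connection (directly from the JDBC driver, from a connection pool, or otherwise) is up to the particular DataSource implementation.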
Details
Let's look at the specifics of your Question.
What I am trying to understand is what the difference is between a Connection and a DataSource
A Connection object represents a live session with your database server, going back and forth to make queries and get results.
A DataSource object holds the credentials needed to get a connection to the database. Typically, a DataSource holds a user name recognized by the database server, a password for that user, and various settings to customize any future sessions with the database. A DataSource is not "open" or "closed"; it merely holds the info needed to make a Connection, which is open or closed.
why it exists
Connection exists as the conduit for conversation with a database server.
DataSource exists as a way to avoid hard-coding connection info (user name, password, options) within your app’s code base. In real work, after you deploy your app you’ll not want to have to edit your code, recompile, and redeploy just because the DBA rotated passwords.
As a programmer, you do not want to be affected by deployment issues such as the database server’s machine network address, user names, user passwords, and such. You’ll want that info externalized outside of your codebase.
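To make the contrast concrete, here is a minimal sketch of the two styles. The class name, the made-up "exampledb" URL scheme, the host, and the credentials are all hypothetical placeholders, not anything from a real driver. The first method bakes the connection info into the code; the second accepts a DataSource that someone else has configured.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import javax.sql.DataSource;

class ConnectionInfoDemo {
    // Hard-coded style: address, user name, and password live in the source code.
    // The URL scheme, host, user, and password here are hypothetical placeholders.
    static Connection viaDriverManager() throws SQLException {
        return DriverManager.getConnection(
                "jdbc:exampledb://db.example.com/sales", "app_user", "s3cret");
    }

    // Externalized style: the caller passes in an already-configured DataSource,
    // so this code knows nothing about addresses or credentials.
    static Connection viaDataSource(DataSource dataSource) throws SQLException {
        return dataSource.getConnection();
    }

    public static void main(String[] args) {
        try (Connection conn = viaDriverManager()) {
            System.out.println("Connected: " + conn);
        } catch (SQLException e) {
            // No driver is registered for the made-up "exampledb" scheme,
            // so in this sketch the attempt fails and lands here.
            System.out.println("Could not connect: " + e.getMessage());
        }
    }
}
```

With the first style, a password rotation means editing and redeploying the app. With the second, only the externally supplied DataSource configuration changes.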
externalizing database properties such as username, password, url etc in a property file and then use DriverManager work in the same way?
No. Your code would still be hard-coded to look for that property file. But there are other ways for DBAs and SysAdmins to configure and communicate that connection info (user name, passwords, server address, etc.). The Java programmer should not make assumptions about the choices and changes to be made during deployment.
A common way to externalize that info is to place it within a directory server. There are many directory server implementations. These are commonly accessed via a standardized protocol, such as LDAP.
Java provides a facility for your Java-based app to interact with a directory service through the standardized interface. This facility is known as Java Naming and Directory Interface (JNDI).
Through JNDI, your app can ask a directory service to provide a DataSource object with your necessary connection info. By using JNDI, your app need not make assumptions about how your DBAs/SysAdmins chose to deliver this connection info to your app. Indeed, as a programmer you need know nothing about their deployment choices and changes.
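As a rough sketch of what such a JNDI lookup can look like, consider the following. The class name and the JNDI name "java:comp/env/jdbc/ExampleDB" are assumptions for illustration; the real name is whatever your DBAs/SysAdmins registered at deployment. Run outside a server, no naming service is configured, so the lookup is expected to fail with a NamingException.

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

class DataSourceLookup {
    // Hypothetical JNDI name; the actual name is chosen at deployment time.
    static final String JNDI_NAME = "java:comp/env/jdbc/ExampleDB";

    // Asks the naming/directory service for a configured DataSource.
    static DataSource lookup(String jndiName) throws NamingException {
        InitialContext context = new InitialContext();
        return (DataSource) context.lookup(jndiName);
    }

    public static void main(String[] args) {
        try {
            DataSource dataSource = lookup(JNDI_NAME);
            System.out.println("Obtained a DataSource: " + dataSource);
        } catch (NamingException e) {
            // Without a server or other naming provider, there is nothing to
            // answer the lookup, so this branch runs.
            System.out.println("Lookup failed: " + e);
        }
    }
}
```

Note that nothing in this code mentions a server address, user name, or password. Only the lookup name is shared between the programmer and whoever configures the deployment.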
is the DataSource interface created only to have a common way of returning connections that can be pooled etc?
The connections returned by a call to DataSource#getConnection may or may not be part of a connection pool. As the Java programmer, you generally do not care. At deployment time, the DBAs/SysAdmins may initially deploy with non-pooled connections. Then later they may change to using pooled connections. Again, you need not care, and there is no need to edit your code, recompile, and redeploy. The DBAs can change the pooling without your involvement.
In Java EE, does the application server implement this interface and the applications deployed to have a reference to a datasource instead of a connection?
FYI, Java EE is now known as Jakarta EE, after Oracle Corp transferred responsibility to the Eclipse Foundation.
You can use JDBC and DataSource objects in any kind of Java app: console, desktop (JavaFX/Swing/SWT), web app, microservice, etc.
By "this interface", if you mean the DataSource interface… No, a Jakarta EE implementation such as Tomcat, Jetty, Glassfish, Payara, WildFly, JBoss, or Open Liberty does not implement DataSource. Typically the JDBC driver provides an implementation, or your connection pool implementation does.
Again, this is configured at deployment by the DBA/SysAdmin rather than by you the programmer during development. You should not bundle a JDBC driver with your Jakarta EE app. Instead, configure your dependency manager (Maven, Gradle, etc.) to make a driver available transiently, only during development for your work, but not in the final artifact (.war file etc.) for deployment. In Maven, for example, the provided scope keeps a dependency out of the packaged artifact.
The Jakarta EE implementation handles getting your app a DataSource object. The implementation may itself act as the directory service; for example Tomcat can hold the connection info within its own configuration files, and then deliver that info to your app as a DataSource object. Or the DBAs/SysAdmins may configure the Jakarta EE implementation to connect to a separate directory server implementation such as Microsoft Active Directory or OpenLDAP. Again, all these details are none of your concern as the Java programmer.
the applications deployed to have a reference to a datasource instead of a connection?
In a Jakarta EE deployment, the Jakarta EE implementation delivers a DataSource object to your app. Your app code then calls getConnection when needing to talk to the database server. Your app code then closes the resulting Connection object when done talking to the database server.
Tip: Use try-with-resources syntax to automatically close connections, statements, and other JDBC resources. As mentioned above, the DataSource object is not a resource in that sense, and is never opened or closed itself.
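Here is a small sketch of that tip. Since no real database is assumed, the DataSource below is a stand-in built with java.lang.reflect.Proxy purely for illustration; the class name, the stubDataSource helper, and the closeCalls counter are hypothetical. In a real app the DataSource would come from your deployment. The point is only the shape of the ping method: the Connection is declared in the try header, so it is closed automatically, while the DataSource is simply used and never closed.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;
import javax.sql.DataSource;

class TryWithResourcesDemo {
    // Counts how many times close() was called on the stand-in connection.
    static final AtomicInteger closeCalls = new AtomicInteger();

    // Stand-in DataSource for illustration only; it talks to no database.
    static DataSource stubDataSource() {
        ClassLoader loader = TryWithResourcesDemo.class.getClassLoader();
        Connection connection = (Connection) Proxy.newProxyInstance(
                loader, new Class<?>[] { Connection.class },
                (proxy, method, args) -> {
                    if (method.getName().equals("close")) {
                        closeCalls.incrementAndGet();
                        return null;
                    }
                    if (method.getName().equals("isValid")) {
                        return true;
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
        return (DataSource) Proxy.newProxyInstance(
                loader, new Class<?>[] { DataSource.class },
                (proxy, method, args) -> {
                    if (method.getName().equals("getConnection")) {
                        return connection;
                    }
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    // The pattern itself: the Connection is closed automatically when the
    // try block exits, whether normally or because of an exception.
    static boolean ping(DataSource dataSource) throws SQLException {
        try (Connection connection = dataSource.getConnection()) {
            return connection.isValid(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        boolean valid = ping(stubDataSource());
        System.out.println("valid=" + valid + ", close calls=" + closeCalls.get());
    }
}
```

The same ping method works unchanged whether the DataSource hands out pooled or non-pooled connections. With a pool, closing the Connection typically returns it to the pool rather than ending the session.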