
I have a task to convert .mdb files to .csv files. With the help of the code below I am able to read only one table from an .mdb file. I am not able to read an .mdb file that contains more than one table, and I want to store each table individually as its own file. Kindly help me with this.

import java.io.File

import com.healthmarketscience.jackcess.DatabaseBuilder
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

import scala.collection.JavaConverters._

object mdbfiles {

    Logger.getLogger("org").setLevel(Level.ERROR)

    val spark = SparkSession.builder().appName("Positional File Reading").master("local[*]").getOrCreate()

    val sc = spark.sparkContext // Just used to create test RDDs

    def main(args: Array[String]): Unit = {

        val inputfilepath = "C:/Users/phadpa01/Desktop/InputFiles/sample.mdb"

        val outputfilepath = "C:/Users/phadpa01/Desktop/sample_mdb_output"

        val db = DatabaseBuilder.open(new File(inputfilepath))

        try {

            val table = db.getTable("table1")

            // Table is a java.lang.Iterable[Row], so convert it before the for-comprehension
            for (row <- table.asScala) {
                //System.out.println(row)
                val opresult = row.values()
            }

        } finally {
            db.close()
        }

    }

}
  • Is MDB a common format? If not, you should show the file here. Also, is Spark really necessary for this task? You're only using standard Java method calls. And windows paths go like `C:\\...\\` – OneCricketeer May 29 '17 at 05:13
  • Yes, .MDB is a common format. We need to implement this with Spark Scala only. – Pravinkumar Hadpad May 29 '17 at 05:16
  • Did you read the JavaDoc / ScalaDoc for that `DatabaseBuilder` class that you've neglected to mention where it comes from? – OneCricketeer May 29 '17 at 05:18
  • I assume you are using MS Access? Can you not use SparkSQL with a JDBC/ODBC driver? Why do you need to parse manually? – OneCricketeer May 29 '17 at 05:28
  • Actually my task is to convert all .mdb files to TAB-separated csv files. I am able to read an mdb file and store it into a variable as a collection of objects. To process further I need to convert those objects into dataframes. – Pravinkumar Hadpad May 29 '17 at 05:33
  • Okay, that'd be a TSV file, then – OneCricketeer May 29 '17 at 05:41

2 Answers


Your problem is that you are reading only one table with this bit of code:

val table = db.getTable("table1");

You should get a list of available tables in the db and then loop over them.

val tableNames = db.getTableNames

Then you can iterate over the table names. That should solve the issue of reading more than one table. You may need to update the rest of the code to get it where you want it, though.
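
For example, a minimal sketch of that loop, reusing inputfilepath and outputfilepath from the question and the Jackcess API. The one-file-per-table layout, the plain comma join with no CSV quoting, and the assumption that the output directory already exists are all mine, not part of the answer:

import java.io.{File, PrintWriter}

import com.healthmarketscience.jackcess.DatabaseBuilder

import scala.collection.JavaConverters._

val db = DatabaseBuilder.open(new File(inputfilepath))
try {
  for (tableName <- db.getTableNames.asScala) {
    val table = db.getTable(tableName)
    val columnNames = table.getColumns.asScala.map(_.getName)

    // one output file per table, e.g. .../sample_mdb_output/table1.csv
    val writer = new PrintWriter(new File(s"$outputfilepath/$tableName.csv"))
    try {
      writer.println(columnNames.mkString(","))
      for (row <- table.asScala) {
        // naive join: values containing commas or newlines are not escaped here
        writer.println(columnNames.map(name => Option(row.get(name)).getOrElse("")).mkString(","))
      }
    } finally {
      writer.close()
    }
  }
} finally {
  db.close()
}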

  • While reading, the table data is stored as a collection of objects, but I am not able to convert that collection of objects into a dataframe; kindly help me with this. If that is possible it will be a great help, and I will sort out the rest as per your solution. "val opresult = row.values()" is the collection of objects I want to convert into a dataframe – Pravinkumar Hadpad May 26 '17 at 11:49

You should really find a JDBC driver that works with MS Access rather than manually trying to parse the file yourself.

For example UCanAccess

Then, it's a simple SparkSQL command, and you have a DataFrame

val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:ucanaccess://c:/Users/phadpa01/Desktop/InputFiles/sample.mdb;memory=false")
  .option("dbtable", "table1")
  .load()

And one line to a CSV

jdbcDF.write.format("csv").save("table1.csv")
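
The asker mentions in the comments that the target is actually TAB-separated output; if so, the CSV writer's sep option should cover it (the output path here is just an example):

jdbcDF.write.format("csv").option("sep", "\t").save("table1.tsv")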

Don't forget to add the UCanAccess JARs to the Spark classpath: ucanaccess-4.0.2.jar, jackcess-2.1.6.jar, hsqldb.jar
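
For the original multi-table requirement, here is a rough sketch of one way to put the pieces together: pass the JARs through spark.jars (or spark-submit --jars), list the user tables through plain JDBC metadata, then read and write each one. The jar locations, the file paths and the TAB separator are assumptions, not part of the answer above:

import java.sql.DriverManager

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("mdb to csv")
  .master("local[*]")
  // or pass the jars on the command line instead:
  //   spark-submit --jars ucanaccess-4.0.2.jar,jackcess-2.1.6.jar,hsqldb.jar ...
  .config("spark.jars", "lib/ucanaccess-4.0.2.jar,lib/jackcess-2.1.6.jar,lib/hsqldb.jar")
  .getOrCreate()

val url = "jdbc:ucanaccess://c:/Users/phadpa01/Desktop/InputFiles/sample.mdb;memory=false"

// list the user tables through standard JDBC metadata (the UCanAccess jar must
// also be on the driver classpath for DriverManager to find the driver)
val conn = DriverManager.getConnection(url)
val tableNames = scala.collection.mutable.ListBuffer[String]()
try {
  val rs = conn.getMetaData.getTables(null, null, "%", Array("TABLE"))
  while (rs.next()) {
    tableNames += rs.getString("TABLE_NAME")
  }
} finally {
  conn.close()
}

// one DataFrame and one TAB-separated output directory per table
for (tableName <- tableNames) {
  val df = spark.read
    .format("jdbc")
    .option("url", url)
    .option("dbtable", tableName)
    .load()

  df.write.format("csv").option("sep", "\t").save(s"C:/Users/phadpa01/Desktop/sample_mdb_output/$tableName")
}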

Alternative solution

Run the UCanAccess command-line client from a terminal:

http://ucanaccess.sourceforge.net/site.html#clients

  • I have tried same but getting null pointer exception error and I have added "jackcess-2.0.0.jar" and "hsqldb.jar(2.2.5)" jars in eclipse – Pravinkumar Hadpad May 29 '17 at 05:56
  • below is my code " import org.apache.spark.sql.{Row, SparkSession} import org.apache.spark._ import org.apache.log4j._ object mdbread { Logger.getLogger("org").setLevel(Level.ERROR) val spark = SparkSession.builder().appName("Positional File Reading").master("local[*]").getOrCreate() def main(args: Array[String]): Unit = { val jdbcDF = spark.read .format("jdbc") .option("url", "jdbc:ucanaccess://c:/Users/phadpa01/Desktop/InputFiles/sample.mdb;memory=false") .option("dbtable", "table1") .load() jdbcDF.show() } }" – Pravinkumar Hadpad May 29 '17 at 05:57
  • 1) Null pointer is your own problem to solve (if it actually is a null pointer). 2) You have to add the JARs to the spark classpath, not the eclipse classpath – OneCricketeer May 29 '17 at 05:59
  • Error " [Ljava.lang.String;@1af05b03table1 ~TMPCLP400181Exception in thread "main" java.lang.NullPointerException" this kind of error I am facing first time and I am also new in scala and java – Pravinkumar Hadpad May 29 '17 at 06:05
  • I'm not going to teach you how to debug your errors or Java/Scala. If it is your "requirement" to write it in Scala, then I would consult with whomever required it of you to do that. – OneCricketeer May 29 '17 at 06:10
  • And you can't just `println` an array... https://stackoverflow.com/questions/409784/whats-the-simplest-way-to-print-a-java-array – OneCricketeer May 29 '17 at 06:10
  • Besides, `Exception in thread "main" java.lang.NullPointerException` says absolutely nothing meaningful to fix the problem. You need to read the line number of the error, then go figure out why the variable is null that it mentioned. https://stackoverflow.com/questions/218384/what-is-a-nullpointerexception-and-how-do-i-fix-it – OneCricketeer May 29 '17 at 06:12
  • Sorry for that, I had referred to the wrong window. The problem is "Exception in thread "main" java.sql.SQLException: No suitable driver" and I have added the jars to the Scala classpath – Pravinkumar Hadpad May 29 '17 at 06:26
  • "No suitable driver" means you didn't add them correctly to the classpath. I don't use Eclipse, so I can't answer how you do that, but the terminal way is in the documentation https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases – OneCricketeer May 29 '17 at 06:31
  • I am able to read the .mdb files, but while storing into a .csv file I get errors like "UCAExc:::4.0.2 incompatible data type in conversion: from SQL type CHARACTER to java.lang.Integer value: ID", even though I am not converting anything manually. I have even tried to filter out the header column, but no luck. ID is the first column header in my .mdb file. – Pravinkumar Hadpad May 29 '17 at 09:41
  • That's a problem with that jdbc driver, not your code – OneCricketeer May 29 '17 at 14:07
  • @dirceusemighini `Class.forName` is just to see if it *can be loaded*, doesn't actually load it – OneCricketeer Oct 25 '17 at 16:09
  • @cricket_007 True, I inserted it here in my tests and didn't know it wasn't necessary. I just removed it – dirceusemighini Oct 25 '17 at 16:39