62

Ok so I found this really well documented node_module called js-xlsx

Question: How can I parse an xlsx to output json?

Here is what the excel sheet looks like:

enter image description here

In the end the json should look like this:

[
   {
   "id": 1,
   "Headline": "Team: Sally Pearson",
   "Location": "Austrailia",
   "BodyText": "...",
   "Media: "..."
   },
   {
   "id": 2,
   "Headline": "Team: Rebeca Andrade",
   "Location": "Brazil",
   "BodyText": "...",
   "Media: "..."
   }
]

index.js:

if(typeof require !== 'undefined') {
    console.log('hey');
    XLSX = require('xlsx');
}
var workbook = XLSX.readFile('./assets/visa.xlsx');
var sheet_name_list = workbook.SheetNames;
sheet_name_list.forEach(function(y) { /* iterate through sheets */
  var worksheet = workbook.Sheets[y];
  for (z in worksheet) {
    /* all keys that do not begin with "!" correspond to cell addresses */
    if(z[0] === '!') continue;
    // console.log(y + "!" + z + "=" + JSON.stringify(worksheet[z].v));

  }

});
XLSX.writeFile(workbook, 'out.xlsx');
Donvino
  • 2,407
  • 3
  • 25
  • 34
Armeen Moon
  • 18,061
  • 35
  • 120
  • 233
  • What are you trying to achieve? If you want simply to share Excel sheets in json structure, or just make excel sheets cloud-based for easy sharing, there are many products out there that do such things. For example https://www.ipushpull.com – Tomas Jun 25 '15 at 09:26
  • Cost money. And I know this lib will do the simple task at hand – Armeen Moon Jun 25 '15 at 14:06
  • There is trial version as well, but basically you will have to pay if you want to use something more user friendly than raw library. From your question its not that clear what you are trying to achieve or what is the problem. Are you trying to just add first row to the final json? – Tomas Jun 25 '15 at 14:18
  • Updated question for readability – Armeen Moon Jun 25 '15 at 14:21
  • 1
    Could you not save the document as CSV? This would make the parsing much simpler, the document seems to not contain any rich formatting also. – Risto Novik Jun 26 '15 at 10:01

8 Answers8

130

You can also use

var XLSX = require('xlsx');
var workbook = XLSX.readFile('Master.xlsx');
var sheet_name_list = workbook.SheetNames;
console.log(XLSX.utils.sheet_to_json(workbook.Sheets[sheet_name_list[0]]))
aksanoble
  • 2,480
  • 4
  • 18
  • 19
  • 1
    Its doesnt return the empty values, that mean it does not add into json if the value is empty – Ayyappa A Mar 12 '18 at 08:18
  • 3
    Yeah, by default it doesn't process blank cells but by passing an optional param to XLSX.readFile function is a way around. var workbook = XLSX.readFile('Master.xlsx', {sheetStubs: true}); The sheetStubs param in the optional object allows the library to list the cells which by default are ignored by data processing utilities of the library. – freny fernandes Jun 14 '18 at 13:35
  • 11
    do note that it should be `XLSX.utils.sheet_to_json(sheet, {defval: ""})` instead of `XLSX.utils.sheet_to_json(sheet, {defVal=""})` – Stanley Jul 28 '19 at 05:14
  • Can I loop on result of ```XLSX.utils.sheet_to_json(workbook.Sheets[sheet_name_list[0]])``` ? – Hossein Safari Dec 21 '22 at 08:13
62

Improved Version of "Josh Marinacci" answer , it will read beyond Z column (i.e. AA1).

var XLSX = require('xlsx');
var workbook = XLSX.readFile('test.xlsx');
var sheet_name_list = workbook.SheetNames;
sheet_name_list.forEach(function(y) {
    var worksheet = workbook.Sheets[y];
    var headers = {};
    var data = [];
    for(z in worksheet) {
        if(z[0] === '!') continue;
        //parse out the column, row, and value
        var tt = 0;
        for (var i = 0; i < z.length; i++) {
            if (!isNaN(z[i])) {
                tt = i;
                break;
            }
        };
        var col = z.substring(0,tt);
        var row = parseInt(z.substring(tt));
        var value = worksheet[z].v;

        //store header names
        if(row == 1 && value) {
            headers[col] = value;
            continue;
        }

        if(!data[row]) data[row]={};
        data[row][headers[col]] = value;
    }
    //drop those first two rows which are empty
    data.shift();
    data.shift();
    console.log(data);
});
Parijat
  • 814
  • 10
  • 11
  • 25
    man I hope after two years passed you dont call variables `y`, `z`, `tt` anymore, thats a huge pain to update your code – godblessstrawberry Apr 27 '18 at 14:36
  • 3
    What is even going on here? what is z? what is tt? – Storm Muller Mar 28 '19 at 15:34
  • @StormMuller see my answer it has angular solution with normal naming for variables – godblessstrawberry May 23 '19 at 21:28
  • It works, but how? Looking at naming of variables, you have to reverse engineer the process if you need to change something.. Answer below has the same code with better naming. – srokatonie Jun 25 '20 at 15:20
  • For those wondering: `y` is the sheet name (e.g. 'Sheet1'), `z` is the cell label (x/y coordinates e.g. 'A24') and `tt` is the cell label character index (for parsing a letter and not a number) – ckhatton Sep 22 '22 at 16:01
  • I would change the lines to... ```js if(!data[row - 2]) data[row - 2]={}; data[row - 2][headers[col]] = value; ``` ...to save using `data.shift()` twice – ckhatton Sep 22 '22 at 16:28
57

I think this code will do what you want. It stores the first row as a set of headers, then stores the rest in a data object which you can write to disk as JSON.

var XLSX = require('xlsx');
var workbook = XLSX.readFile('test.xlsx');
var sheet_name_list = workbook.SheetNames;
sheet_name_list.forEach(function(y) {
    var worksheet = workbook.Sheets[y];
    var headers = {};
    var data = [];
    for(z in worksheet) {
        if(z[0] === '!') continue;
        //parse out the column, row, and value
        var col = z.substring(0,1);
        var row = parseInt(z.substring(1));
        var value = worksheet[z].v;

        //store header names
        if(row == 1) {
            headers[col] = value;
            continue;
        }

        if(!data[row]) data[row]={};
        data[row][headers[col]] = value;
    }
    //drop those first two rows which are empty
    data.shift();
    data.shift();
    console.log(data);
});

prints out

[ { id: 1,
    headline: 'team: sally pearson',
    location: 'Australia',
    'body text': 'majority have…',
    media: 'http://www.youtube.com/foo' },
  { id: 2,
    headline: 'Team: rebecca',
    location: 'Brazil',
    'body text': 'it is a long established…',
    media: 'http://s2.image.foo/' } ]
Josh Marinacci
  • 1,715
  • 1
  • 14
  • 15
  • 2
    You're the man Josh! I've made a few edits to saving a file. Enjoy your bounty – Armeen Moon Jun 28 '15 at 04:31
  • I want to read a xlsx file and convert that data in to JSON. Is it possible with this? any other suggestions?? – Varun Chandran Mar 21 '18 at 05:44
  • Very, very good! Only one thing found: code misses columns for wide tables when column names start with `AA`, `AB` ... because of these lines `var col = z.substring(0,1);` and `var row = parseInt(z.substring(1));`. Changed it to `const column = z.replace(/[0-9]/g, '')` and `const row = parseInt(z.replace(/\D/g,''))` and now it works fully. – Chaki_Black Aug 01 '19 at 07:41
  • Nice, But it is skipping the blank cells in the sheets which should come as null or '' like this for me, example: Heading of K column is weight, k1 has value as 72 so it returns as weight: '72' and k3 value is 65 so weight: '65' but k2 cell is left blank so it didn't return anything, i want weight: '' or weight: null, if i got any solution it will be huge relief to me, please help me in this to solve, many thanks – Shiva May 20 '21 at 14:33
  • this code skip some columns. ;; – Logan Lee Apr 05 '22 at 00:33
5
**podria ser algo asi en react y electron**

 xslToJson = workbook => {
        //var data = [];
        var sheet_name_list = workbook.SheetNames[0];
        return XLSX.utils.sheet_to_json(workbook.Sheets[sheet_name_list], {
            raw: false,
            dateNF: "DD-MMM-YYYY",
            header:1,
            defval: ""
        });
    };

    handleFile = (file /*:File*/) => {
        /* Boilerplate to set up FileReader */
        const reader = new FileReader();
        const rABS = !!reader.readAsBinaryString;

        reader.onload = e => {
            /* Parse data */
            const bstr = e.target.result;
            const wb = XLSX.read(bstr, { type: rABS ? "binary" : "array" });
            /* Get first worksheet */
            let arr = this.xslToJson(wb);

            console.log("arr ", arr)
            var dataNueva = []

            arr.forEach(data => {
                console.log("data renaes ", data)
            })
            // this.setState({ DataEESSsend: dataNueva })
            console.log("dataNueva ", dataNueva)

        };


        if (rABS) reader.readAsBinaryString(file);
        else reader.readAsArrayBuffer(file);
    };

    handleChange = e => {
        const files = e.target.files;
        if (files && files[0]) {
            this.handleFile(files[0]);
        }
    };
AN German
  • 725
  • 10
  • 10
  • 1
    Welcome to Stackoverflow! :D If you comment your code or add some content to your body to clarify your answer it would make your answer better and easy to understand. – Swetank Poddar Apr 19 '20 at 07:35
  • @AN German you saved my life, today prod deployment and i fixed it with this code, thank you very much – Arunsai B K Sep 24 '21 at 16:51
2

here's angular 5 method version of this with unminified syntax for those who struggling with that y, z, tt in accepted answer. usage: parseXlsx().subscribe((data)=> {...})

parseXlsx() {
    let self = this;
    return Observable.create(observer => {
        this.http.get('./assets/input.xlsx', { responseType: 'arraybuffer' }).subscribe((data: ArrayBuffer) => {
            const XLSX = require('xlsx');
            let file = new Uint8Array(data);
            let workbook = XLSX.read(file, { type: 'array' });
            let sheetNamesList = workbook.SheetNames;

            let allLists = {};
            sheetNamesList.forEach(function (sheetName) {
                let worksheet = workbook.Sheets[sheetName];
                let currentWorksheetHeaders: object = {};
                let data: Array<any> = [];
                for (let cellName in worksheet) {//cellNames example: !ref,!margins,A1,B1,C1

                    //skipping serviceCells !margins,!ref
                    if (cellName[0] === '!') {
                        continue
                    };

                    //parse colName, rowNumber, and getting cellValue
                    let numberPosition = self.getCellNumberPosition(cellName);
                    let colName = cellName.substring(0, numberPosition);
                    let rowNumber = parseInt(cellName.substring(numberPosition));
                    let cellValue = worksheet[cellName].w;// .w is XLSX property of parsed worksheet

                    //treating '-' cells as empty on Spot Indices worksheet
                    if (cellValue.trim() == "-") {
                        continue;
                    }

                    //storing header column names
                    if (rowNumber == 1 && cellValue) {
                        currentWorksheetHeaders[colName] = typeof (cellValue) == "string" ? cellValue.toCamelCase() : cellValue;
                        continue;
                    }

                    //creating empty object placeholder to store current row
                    if (!data[rowNumber]) {
                        data[rowNumber] = {}
                    };

                    //if header is date - for spot indices headers are dates
                    data[rowNumber][currentWorksheetHeaders[colName]] = cellValue;

                }

                //dropping first two empty rows
                data.shift();
                data.shift();
                allLists[sheetName.toCamelCase()] = data;
            });

            this.parsed = allLists;

            observer.next(allLists);
            observer.complete();
        })
    });
}
godblessstrawberry
  • 4,556
  • 2
  • 40
  • 58
1

Just improving @parijat answer a little.

var XLSX = require('xlsx');
    var workbook = XLSX.readFile('test.xlsx');
    var sheet_name_list = workbook.SheetNames;
    sheet_name_list.forEach(function(y) {
        var worksheet = workbook.Sheets[y];
        var headers = {};
        var data = [];
        for(z in worksheet) {
            if(z[0] === '!') continue;
            //parse out the column, row, and value
            var tt = 0;
            for (var i = 0; i < z.length; i++) {
                if (!isNaN(z[i])) {
                    tt = i;
                    break;
                }
            };
            var col = z.substring(0,tt);
            var row = parseInt(z.substring(tt));
            var value = worksheet[z].v;
    
            //store header names
            if(row == 1 && value) {
                headers[col] = value;
                continue;
            }
    
            if(!data[row-2]) data[row-2]={};
            data[row-2][headers[col]] = value;
        }
        //Now no need to drop the data element
        console.log(data);
    });
kaushik_pm
  • 295
  • 3
  • 10
0

I found a better way of doing this

  function genrateJSONEngine() {
    var XLSX = require('xlsx');
    var workbook = XLSX.readFile('test.xlsx');
    var sheet_name_list = workbook.SheetNames;
    sheet_name_list.forEach(function (y) {
      var array = workbook.Sheets[y];

      var first = array[0].join()
      var headers = first.split(',');

      var jsonData = [];
      for (var i = 1, length = array.length; i < length; i++) {

        var myRow = array[i].join();
        var row = myRow.split(',');

        var data = {};
        for (var x = 0; x < row.length; x++) {
          data[headers[x]] = row[x];
        }
        jsonData.push(data);

      }
mad Man
  • 366
  • 3
  • 7
0

Here is my solution (in typescript) using some of ramdas helpers. It supports multiple sheets and returns an object with the key as the sheet name.

const parseXLSX = (file: File, cb: callback) => {
  const reader = new FileReader()
  const rABS = !!reader.readAsBinaryString

  reader.onload = (e) => {
    const bstr = e?.target?.result
    const wb = XLSX.read(bstr, { type: rABS ? 'binary' : 'array' })
    const sheetNames = wb.SheetNames
    const sheetsData = sheetNames.reduce((acc, sheetName) => {
      const worksheet = wb.Sheets[sheetName]
      const headers: Record<string, string> = {}
      const data: Record<string, unknown>[] = []

      keys(worksheet).forEach((key) => {
        // removes !ref column
        if (String(key)?.[0] !== '!') {
          // supports wide tables ex: AA1
          const column = String(key).replace(/[0-9]/g, '')
          const row = parseInt(String(key).replace(/\D/g, ''), 10)
          const value = worksheet[key].v

          if (row === 1) {
            headers[column] = value
          }

          // this solution does not support when header is not first row
          if (headers[column] !== undefined) {
            if (!data[row]) {
              data[row] = {}
            }

            data[row][headers[column]] = value
          }
        }
      })

      return {
        ...acc,
        [sheetName]: drop(1, data.filter(Boolean)),
      }
    }, {})

    cb(sheetsData)
  }
  if (rABS) reader.readAsBinaryString(file)
  else reader.readAsArrayBuffer(file)
}

ricopella
  • 63
  • 7