Reformat wide Excel table to more SQL-friendly structure

Question

I have a very wide Excel sheet, from Column A - DIE (about 2500 columns wide), of survey data. Each column is a question, and each row is a response. I'm trying to upload the data to SQL and convert it to a more SQL-friendly format using the UNPIVOT function, but I can't even get it loaded into SQL because it exceeds the 1024-column limit.

Basically, I have an Excel sheet that looks like this:

But I want to convert it to look like this:

What options do I have to make this change, either in Excel (prior to upload) or SQL (while circumventing the 1024 column limit)?

If I do that, I'll end up with far more than the 1 million row limit for Excel — happyhippo83, Jul 31 '17 at 19:10
You are going to have to do this in batches. You have created a challenge for yourself to say the least with 2,500 columns in a denormalized structure. — Sean Lange, Jul 31 '17 at 19:12
I wish it was a self-created challenge. I've inherited some poorly-designed survey data and have to get it into SQL for future use. Also there's about 10 years' worth of this, with 1 of these for each year :( — happyhippo83, Jul 31 '17 at 19:13
https://stackoverflow.com/questions/36365839/excel-macrovba-to-transpose-multiple-columns-to-multiple-rows/36366394#36366394 but write the final array to file instead of placing it on a worksheet — Tim Williams, Jul 31 '17 at 22:13

score 2 · Accepted Answer · answered Jul 31 '17 at 19:20

2

I have had to do this quite a bit. My solution was to write a Python script that would un-crosstab a CSV file (typically exported from Excel), creating another CSV file. The Python code is here: https://pypi.python.org/pypi/un-xtab/ and the documentation is here: http://pythonhosted.org/un-xtab/. I've never run it on a file with 2500 columns, but don't know why it wouldn't work.

answered Jul 31 '17 at 19:20

rd_nielsen

2,407
2
11
18

This looks great. Let me try it out. Thanks :) – happyhippo83 Jul 31 '17 at 19:22

BigTimeStats · Answer 2 · 2017-07-31T20:10:23.207

R has a very specific function call in one of it's libraries. You can also connect, read, and write data with R into a database. Would suggest downloading R and Rstudio.

Here is a working script to get you started that does what you need:

Sample data:

df <- data.frame(id = c(1,2,3), question_1 = c(1,0,1), question_2 = c(2,0,2))
df

Input table:

  id question_1 question_2
1  1          1          2
2  2          0          0
3  3          1          2

Code to transpose the data:

df2 <- gather(df, key = id, value = values)
df2

Output:

   id        id values
1  1 question_1      1
2  2 question_1      0
3  3 question_1      1
4  1 question_2      2
5  2 question_2      0
6  3 question_2      2

Some helper functions for you to import and export the csv data:

# Install and load the necessary libraries
install.packages(c('tidyr','readr'))
library(tidyr)
library(readr)

# to read a csv file
df <- read_csv('[some directory][some filename].csv')

# To output the csv file
write.csv(df2, '[some directory]data.csv', row.names = FALSE)

score 0 · Answer 3 · answered Aug 01 '17 at 14:14

Thanks for all the help. I ended up using Python due to limitations in both SQL (over 1024 columns wide) and Excel (well over 1 million rows in the output). I borrowed the concepts from rd_nielson's code, but that was a bit more complicated than I needed. In case it's helpful to anyone else, this is the code I used. It outputs a csv file with 3 columns and 14 million rows that I can upload to SQL.

import csv

with open('Responses.csv') as f:
    reader = csv.reader(f)
    headers = next(reader)  # capture current field headers
    newHeaders = ['ResponseID','Question','Response']  # establish new header names
    with open('PythonOut.csv','w') as outputfile:
        writer=csv.writer(outputfile, dialect='excel', lineterminator='\n')  
        writer.writerow(newHeaders)  # write new headers to output

        QuestionHeaders = headers[1:len(headers)]  # Slice the question headers from original header list
        for row in reader:
            questionCount = 0  # start counter to loop through each question (column) for every response (row)
            while questionCount <= len(QuestionHeaders) - 1:  
                newRow = [row[0], QuestionHeaders[questionCount], row[questionCount + 1]]  
                writer.writerow(newRow)
                questionCount += 1

Reformat wide Excel table to more SQL-friendly structure

3 Answers3