How to convert Excel file to CSV while retaining the cell information?

Question

Is there a way to convert excel spreadsheet containing multiple tabs into csv files (one per tab) and tag the data in csv with cell information?

For example, below row in Excel :

Data in Excel

should look like this in output csv file :

[A1]Id,[B1]Author Name

[A2]1,[B2]Agatha Christie

[A3]2,[B3]Ayn Rand

[A4]3,[B4]Arthur Conan Doyle

I am trying to use Python for this which I am fairly new to. Here I have come across many posts which talk about converting Excel to CSV using packages like xlrd, pandas etc. But I would also like to know if the cell information could be captured in some way.

Also, if there are other ways this requirement could be achieved, please advise.

If you're trying to do this in Python, why did you tag the R and Java languages? This question doesn't appear to have anything to do with either of those languages. — EJoshuaS - Stand with Ukraine, Feb 04 '20 at 16:24
In any case, to address your question, you could use Pandas and just compute the cell information - it's a very regular pattern. — EJoshuaS - Stand with Ukraine, Feb 04 '20 at 16:26
Thank for replying EJoshuaS. I am exploring all options here - Python, R, Java. Anything that helps me achieve this. My preferred way would be python though. If its achievable in Python though, I will go ahead and remove the Java and R tags. — Techblink, Feb 04 '20 at 16:30
I'm not terribly familiar with R, but I think that this would be pretty straightforward with Pandas - the pattern is regular enough that you could import it into a DataFrame, compute the cell information, edit that into each of the cells, and export it to a CSV file. — EJoshuaS - Stand with Ukraine, Feb 04 '20 at 16:32
This sounds positive. I have still to learn about data frames (practically a Python novice). But I'll let you know how it goes shortly. Thanks! — Techblink, Feb 04 '20 at 16:40

score 0 · Answer 1 · answered Feb 04 '20 at 16:33

If your data frame looks like this:

df
#>   Id        Author.Name
#> 1  1    Agatha Christie
#> 2  2           Ayn Rand
#> 3  3 Arthur Conan Doyle

Then you can do this:

df <- as.data.frame(mapply(function(x, y) paste0("[", y, 1:length(x), "] ", as.character(x)),
             df, LETTERS[seq_along(df)]))

To add the appropriate Excel cells in the format you wanted.

Output:

df
#>       Id             Author.Name
#> 1 [A1] 1    [B1] Agatha Christie
#> 2 [A2] 2           [B2] Ayn Rand
#> 3 [A3] 3 [B3] Arthur Conan Doyle

And if you write the csv with

write.csv(df, "df.csv")

Then the contents of df.csv will look like this:

"","Id","Author.Name"
"1","[A1] 1","[B1] Agatha Christie"
"2","[A2] 2","[B2] Ayn Rand"
"3","[A3] 3","[B3] Arthur Conan Doyle"

Thanks Allan. This looks straightforward. I'll try on the actual data and let you know how it goes. — Techblink, Feb 04 '20 at 16:37
@Techblink it will work as long as you have fewer than 26 columns in your data. If you have more than that it is still possible but more complex. — Allan Cameron, Feb 04 '20 at 16:39

score 0 · Accepted Answer · answered Feb 04 '20 at 20:42

0

Maybe something like this....

from xlrd import open_workbook
from xlwt import Workbook
import os

for file in os.listdir('C:\\your_path_here\\'):
    if file.endswith('.xlsx'):
        rb = open_workbook('C:\\your_path_here\\' + str(file),formatting_info=False)
        num = rb.nsheets

for a in range(num): 

   rs = rb.sheet_by_index(a)
   sheet = (rs.name)

   new_book = Workbook()
   new_sheet = new_book.add_sheet(sheet)

   for row in range(rs.nrows):
       for col in range(rs.ncols):
           new_sheet.write(row, col, rs.cell(row, col).value)

   new_book.save('C:\\your_path_here\\' + str(sheet) + '.csv')

answered Feb 04 '20 at 20:42

ASH

20,759
19
87
200

Thank you for this solution. Using parts of your logic and openpyxl module, I can now read cells from an Excel and save the cell values along with row and column information in a text file. Could you provide some insight on which of these would be most efficient in reading an Excel file with 10-12 worksheets, and about 30K cells to read in each worksheet- xlrd, openpyxl or pandas? The data from all these must be processed and saved to one text file per worksheet (as stated in the original post). – Techblink Feb 11 '20 at 10:37
Please mark it as an answer if it helped you. In terms of efficiency, I'm not sure, exactly. You can easily add a timer to any Python script and calculate the run time for any process. See this link: https://stackoverflow.com/questions/6786990/find-out-time-it-took-for-a-python-script-to-complete-execution/6791946 – ASH Feb 11 '20 at 13:43

How to convert Excel file to CSV while retaining the cell information?

2 Answers2