0

Is there a way to convert excel spreadsheet containing multiple tabs into csv files (one per tab) and tag the data in csv with cell information?

For example, below row in Excel :

Data in Excel

should look like this in output csv file :

[A1]Id,[B1]Author Name

[A2]1,[B2]Agatha Christie

[A3]2,[B3]Ayn Rand

[A4]3,[B4]Arthur Conan Doyle

I am trying to use Python for this which I am fairly new to. Here I have come across many posts which talk about converting Excel to CSV using packages like xlrd, pandas etc. But I would also like to know if the cell information could be captured in some way.

Also, if there are other ways this requirement could be achieved, please advise.

Techblink
  • 15
  • 5
  • 2
    If you're trying to do this in Python, why did you tag the R and Java languages? This question doesn't appear to have anything to do with either of those languages. – EJoshuaS - Stand with Ukraine Feb 04 '20 at 16:24
  • In any case, to address your question, you could use Pandas and just compute the cell information - it's a very regular pattern. – EJoshuaS - Stand with Ukraine Feb 04 '20 at 16:26
  • Thank for replying EJoshuaS. I am exploring all options here - Python, R, Java. Anything that helps me achieve this. My preferred way would be python though. If its achievable in Python though, I will go ahead and remove the Java and R tags. – Techblink Feb 04 '20 at 16:30
  • I'm not terribly familiar with R, but I think that this would be pretty straightforward with Pandas - the pattern is regular enough that you could import it into a DataFrame, compute the cell information, edit that into each of the cells, and export it to a CSV file. – EJoshuaS - Stand with Ukraine Feb 04 '20 at 16:32
  • This sounds positive. I have still to learn about data frames (practically a Python novice). But I'll let you know how it goes shortly. Thanks! – Techblink Feb 04 '20 at 16:40

2 Answers2

0

If your data frame looks like this:

df
#>   Id        Author.Name
#> 1  1    Agatha Christie
#> 2  2           Ayn Rand
#> 3  3 Arthur Conan Doyle

Then you can do this:

df <- as.data.frame(mapply(function(x, y) paste0("[", y, 1:length(x), "] ", as.character(x)),
             df, LETTERS[seq_along(df)]))

To add the appropriate Excel cells in the format you wanted.

Output:

df
#>       Id             Author.Name
#> 1 [A1] 1    [B1] Agatha Christie
#> 2 [A2] 2           [B2] Ayn Rand
#> 3 [A3] 3 [B3] Arthur Conan Doyle

And if you write the csv with

write.csv(df, "df.csv")

Then the contents of df.csv will look like this:

"","Id","Author.Name"
"1","[A1] 1","[B1] Agatha Christie"
"2","[A2] 2","[B2] Ayn Rand"
"3","[A3] 3","[B3] Arthur Conan Doyle"
Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
  • Thanks Allan. This looks straightforward. I'll try on the actual data and let you know how it goes. – Techblink Feb 04 '20 at 16:37
  • @Techblink it will work as long as you have fewer than 26 columns in your data. If you have more than that it is still possible but more complex. – Allan Cameron Feb 04 '20 at 16:39
0

Maybe something like this....

from xlrd import open_workbook
from xlwt import Workbook
import os

for file in os.listdir('C:\\your_path_here\\'):
    if file.endswith('.xlsx'):
        rb = open_workbook('C:\\your_path_here\\' + str(file),formatting_info=False)
        num = rb.nsheets

for a in range(num): 

   rs = rb.sheet_by_index(a)
   sheet = (rs.name)

   new_book = Workbook()
   new_sheet = new_book.add_sheet(sheet)

   for row in range(rs.nrows):
       for col in range(rs.ncols):
           new_sheet.write(row, col, rs.cell(row, col).value)

   new_book.save('C:\\your_path_here\\' + str(sheet) + '.csv')
ASH
  • 20,759
  • 19
  • 87
  • 200
  • Thank you for this solution. Using parts of your logic and openpyxl module, I can now read cells from an Excel and save the cell values along with row and column information in a text file. Could you provide some insight on which of these would be most efficient in reading an Excel file with 10-12 worksheets, and about 30K cells to read in each worksheet- xlrd, openpyxl or pandas? The data from all these must be processed and saved to one text file per worksheet (as stated in the original post). – Techblink Feb 11 '20 at 10:37
  • Please mark it as an answer if it helped you. In terms of efficiency, I'm not sure, exactly. You can easily add a timer to any Python script and calculate the run time for any process. See this link: https://stackoverflow.com/questions/6786990/find-out-time-it-took-for-a-python-script-to-complete-execution/6791946 – ASH Feb 11 '20 at 13:43