4

I have a .tsv file in which some fields are ranges such as 1 - 4. I want to read these fields exactly as they are written. However, when it opens the file, Excel automatically converts those range fields to dates. For instance, 1 - 4 is converted to 4-Jan. If I then try to change the cell's format to another type, the value has already been changed and I only get a useless number (39816). The wrong conversion to a date takes place even if the range fields are enclosed in double quotes. How can I avoid this behavior?

juanmirocks
  • 2
    Don't use Excel. ;) Seriously though, creating actual .xls files with "typed" cells may be the only way. – deceze Apr 10 '13 at 09:25
  • 2
    I just found out, my question is a duplicate of: http://stackoverflow.com/questions/165042/stop-excel-from-automatically-converting-certain-text-values-to-dates?rq=1 -- That solves the issue (though I have to rewrite the fields with special double quotes. Not using Excel may be a better solution :P) – juanmirocks Apr 10 '13 at 09:30
  • 1
    Oh god, Excel...! D: Good to know this "solution" though. – deceze Apr 10 '13 at 09:32
  • 1
    Actually, I'm just using Excel to easily remove some columns. Could you recommend another simple tsv/csv reader that can do this? – juanmirocks Apr 10 '13 at 09:35
  • 1
    Whenever I need to deal with something like this, I usually resort to [Numbers](http://www.apple.com/iwork/numbers/), which I happen to have installed and which is basically Excel Which Doesn't Suck™. Not really a quick and easy option though. :3 – deceze Apr 10 '13 at 09:37
  • 1
    If you need to use Excel, you can open/import the data with the QueryTables method, which lets you specify that certain columns (like these) should be interpreted as STRING type. http://stackoverflow.com/a/15665605/1467082 – David Zemens Apr 10 '13 at 12:24

5 Answers

3

I think your best option is Excel's import facility, although you may first have to change the file extension to .csv manually.

When importing, be sure to select Text as the data format for every column that contains these values.

glh
0

My question is in fact a duplicate of at least:

1) Stop Excel from automatically converting certain text values to dates

2) Excel: Default to TEXT rather than GENERAL when opening a .csv file

The possible solutions for Excel are either 1) to write the fields with special double quotes, e.g. "May 16, 2011" as "=""May 16, 2011""", or 2) to import the csv/tsv file with the external data wizard and manually select which columns should be read as TEXT rather than GENERAL (the format that can convert fields to dates).
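
As a rough illustration of option 1, the rewriting could be sketched as below. The names `ExcelTextGuard`, `guard` and `guardLine` are hypothetical and not part of any library, and the sketch assumes the caller already knows which column indices hold the range-like values.

```scala
// Hypothetical helper (not from any library): rewrites fields into the ="..." form.
object ExcelTextGuard {
  /** Turns a raw field into ="raw" and quotes that formula as a single CSV/TSV cell. */
  def guard(field: String): String = {
    // Build the formula ="raw"; quotes inside the value are doubled.
    val formula = "=\"" + field.replace("\"", "\"\"") + "\""
    // Quote the whole formula as one cell: double every inner quote, then wrap in quotes.
    "\"" + formula.replace("\"", "\"\"") + "\""
  }

  /** Guards only the fields at the given zero-based column indices of a tab-separated line. */
  def guardLine(line: String, columns: Set[Int]): String =
    line.split("\t", -1).zipWithIndex // limit -1 keeps trailing empty fields
      .map { case (field, i) => if (columns.contains(i)) guard(field) else field }
      .mkString("\t")
}
```

For example, `ExcelTextGuard.guard("May 16, 2011")` yields `"=""May 16, 2011"""`, the form quoted above. The rewritten file is then only suitable for Excel, since other readers will see the extra `=` and quotes as part of the value.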

As for my use case, I was only using Excel to remove some columns. Neither solution appealed to me: I did not want to rewrite the tsv files with special quotes, and with hundreds of columns I did not want to select each one manually to be read as TEXT.

Therefore I wrote a Scala script to filter tsv files by column names:

package com.jmcejuela.ml

import java.io.InputStream
import java.io.Writer

import scala.io.Codec
import scala.io.Source

import Table._

/**
 * Class to represent tables with a fixed number of columns. All rows have the same columns.
 */
class Table(val rows: Seq[Row]) {
  lazy val numDiffColumns = rows.foldLeft(Set[Int]())((set, row) => set + row.size)

  def toTSV(out: Writer): Unit = {
    if (rows.isEmpty) out.write(TableEmpty.toString)
    else {
      out.write(writeLineTSV(rows.head.map(_.name))) //header
      rows.foreach(r => out.write(writeLineTSV(r.map(_.value))))
      out.close()
    }
  }

  /**
   * Get a Table with only the given columns.
   */
  def filterColumnsByName(columnNames: Set[String]): Table = {
    val existingNames = rows.head.map(_.name).toSet
    assert(columnNames.forall(n => existingNames.contains(n)), "You want to include column names that do not exist")
    new Table(rows.map { row => row.filter(col => columnNames.contains(col.name)) })
  }

}

object TableEmpty extends Table(Seq.empty) {
  override def toString = "Table(Empty)"
}

object Table {
  def apply(rows: Row*) = new Table(rows)

  type Row = Array[Column]

  /**
   * Column representation. Note that each column has a name and a value. Since the class Table
   * is a sequence of rows which are a size-fixed array of columns, the name field is redundant
   * for Table. However, this column representation could be used in the future to support
   * schemata-less tables.
   */
  case class Column(name: String, value: String)

  private def parseLineTSV(line: String) = line.split("\t")
  private def writeLineTSV(line: Seq[String]) = line.mkString("", "\t", "\n")

  /**
   * It is assumed that the first row gives the names to the columns
   */
  def fromTSV(in: InputStream)(implicit encoding: Codec = Codec.UTF8): Table = {
    val linesIt = Source.fromInputStream(in).getLines()
    if (linesIt.isEmpty) TableEmpty
    else {
      val columnNames = parseLineTSV(linesIt.next())
      val padding = {
        //add padding of empty columns-fields to lines that do not include last fields because they are empty
        def infinite[A](x: A): Stream[A] = x #:: infinite(x)
        infinite("")
      }
      val rows = linesIt.map { line =>
        ((0 until columnNames.size).zip(parseLineTSV(line) ++: padding).map { case (index, field) => Column(columnNames(index), field) }).toArray
      }.toStream
      new Table(rows)
    }
  }
}
juanmirocks
0

Write 01-04 instead of 1-4 in Excel.

Salman
0

I had a "text"-formatted cell in Excel being populated with a chemical CAS number, "8013-07-8", that was being reformatted as a date. To remedy the problem, I prepended a single quote to the value, and it rendered correctly when viewing the results. When you click on the cell you can see the prefixed single quote, but at least it is no longer displayed as a date.

0

In my case, when I typed 5-14 into cell D2 in Excel, it was converted to the date 14 May. With help from somebody, I was able to change the date back to the number range (5-14) using the following approach, which I wanted to share. (I will use my case as an example.)

  1. Using the cell format options in Excel, I first converted the date in D2 (14 May) to a number (in my case this gave 43599).
  2. I then used the following formula in Excel to convert it back to 5-14: =IF(EXACT(D2, 43599), "5-14", D2).
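
The number in step 1 is Excel's date serial, so the typed text can in principle be recovered without hard-coding each value. Below is a minimal sketch, assuming Excel read the entry as month-day and that the workbook's date system is known. `ExcelSerial`, `toDate` and `toRange` are hypothetical names, and the year Excel filled in is simply discarded.

```scala
import java.time.LocalDate

// Hypothetical helper (not from any library): maps Excel date serials back to dates.
object ExcelSerial {
  // Day zero of each date system. 1899-12-30 absorbs Excel's fictitious 29 Feb 1900,
  // so the 1900 mapping is only correct for serials from 61 (1 March 1900) onwards.
  private val Epoch1900 = LocalDate.of(1899, 12, 30)
  private val Epoch1904 = LocalDate.of(1904, 1, 1)

  /** Converts a serial to a calendar date in the chosen date system. */
  def toDate(serial: Long, use1904: Boolean = false): LocalDate =
    (if (use1904) Epoch1904 else Epoch1900).plusDays(serial)

  /** Rebuilds the month-day text that was presumably typed, without spaces. */
  def toRange(serial: Long, use1904: Boolean = false): String = {
    val d = toDate(serial, use1904)
    s"${d.getMonthValue}-${d.getDayOfMonth}"
  }
}
```

With the default 1900 system, `ExcelSerial.toRange(43599)` gives `5-14`, matching the case above. The 39816 from the question comes out as `1-4` only with `use1904 = true`, which suggests, though it is only a guess, that the file was opened in a workbook using the 1904 date system. Original spacing such as `1 - 4` cannot be recovered this way.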