
I am trying to write some csv data but I keep getting the escape sequence key right after every word in the csv file.

setup:

import csv
from itertools import izip_longest  # Python 2

with open('gibber.csv', 'wb') as csvfile:  # the with block closes the file itself
    writer = csv.writer(csvfile, delimiter=",", quoting=csv.QUOTE_NONE, escapechar=" ")
    for values in izip_longest(*csv_data, fillvalue="-,-"):
        writer.writerow([unicode(s).encode("utf-8") for s in values])

If I print the list passed to writer.writerow(...) above, a sample row looks like this:

['dipey,1', 'you have,2', 'at the beginning,1', 'brilliant charles brown truly,1', 'great the first also was,1', 'identical to this one as far,1', 'be when pie mood mark lake a,1', 'shardely uptown is you free on a stone,1', 'let it rest and sun it those it super,1']

I've tried many variations like this, and pretty much everything I could find by searching, but I still can't tell why the csv writer places the escape character after every word.

My desired output is something like this:

--------------------------------------------------------------------------
word1 | word_count1 |    word2    | word_count2 | .. wordN | word_countN
--------------------------------------------------------------------------
 word |      3      | word word   |     7       |  .............. N

But instead I am getting something like this:

[] = escape character
--------------------------------------------------------------------------
 word1 | word_count1 |    word2    | word_count2 | .. wordN | word_countN
--------------------------------------------------------------------------
 word[]|      3      |word[] word[]|    7       |  .............. N

Using a blank space as my escapechar, I get an extra space after every word. Using a tab or a newline breaks the row/column layout. Using any single letter, a digit, or even \ puts that escapechar at the right-most position of every row item, but then the double spaces are gone.
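Reading these symptoms, a plausible cause: with `quoting=csv.QUOTE_NONE` the writer never quotes a field, so any delimiter that occurs inside a field has to be preceded by `escapechar`. Every string handed to `writerow` already contains a comma (`"you have,2"`), so the escape character lands right before that comma, i.e. just after the last word of the phrase. The doubled spaces with `escapechar=" "` appear to come from the escape character itself being escaped wherever it occurs in the data. The sketch below assumes Python 3 (the question uses Python 2) and writes to an in-memory buffer; `write_row` is a hypothetical helper name.

```python
import csv
import io


def write_row(row, escapechar):
    """Write one row with QUOTE_NONE and return the raw text produced."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=",", quoting=csv.QUOTE_NONE,
                        escapechar=escapechar)
    writer.writerow(row)
    return buf.getvalue()


row = ["you have,2", "at the beginning,1"]

# The comma embedded in each field is escaped, so the escape character
# ends up right after the last word of each phrase.
print(repr(write_row(row, "\\")))
print(repr(write_row(row, " ")))
```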

The sample list I posted above is an example of a list that I pass to writer.writerow(...).

Test data:

data0 = unicode("Rainforests are forests characterized by high rainfall, with annual rainfall between 250 and 450 centimetres (98 and 177 in).[1] There are two types of rainforest: tropical rainforest and temperate rainforest. The monsoon trough, alternatively known as the intertropical convergence zone, plays a significant role in creating the climatic conditions necessary for the Earth's tropical rainforests. Around 40% to 75% of all biotic species are indigenous to the rainforests.[2] It has been estimated that there may be many millions of species of plants, insects and microorganisms still undiscovered in tropical rainforests. Tropical rainforests have been called the \"jewels of the Earth\" and the \"world's largest pharmacy\", because over one quarter of natural medicines have been discovered there.[3] Rainforests are also responsible for 28% of the world's oxygen turnover, sometimes misnamed oxygen production,[4] processing it through photosynthesis from carbon dioxide and consuming it through respiration. The undergrowth in some areas of a rainforest can be restricted by poor penetration of sunlight to ground level. If the leaf canopy is destroyed or thinned, the ground beneath is soon colonized by a dense, tangled growth of vines, shrubs and small trees, called a jungle. The term jungle is also sometimes applied to tropical rainforests generally.", "utf-8")

data1 = unicode("Tropical rainforests are characterized by a warm and wet climate with no substantial dry season: typically found within 10 degrees north and south of the equator. Mean monthly temperatures exceed 18 °C (64 °F) during all months of the year.[5] Average annual rainfall is no less than 168 cm (66 in) and can exceed 1,000 cm (390 in) although it typically lies between 175 cm (69 in) and 200 cm (79 in).[6] Many of the world's tropical forests are associated with the location of the monsoon trough, also known as the intertropical convergence zone.[7] The broader category of tropical moist forests are located in the equatorial zone between the Tropic of Cancer and Tropic of Capricorn. Tropical rainforests exist in Southeast Asia (from Myanmar (Burma) to the Philippines, Malaysia, Indonesia, Papua New Guinea, Sri Lanka, Sub-Saharan Africa from Cameroon to the Congo (Congo Rainforest), South America (e.g. the Amazon Rainforest), Central America (e.g. Bosawás, southern Yucatán Peninsula-El Peten-Belize-Calakmul), Many Australia, and on many of the Pacific Islands (such as Hawaiʻi). Tropical forests have been called the \"Earth's lungs\", although it is now known that rainforests contribute little net oxygen addition to the atmosphere through photosynthesis", "utf-8")

data2 = unicode("Tropical forests cover many a large part of the globe, but temperate rainforests only occur in few regions around the world. Temperate rainforests are rainforests in temperate regions. They occur in North America (in the Pacific Northwest in Alaska, British Columbia, Washington, Oregon and California), in Europe (parts of the British Isles such as the coastal areas of Ireland and Scotland, southern Norway, parts of the western Balkans along the Adriatic coast, as well as in Galicia and coastal areas of the eastern Black Sea, including Georgia and coastal Turkey), in East Asia (in southern China, Highlands of Taiwan, much of Japan and Korea, and on Sakhalin Island and the adjacent Russian Far East coast), in South America (southern Chile) and also in Australia and New Zealand.[10]", "utf-8")

Sample csv_data (the full data is in a linked pastie), printed with:

    import pprint
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(csv_data)

[   [   u'shrubs,1',
        u'chile,1',
        u'equatorial,1',
        u'china,1',
        u'may,1',
        u'zone7,1'],
    [   u'washington oregon,1',
        u'new zealand10,1',
        u'moist forests,1',
        u'biotic species,1',
        u'and tropic,1',
        u'term jungle,1',
        u'sometimes misnamed,1',
        u'japan and,1',
        u'the world,1',
        u'200 cm,1',
        u'between the,1',
        u'canopy is,1',
        u'as hawaii,1',
        u'and temperate,1',
        u'many australia,1',
        u'but temperate,1'],
    [   u'cancer and tropic,1',
        u'black sea including,1',
        u'asia in southern,1',
        u'some areas of,1',
        u'also known as,1',
        u'as well as,1',
        u'areas of a,1',
        u'central america eg,1',
        u'250 and 450,1'],
    [   u'rainforest the monsoon trough,1',
        u'shrubs and small trees,1',
        u'dense tangled growth of,1',
        u'of the british isles,1'],
    [   u'sometimes misnamed oxygen production4 processing,1',
        u'a significant role in creating,1',
        and,1',
        u'are also responsible for 28 of the worlds oxygen,1',
        u'the climatic conditions necessary for the earths tropical rainforests,1',
        u'growth of vines shrubs and small trees called a,1',
        u'columbia washington oregon and california in europe parts of,1']]

As the sample data above shows, csv_data is a list of lists (one list per phrase length). I transpose it with izip_longest and write out each resulting row.
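As a small illustration of that transpose step, here is a sketch using a few items from the sample csv_data. It assumes Python 3, where `izip_longest` is named `itertools.zip_longest`, and treats each inner list as one output column.

```python
from itertools import zip_longest  # izip_longest in Python 2

# Each inner list becomes one column: the phrases of one length with counts.
csv_data = [
    ["shrubs,1", "chile,1", "china,1"],
    ["washington oregon,1", "new zealand10,1"],
    ["cancer and tropic,1"],
]

# zip_longest(*columns) yields one tuple per output row; shorter columns
# are padded with fillvalue so every row has the same number of items.
rows = list(zip_longest(*csv_data, fillvalue="-,-"))
for row in rows:
    print(row)
# ('shrubs,1', 'washington oregon,1', 'cancer and tropic,1')
# ('chile,1', 'new zealand10,1', '-,-')
# ('china,1', '-,-', '-,-')
```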

Edit:

This is how I build the data that I want to be in a row:

    csv_data = []
    for item in package.count_set[0]:
        payload = []
        phrase = item[0]  # maps each phrase (a tuple of words) to its count
        for pitem in phrase:
            _str = " ".join(pitem)   # the words of the phrase
            _cnt = phrase[pitem]     # how often the phrase occurs
            payload.append("%s,%d" % (_str, _cnt))  # one "phrase,count" string
        csv_data.append(payload)

So I create a list of items like this: [ "word,count,", "word1,count1,", "word2,count2,", "wordN,countN," ]

I've also tried without the trailing comma [ "word,count", "word1,count1", "word2,count2", "wordN,countN" ]

Is the problem the way I build this payload list and then append it to the csv_data list?
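One way to sidestep the escaping entirely is to keep each phrase and its count as separate values instead of formatting them into one "phrase,count" string. A minimal Python 3 sketch, assuming (from `" ".join(pitem)` and `phrase[pitem]` above) that `phrase` maps tuples of words to counts, e.g. a `collections.Counter`; `phrase_pairs` is a hypothetical name.

```python
from collections import Counter


def phrase_pairs(phrase_counts):
    """Turn a mapping of word tuples -> count into (phrase, count) pairs.

    Keeping the phrase and the count as separate values lets a csv writer
    later put them into two cells, so no embedded comma needs escaping.
    """
    return [(" ".join(words), count) for words, count in phrase_counts.items()]


counts = Counter({("you", "have"): 2, ("the", "world"): 1})
print(phrase_pairs(counts))
# [('you have', 2), ('the world', 1)]
```

After transposing a list of such columns, each row can be flattened into [phrase, count, phrase, count, ...] right before it is written.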

user1610950
    What is some sample input? What is the expected output? What is the actual output? – Mark Tolonen Nov 25 '16 at 01:11
  • @MarkTolonen I made an edit with some more info. Why does space escapechar put a space to the right of every word, making the output double spaced, ex: instead of "hello word" it will write "hello word " – user1610950 Nov 25 '16 at 01:30
  • It seems to be a syntax error in your code (`writer.writerow([unicode(s).encode("utf-8") for s in values])values])`). Also, please, provide your input (particularly, what is `csvdata`?) – Ilya V. Schurov Nov 25 '16 at 02:32
  • @IlyaV.Schurov I've added the test data, the sample csv output, it's too long to post in full here but there's a pastie. Trying to write that data out to a csv, I get the errors I mentioned earlier. writer.writerow([unicode(s).encode("utf-8") for s in values]) – user1610950 Nov 25 '16 at 02:58
  • Okay, so you have a list of lists of strings in `csv_data`. Every string contains some words and a number, delimited from each other by a comma. The number is *word count*. You would like to create a CSV file such that every row in the CSV file corresponds to one element of `csv_data` and contains twice as many cells as the corresponding element of `csv_data`: first you want words, then you want *word count* and so on. Am I right? – Ilya V. Schurov Nov 25 '16 at 03:12
  • Why exactly are you unhappy with the current behaviour of csvwriter? *why the csv writer is placing the escape sequence after every word?*: most probably it places the escape sequence to escape characters like comma or space. As your strings contain commas and spaces, they have to be escaped in order to be placed into cells of a CSV file. Do you really want to split your strings by a comma and place them into different cells of the CSV table? – Ilya V. Schurov Nov 25 '16 at 03:14
  • @IlyaV.Schurov I made an edit to the question to show how I append the data. Could that be the cause of the data being written out with the extra escapechar? – user1610950 Nov 25 '16 at 10:03
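To illustrate the point about commas needing escaping: with the writer's default settings (`QUOTE_MINIMAL`), a field containing a comma is quoted rather than escaped, and a phrase and count passed as separate list items need neither. A Python 3 sketch; `to_csv_line` is a hypothetical helper.

```python
import csv
import io


def to_csv_line(row):
    """Return the text csv.writer produces for one row with default settings."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)  # default dialect: QUOTE_MINIMAL, "\r\n"
    return buf.getvalue()


# One cell that contains a comma: the writer quotes it instead of escaping.
print(repr(to_csv_line(["you have,2"])))   # '"you have,2"\r\n'
# Phrase and count as separate cells: no quoting or escaping is needed.
print(repr(to_csv_line(["you have", 2])))  # 'you have,2\r\n'
```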

1 Answer

I don't usually like to answer my own question, but I solved the issue by building each line myself and writing it directly to the file.

from itertools import izip_longest  # Python 2

_range = files_to_load + 1  # files_to_load and csv_data come from earlier code
with open('data.csv', 'wb') as csvfile:
    # Header cells: "1 word phrase", " phrase count", "2 word phrases", ...
    header_cells = []
    for i in range(1, _range):
        word = "%d word phrase" % i
        if i > 1:
            word = word.replace("phrase", "phrases")  # pluralize for 2+ words
        header_cells.extend([word, " phrase count"])
    csvfile.write(",".join(header_cells) + "\n")

    for values in izip_longest(*csv_data, fillvalue="-,0"):
        cells = []
        for s in values:
            # Each item is "phrase,count"; write phrase and count as two cells.
            word, count = unicode(s).encode("utf-8").rsplit(",", 1)
            cells.extend([word, count])
        csvfile.write(",".join(cells) + "\n")

The above code could probably be cleaned up further, but no matter what I tried, I couldn't get the Python csv module to work correctly with my data.

This was most likely user error or some oversight on my part, but the code above writes out what I need in CSV format without any weird artifacts.
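For comparison, here is a sketch of the same table written with the csv module itself. It assumes Python 3, the question's csv_data layout (one column of "phrase,count" strings per phrase length), and that the number of column pairs equals `len(csv_data)` rather than `files_to_load`; `write_phrase_table` is a hypothetical name. Splitting each item into two cells before calling `writerow` is what avoids the stray escape characters.

```python
import csv
import io
from itertools import zip_longest  # izip_longest in Python 2


def write_phrase_table(csv_data, out):
    """Write columns of "phrase,count" strings as a table via csv.writer.

    Each "phrase,count" item is split into two cells, so the writer never
    has to quote or escape an embedded comma.
    """
    writer = csv.writer(out)
    header = []
    for n in range(1, len(csv_data) + 1):
        header.append("%d word phrase%s" % (n, "" if n == 1 else "s"))
        header.append("phrase count")
    writer.writerow(header)

    for values in zip_longest(*csv_data, fillvalue="-,0"):
        cells = []
        for item in values:
            phrase, count = item.rsplit(",", 1)  # count follows the last comma
            cells.extend([phrase, count])
        writer.writerow(cells)


buf = io.StringIO()
write_phrase_table([["shrubs,1", "chile,1"], ["washington oregon,1"]], buf)
print(buf.getvalue())
```

To write a real file in Python 3, open it with `open("data.csv", "w", newline="", encoding="utf-8")` and pass that file object instead of the buffer; the csv documentation recommends `newline=""` so the writer controls the line endings.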

user1610950