
Apologies in advance if the question is poorly written. This is my second post ever on the site and I'm a novice programmer. To start, here's what I'm aiming to do:

Step 0: Turn CSV File into record array

Step 1: Split record array into two sub-arrays

Step 2: Shuffle sub-arrays

Step 3: Split two sub-arrays into four smaller sub-arrays

Step 4: Shuffle each sub-array

Step 5: Mix and match values between sub-arrays

Step 6: Append sub-arrays to one of two record arrays and then combine record arrays into single CSV file

The first few steps have been fairly simple.

Step 0:

import numpy as np
import random
from matplotlib.mlab import csv2rec
from matplotlib.mlab import rec2csv

# Get recarray from CSV file
ev = csv2rec('stimuli_1.csv',delimiter = ';')
ev.resize(60) #for even splits

# Create lists to append data to
audio_files = np.recarray([],dtype = ev.dtype)
audio_files_1 = np.recarray([],dtype = ev.dtype)
audio_files_2 = np.recarray([],dtype = ev.dtype)

Step 1:

# Split recarray into two sub-arrays
split_1 = np.split(ev,2)

Steps 2, 3, 4, & 5:

# Shuffle sub-arrays, split again, and then shuffle again
for a in split_1:
    #Set count for mix-and-matching
    count = 0

    #Shuffle
    np.random.shuffle(a)

    #Split
    split_2 = np.split(a,2)

    for b in split_2:
        count = count+1

        #Shuffle
        np.random.shuffle(b)

        if count == 1:
            audio_files_1 = np.append(audio_files_1,b)
        elif count == 2:
            audio_files_2 = np.append(audio_files_2,b)

Step 6:

audio_files = np.append(audio_files,audio_files_1)
audio_files = np.append(audio_files,audio_files_2)

rec2csv(audio_files,'audio_files.csv')

My problem arises here. The CSV files that are produced are fine, except they have a few very weird values. For example, the first value in the 'audio' field looks like this:

\xb8\xce\xe1H\xeb\x7f\x00\x00\xd0\x12\x81

What causes this? Does it have to do with how I'm appending the arrays to each other?
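One thing that may be worth checking here is the shape of the placeholder arrays from Step 0. The sketch below assumes a made-up dtype (`audio`, `trial`) in place of `ev.dtype`; it suggests that `np.recarray([], ...)` is not an empty array but a zero-dimensional one holding a single uninitialized record, which `np.append` then carries along as the first row.

```python
import numpy as np

# Hypothetical dtype standing in for ev.dtype from the question.
dtype = [("audio", "S12"), ("trial", "i4")]

# Shape [] means "zero-dimensional": one record of uninitialized memory.
placeholder = np.recarray([], dtype=dtype)
print(placeholder.shape, placeholder.size)

# Stand-in for one of the shuffled sub-arrays (contents do not matter here).
data = np.recarray(4, dtype=dtype)

# np.append flattens the 0-d placeholder to one record and puts it first.
appended = np.append(placeholder, data)
print(len(appended))

# A truly empty starting point has shape (0,), so nothing extra is carried.
empty = np.recarray((0,), dtype=dtype)
clean = np.append(empty, data)
print(len(clean))
```

If that is what is happening, starting from shape `(0,)`, or collecting the pieces in a list and calling `np.concatenate` once, would avoid the stray first record.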


2 Answers


Does your source file contain Unicode characters? Unfortunately, the native csv module in the Python 2 standard library only handles ASCII characters. Those sorts of characters are what you get when text is converted from an extended character set to a narrower one. There are a couple of "unicodecsv" packages out there that might help, or you could adapt their converters to your code (it depends on which Unicode characters you need to deal with).

For reference there is this classic article by Joel Spolsky
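To illustrate the kind of mismatch described above, here is a minimal sketch. It assumes a made-up field value containing curly quotation marks, and shows how the same bytes read differently depending on the codec used to decode them.

```python
# Hypothetical field value with curly quotes, as a word processor might produce.
text = "\u201cstimulus\u201d"

# UTF-8 needs three bytes for each curly quote.
raw = text.encode("utf-8")
print(raw)

# Decoding with the matching codec round-trips cleanly.
right = raw.decode("utf-8")

# Decoding with a single-byte codec yields one character per byte instead.
wrong = raw.decode("latin-1")
print(repr(wrong))

# Plain ASCII cannot represent the curly quotes at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Checking which codec the CSV file was actually saved with is usually the first step before reaching for a conversion package.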

zenWeasel
  • I looked through it and it appears that the original CSV file contains quotation marks, which in "stimuli_1.csv" got converted to Unicode characters. – user1456918 Feb 17 '14 at 18:39

Those are Unicode characters. Or at least, they look like Unicode.

Some good suggestions for converting them to ASCII at

I played around with

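some_bytes = b"\xb8\xce\xe1H\xeb\x7f\x00\x00\xd0\x12\x81"

# Decode one byte at a time; print the raw byte if it has no mapping
for i in range(len(some_bytes)):
    byte = some_bytes[i:i + 1]
    try:
        print(byte.decode("windows-1252"))
    except UnicodeDecodeError:
        print(byte)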

And got some recognizable characters.
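As a complementary check, the sketch below looks at the same bytes in two ways: decoded in one pass with a replacement marker for unmapped bytes, and with the first eight bytes read as a little-endian integer. The second reading is only a guess at an alternative explanation; a value in that range resembles a 64-bit memory address, which would point toward uninitialized data rather than text, but the bytes alone cannot confirm that.

```python
raw = b"\xb8\xce\xe1H\xeb\x7f\x00\x00\xd0\x12\x81"

# Decode everything at once; bytes with no windows-1252 mapping become U+FFFD.
decoded = raw.decode("windows-1252", errors="replace")
print(repr(decoded))

# Read the first eight bytes as a little-endian unsigned integer.
as_int = int.from_bytes(raw[:8], "little")
print(hex(as_int))
```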

Amanda