-1

I need some help with this code:

import pyexcel as pe
import pyexcel.ext.xls
import pyexcel.ext.xlsx

def randStrings(excelFile1,excelFile2):
    getNameList = pe.get_sheet(file_name=excelFile1)
    randName = sum([i for i in getNameList],[])
    getCompanyList = pe.get_sheet(file_name=excelFile2)
    randCompany = sum([i for i in getCompanyList],[])
    randStrings.name = random.choice(randName)
    randStrings.combinedString = random.choice(randName) + "@" + random.choice(randCompany).lower().replace(" ","").replace("'","").replace("/","").replace(".","").replace(",","") +".com"
    return randStrings.name, randStrings.combinedString

randStrings("names.xlsx","companys.xlsx")
data = {'user_name':randStrings.name,'user_email': randStrings.combinedString}
print data

my output is : {'user_name': u'duky', 'user_email': u'geri@belleladi.com'}

Need help or advice for two things :

1.Does anyone have a idea or can explain on why there is 'u' character when getting a record from the excel sheet?? and how to remove this from the output?

  1. As you can I've done a extra long .replace() to get rid of any gremlins within the excel . Is there a short or more clean way to do this? like a python reg ex or something. I haven't found any examples dealing with multiple replacements for formating.

Cheers

johnny R
  • 9
  • 2

4 Answers4

0

The u' in front of your records means that they are encode as unicode.

>>> a=u'test'
>>> a
u'test'
>>> type(a)
<type 'unicode'>
>>> b="b"
>>> b
'b'
>>> type(b)
<type 'str'>
>>> 
Tim Stopfer
  • 116
  • 8
0

1.

The u means that the string is Unicode.

2.

You can use string.translate

import string
characters_to_remove = " '/.,"
string.translate(random.choice(randCompany).lower(), None, characters_to_remove)
Community
  • 1
  • 1
Christian Witts
  • 11,375
  • 1
  • 33
  • 46
  • Hey Chris , thanks your example. however i get a error of "AttributeError: 'list' object has no attribute 'lower'" ive placed the example first between the random.choice() and second just after where "randCompany" gets set. But i think string.translate is simliar to REReplace in coldfusion as when i did it in coldfusion all i needed to do REReplace((Lcase(#companyList("companys.xlsx")#)), " |'| ", '', "ALL")> and all formated :D. maybe you can show where i need to put this within the python code? – johnny R Apr 06 '16 at 12:42
  • Sorry about that, forgot the `random.choice()` part for randCompany. :) – Christian Witts Apr 06 '16 at 15:58
0

Your encoding is Unicode, which is not the native encoding used by Excel. That is why you get "u" before your strings.

Have you tried the following in Excel : Data -> import your data -> And specifying the encoding (probably utf8) in the import wizard ?

Nahid O.
  • 171
  • 1
  • 3
  • 14
  • No i haven't and not sure how to in excel 2013, so does this mean i have to find a way convert the data to a different encoding or something? is there a way to to convert within python instead? – johnny R Apr 06 '16 at 12:41
  • "Unicode" is not an encoding. Unicode is rendered in specific encodings (such as UTF-8 or UTF-16) on external media. – holdenweb Apr 06 '16 at 12:50
  • @johnny R : It might be easier to do it in Excel. However, if you want to convert it in Python, you might wanna look a the codec module : https://docs.python.org/2/library/codecs.html. This is a complex subject, I recommend taking a look at the Unicode doc. For me, this video worked well : https://www.youtube.com/watch?v=sgHbC6udIqc – Nahid O. Apr 06 '16 at 12:59
0

There is another issue with your code that the other answers do not address.

You are using a rather unusual technique to "return" the data from your function. Although you do have a return statement, your function call doesn't actually save the returned objects, so your return statement is redundant.

You are attaching the data to the function object itself as function attributes randStrings.name and randStrings.combinedString. Sure, this works, but as I said, it is rather unusual, and it is rare to use function attributes, and when they are used it is to do special things, like saving values between function calls.

The normal way to pass data back from a function is like this:

def randStrings(excelFile1,excelFile2):
    getNameList = pe.get_sheet(file_name=excelFile1)
    randName = sum([i for i in getNameList],[])
    getCompanyList = pe.get_sheet(file_name=excelFile2)
    randCompany = sum([i for i in getCompanyList],[])
    name = random.choice(randName)
    combinedString = random.choice(randName) + "@" + random.choice(randCompany).lower().replace(" ","").replace("'","").replace("/","").replace(".","").replace(",","") +".com"
    return name, combinedString

name, combinedString = randStrings("names.xlsx","companys.xlsx")
data = {'user_name':name,'user_email': combinedString}
PM 2Ring
  • 54,345
  • 6
  • 82
  • 182