
I am trying to convert copied-and-pasted text to CSV so that I can split it afterwards. The problem is that the text contains tab characters that I can't seem to get rid of.

Example Copy/Paste:

Amarr Hybrid Tech Decryptor 12  Decryptors - Hybrid         12 m3
Ancient Coordinates Database    23  Sleeper Components          2.30 m3
Caldari Hybrid Tech Decryptor   17  Decryptors - Hybrid         17 m3
Carbon  17  General         34 m3
Cartesian Temporal Coordinator  4   Ancient Salvage         0.04 m3
Central System Controller   2   Ancient Salvage         0.02 m3

Now I'm trying to get something like this:

Amarr Hybrid Tech Decryptor,12,Decryptors - Hybrid,12,m3,
Ancient Coordinates Database,23,Sleeper Components,2.30,m3,
Caldari Hybrid Tech Decryptor,17,Decryptors - Hybrid,17,m3,
Carbon,17,General,34,m3,
Cartesian Temporal Coordinator,4,Ancient Salvage,0.04,m3,
Central System Controller,2,Ancient Salvage,0.02,m3,

(Every line will always have those five fields.)

I have tried various ways of doing this, such as the approach in "Split by comma and strip whitespace in Python", but I can't get it to work.

@login_required
def index(request):
    if request.method == "POST":
        form = SellListForm(request.POST)
        if form.is_valid():
            selllist = form.save(commit=False)
            selllist.user = request.user
            string = selllist.sell
            string = [x.strip() for x in string.split(',')] 
            print string
            return HttpResponseRedirect(reverse('processed'))
    else:
        form = SellListForm()
    return render(request, 'index.html', {'form': form})

This prints:

[u'<<<SULTS STUFF>>>\t\t\tVoucher\t\t\t0 m3\r\nAmarr Hybrid Tech Decryptor\t12\tDecryptors - Hybrid\t\t\t12 m3\r\nAncient Coordinates Database\t23\tSleeper Components\t\t\t2.30 m3\r\nCaldari Hybrid Tech Decryptor\t17\tDecryptors - Hybrid\t\t\t17 m3\r\nCarbon\t17\tGeneral\t\t\t34 m3\r\nCartesian Temporal Coordinator\t4\tAncient Salvage\t\t\t0.04 m3\r\nCentral System Controller\t2\tAncient Salvage\t\t\t0.02 m3']
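The printed value shows why the comma split has no effect: the pasted text contains no commas at all, so split(',') hands back a one-element list. The fields are separated by one or more \t characters and the rows by \r\n. A small check, using two rows copied from that output:

```python
# Two rows copied from the printed value above.
text = "Carbon\t17\tGeneral\t\t\t34 m3\r\nCentral System Controller\t2\tAncient Salvage\t\t\t0.02 m3"

pieces = text.split(",")
print(len(pieces))  # 1: there is no comma to split on

rows = text.splitlines()  # splits on the \r\n line endings instead
print(rows)
```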
Jason Aller
Hans de Jong

2 Answers


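I see that you sometimes have several consecutive \t characters. I'd use the re module to split on runs of tabs:

import re

for line in selllist.sell.splitlines():  # the pasted text from the question
    linedata = re.split(r'\t+', line)  # split on runs of one or more tabs
    print(",".join(linedata))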
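A minimal sketch extending the same re.split idea to the exact output asked for in the question. It assumes every data row has four tab-separated fields (name, quantity, group, volume) and that the volume always looks like "12 m3" (number, space, unit). The function name paste_to_csv is hypothetical; rows with a different field count, such as the <<<SULTS STUFF>>> header row, are skipped.

```python
import re


def paste_to_csv(text):
    """Turn tab-separated pasted rows into the comma-separated form from the question.

    Assumes each data row has four tab-separated fields (name, quantity,
    group, volume) and that the volume field looks like "12 m3".
    """
    out = []
    for line in text.splitlines():  # handles the \r\n line endings
        fields = re.split(r"\t+", line.strip())  # runs of tabs count as one separator
        if len(fields) != 4:
            continue  # e.g. the header row, which has only three fields
        name, qty, group, volume = fields
        amount, unit = volume.split()  # "2.30 m3" -> "2.30", "m3"
        # Trailing comma matches the desired output in the question.
        out.append(",".join([name, qty, group, amount, unit]) + ",")
    return "\n".join(out)
```

If a name could ever contain a comma, writing the rows with csv.writer instead of ",".join would add the quoting needed to keep the columns intact.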
Maxime Lorant

You can split on tabs:

line = line.split('\t')
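One caveat: where the paste contains several consecutive tabs, a plain split('\t') leaves empty strings between them. A quick illustration using the Carbon row from the question, with the blanks filtered out afterwards:

```python
row = "Carbon\t17\tGeneral\t\t\t34 m3"

raw = row.split("\t")
print(raw)  # ['Carbon', '17', 'General', '', '', '34 m3']

# Keep only the non-empty fields.
fields = [f for f in raw if f]
print(fields)  # ['Carbon', '17', 'General', '34 m3']
```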

Unless you particularly need the comma-separated values, you can just paste your text straight into a file, open it, split on the tabs and use the data without ever introducing commas.
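A minimal sketch of that file-based approach with the standard csv module, assuming Python 3; read_rows and selllist.txt are hypothetical names. QUOTE_NONE stops any quote characters in the pasted text from being interpreted, and the empty fields produced by consecutive tabs are dropped:

```python
import csv


def read_rows(f):
    """Yield each pasted row as a list of its non-empty tab-separated fields.

    f can be an open file or any iterable of lines.
    """
    for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        fields = [field for field in row if field]  # drop blanks left by consecutive tabs
        if fields:  # skip blank lines
            yield fields


# Usage with a hypothetical file holding the pasted text:
# with open("selllist.txt", newline="") as f:
#     for fields in read_rows(f):
#         if len(fields) == 4:  # skip rows such as the three-field header line
#             name, qty, group, volume = fields
```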

jonrsharpe