
I am trying to rearrange a file to match a BACS bank format. For it to work, each column in the CSV below needs to be a specific length. I have figured out the abcdabcd column, as it's a repeating pattern (as are a couple more in the file), but several columns contain varying numbers that I cannot easily target.

Is there a way for me to target either (ideally) a specific column based on its header, or alternatively everything up to a comma, to butcher together something that could work? In my example file below, you'll see three columns whose values change. If targeting everything up to a specific character is the solution, I was thinking of using .ljust to pad the column to the specified length (and then sorting it out manually in Excel).

Original File

a,b,c,d,e,f,g,h,i,j,k
12345,1234567,0,11,123456,12345678,1234567,abcdabcd,A ABCD
123456,12345678,0,11,123456,12345678,12345678,abcdabcd,A ABCD
123456,1234567,0,11,123456,12345678,12345,abcdabcd,A ABCD
12345,1234567,0,11,123456,12345678,1234567,abcdabcd,A ABCD
123456,12345678,0,11,123456,12345678,123456789,abcdabcd,A ABCD

Ideal output

a,b,c,d,e,f,g,h,i,j,k
123450,12345670,0,11,123456,12345678,123456700,abcdabcd,A ABCD
123456,12345678,0,11,123456,12345678,123456780,abcdabcd,A ABCD
123456,12345670,0,11,123456,12345678,123450000,abcdabcd,A ABCD
123450,12345670,0,11,123456,12345678,123456700,abcdabcd,A ABCD
123456,12345678,0,11,123456,12345678,123456789,abcdabcd,A ABCD

Code

with open('file.txt', 'r') as file:
  filedata = file.read()

# note: this replaces every occurrence of '12345' in the file,
# not just the values in the first column
filedata = filedata.replace('12345', '12345'.ljust(6, '0'))
with open('file.txt', 'w') as file:
  file.write(filedata)

EDIT: I'm after something similar to this: Python - How to add zeros to an integer/string?, but targeting either a specific column by header, or at least the first one.
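For what it's worth, targeting just the first column is possible with the standard-library csv module alone; a minimal sketch (the sample data stands in for the real file, which would be read with open('file.txt')):

```python
import csv, io

# In-memory stand-in for the CSV file
raw = "a,b,c\n12345,1234567,0\n123456,12345678,0\n"

rows = list(csv.reader(io.StringIO(raw)))
header, data = rows[0], rows[1:]
for row in data:
    # pad the first column to 6 chars with trailing zeros: '12345' -> '123450'
    row[0] = row[0].ljust(6, '0')

out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows([header] + data)
print(out.getvalue())
```

Rows already at the target length (like '123456') are left untouched, since .ljust only pads shorter strings.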

EDIT2:

I am using the below to rearrange my columns, could this be modified to work with string lengths?

import pandas as pd

## Read csv / tab-delimited in this example
df = pd.read_csv('test.txt', sep='\t')

## Reorder columns
df = df[['h','i','c','g','a','b','e','d','f','j','k']]

## Write csv / tab-delimited
df.to_csv('test', sep='\t')
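That same pipeline can handle string lengths too, provided the file is read as strings (dtype=str) so pandas doesn't turn the fields into integers; a sketch with assumed column names and widths:

```python
import pandas as pd
import io

# Tab-delimited sample standing in for test.txt
raw = "a\tb\tc\n12345\t1234567\t0\n"
df = pd.read_csv(io.StringIO(raw), sep='\t', dtype=str)  # dtype=str keeps fields as text

# Reorder columns, then right-pad selected ones with zeros
df = df[['c', 'a', 'b']]
for col, width in {'a': 6, 'b': 8}.items():
    df[col] = df[col].str.ljust(width, '0')

print(df.to_csv(sep='\t', index=False))
```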
gaxar
  • Are you just trying to pad a column to match the longest item? You could iterate through lines and keep indices for the longest matches, then iterate again to pad. – Simon Oct 22 '19 at 15:50
  • Hi Simon, no the file itself has a specific order, new columns start at character 1, 7, 15, 18, 24, 32 etc etc. What I am trying to do here is to make sure that the first column for example is always 6 chars long, so column 2 starting at char 7 has the correct info in it. There are over 1k rows in my csv, so doing so manually would take forever, but if I could target even the first column and make sure the chars there are of the correct length, I could find a way to make that work. – gaxar Oct 22 '19 at 16:03
  • A csv file is simply lines with words separated by commas. A "column" is basically all fields at a certain index on the lines. Unless your original file is formatted differently (you didn't describe it in your question), the easiest way is to iterate line by line. You can also use tools like pandas to manipulate the csv (maarten's answer). – Simon Oct 22 '19 at 18:04
  • That makes sense. The file is formatted like in my example, just lines of text and numbers separated by commas with the same piece of info contained at specific points of the string. – gaxar Oct 23 '19 at 08:14

3 Answers


Using pandas, you can convert the column to str and then use .str.pad. You can make a dict with the requested lengths:

lengths = {
    "a": 6,
    "b": 8,
    "c": 3,
    "d": 6,
    "e": 8,
}

and use it like this:

result = pd.DataFrame(
    {
        column_name: column.str.pad(
            lengths.get(column_name, 0), side="right", fillchar="0"
        )
        for column_name, column in df.astype(str).items()
    }
)

If the fillchar differs per column, you can look that up in a dict as well.
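Wired together with a small DataFrame (column names and widths assumed from the question), that looks like:

```python
import pandas as pd

df = pd.DataFrame({'a': [12345, 123456], 'b': [1234567, 12345678]})
lengths = {'a': 6, 'b': 8}

# Convert everything to str, then right-pad each column to its target width
result = pd.DataFrame(
    {
        name: col.str.pad(lengths.get(name, 0), side='right', fillchar='0')
        for name, col in df.astype(str).items()
    }
)
print(result)
```

Columns missing from the dict get a width of 0, so .str.pad leaves them unchanged.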

Maarten Fabré
  • I think this is the closest to what I am after. I'm not sure how to implement it exactly (I am currently using pandas to move columns around, so can't be that different hopefully). Thanks for pushing me in the right direction Maarten. – gaxar Oct 23 '19 at 08:18
>>> '{:0>5}'.format(4)
'00004'
>>> '{:0<5}'.format(4)
'40000'
>>> '{:0^5}'.format(4)
'00400'

Example:

#--------------DEFs------------------
def number_zero_right(number, len_number):
  return ('{:0<' + str(len_number) + '}').format(number)

#--------------MAIN------------------
a = 12345
b = 1234567
c = 0
d = 11
e = 123456
f = 12345678
g = 1234567
h = 'abcdabcd'
i = 'A'
j = 'ABCD'

print(a,b,c,d,e,f,g,h,i,j)
# > 12345 1234567 0 11 123456 12345678 1234567 abcdabcd A ABCD

a = number_zero_right(a,6)
b = number_zero_right(b,8)
c = number_zero_right(c,1)
d = number_zero_right(d,2)
e = number_zero_right(e,6)
f = number_zero_right(f,8)
g = number_zero_right(g,9)

print(a,b,c,d,e,f,g,h,i,j)
#> 123450 12345670 0 11 123456 12345678 123456700 abcdabcd A ABCD
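The same helper can also be applied per column index when looping over the lines of the CSV, which avoids assigning each field to its own variable; a sketch (the widths list is assumed from the question):

```python
def number_zero_right(number, len_number):
    # right-pad with zeros to the requested length
    return ('{:0<' + str(len_number) + '}').format(number)

widths = [6, 8, 1, 2, 6, 8, 9]  # target widths for the first seven columns

line = '12345,1234567,0,11,123456,12345678,1234567,abcdabcd,A ABCD'
fields = line.split(',')
fields[:7] = [number_zero_right(f, w) for f, w in zip(fields, widths)]
print(','.join(fields))
# -> 123450,12345670,0,11,123456,12345678,123456700,abcdabcd,A ABCD
```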
R Borges
  • This is great, but my issue is that the numbers are not the same (should have made it clearer in my example), otherwise I would have tackled them with filedata.replace and .rjust. Could I specify a, b, c, d as 1, 2, 3, and 4th item in the string index somehow? That's what I am basically looking for (I think). – gaxar Oct 23 '19 at 08:16

Managed to get there, so thought I'd post in case someone has a similar issue. This only works on one column, but that's enough for me for now.

#import pandas
import pandas as pd 

#open file and convert data to str  
data = pd.read_csv('Test.CSV', dtype = str) 

# width of output string 
width = 6

# fillchar
char = "_"

#Change the contents of column named ColumnID
data["ColumnID"]= data["ColumnID"].str.ljust(width, char) 

#print output  
print(data)
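For completeness, the same approach extends to several columns by looping over a dict of widths (a sketch; the column names, widths, and sample data are assumed, and "0" is used as the fillchar to match the desired output):

```python
import pandas as pd
import io

# In practice: data = pd.read_csv('Test.CSV', dtype=str)
raw = "ColumnID,Other\n12345,1234567\n123456,12345678\n"
data = pd.read_csv(io.StringIO(raw), dtype=str)

widths = {"ColumnID": 6, "Other": 8}  # hypothetical per-column target widths
for col, width in widths.items():
    data[col] = data[col].str.ljust(width, "0")

print(data)
```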
gaxar