
My aim is to read a CSV file and replace a value at a specific location in the DataFrame. Current code:

data = pd.read_csv(str(infilename), sep ='/t', header = None, skiprows = 2, index_col = None)

x = float((data.iloc[25].str.split(",", expand=True)[2]))-9000 #altering value

data.iloc[25].str.split(",", expand=True)[2]= x

print(data)
print(x)

out:

0 ------------------data------------------------
|
|
|

24      1,0.800,12161.31,8648.94,0.65,0.65,*BB,1.4061
25  2,0.883,1623669.24,656669.73,87.02,87.02,*BB,2...
26    3,1.388,143948.11,64119.45,7.71,7.71,*BB,2.2450
---------------------data------------------------
1614669.24

I'm not sure why the value in the DataFrame is not being replaced with the new one I created.

csv data example:

screenshot of csv

Joey
  • I don't know how I can be more specific? It's fairly condensed. – Joey Dec 04 '18 at 14:17
  • Actually I was about to answer this and @Georgy is right, because I was wondering why you used .str when this looks like numerical data. – Shivam Kotwalia Dec 04 '18 at 14:20
  • Is there a reason why you are using `sep='\t'` instead of `sep=','`? The CSV file seems to be clearly comma-separated but you are trying to split the fields by tabs that are not even there. – Martin Frodl Dec 04 '18 at 14:23
  • Yes, with `sep=','` I get an error: pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 6, saw 5. The parse works when I use /t or /n as the sep. – Joey Dec 04 '18 at 14:26
  • Probably you should first read the [docs](http://pandas.pydata.org/pandas-docs/stable/io.html) on how to read data from CSV with pandas. – Georgy Dec 04 '18 at 14:29
  • I don't think that is the issue. I've read data from CSV files many times using pandas. Things get complicated when you create a csv from a txt file and you have multiple columns at different levels. Obviously the sep = ',' parameter should work but it doesn't for some reason. – Joey Dec 04 '18 at 14:33
  • Your `ParserError` says that there is a problem in line 6. Check if there are too many values in that row. – Georgy Dec 04 '18 at 14:37
  • Seems like your CSV has a different number of columns on different rows. If this is on purpose, see https://stackoverflow.com/questions/15242746/handling-variable-number-of-columns-with-pandas-python for instructions on how to handle such files. If this is not intentional and each row should have the same number of columns, you will need to fix your CSV file first. It would be very helpful if you sent the complete CSV file or at least several first lines, otherwise it's just wild guessing. – Martin Frodl Dec 04 '18 at 14:39
  • That is true, there are many different columns on different rows. I have added a screenshot of some example csv data. I cannot alter the structure of the csv because it will hinder any further processing on the csv that I need to do using another program (commercial/non-python). fyi, I do know how to scrub csv files to obtain certain values from certain rows etc but for this particular example, I'm trying to edit one number without changing any of the csv file structure. – Joey Dec 04 '18 at 14:47
  • OK, in that case you need to call `pd.read_csv` with the argument `names` to make clear how many columns `pandas` should expect. If your data is in columns A to H, load the data with `pd.read_csv(str(infilename), names='ABCDEFGH')`. For details, see the question I linked above. – Martin Frodl Dec 04 '18 at 15:03
  • Okay, so I managed to do it with: `pd.read_csv(str(infilename), names = ['A','B','C','D','E','F','G','H'], encoding = "ISO-8859-1")`. The issue was the encoding of non-UTF-8 characters. Thanks for your direction. – Joey Dec 04 '18 at 15:58
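For later readers, here is a minimal sketch of how the approach from the comments might be used to actually change the value. It assumes the file is comma-separated with at most eight fields per row; the in-memory `raw` text, the row label `3`, and the column letter `"C"` are made-up stand-ins for the real file and location. The assignment in the question most likely has no effect because `data.iloc[25].str.split(",", expand=True)` builds a new temporary object, so setting `[2]` on it never touches `data`. Once the fields are parsed into real columns, a single `.loc` assignment writes into the DataFrame itself.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: comma-separated, ragged rows.
raw = (
    "Report,v1\n"
    "Peak,RT,Area,Height\n"
    "1,0.800,12161.31,8648.94,0.65,0.65,*BB,1.4061\n"
    "2,0.883,1623669.24,656669.73,87.02,87.02,*BB,2.1\n"
    "3,1.388,143948.11,64119.45,7.71,7.71,*BB,2.2450\n"
)

# Supplying `names` tells pandas how many columns to expect, so shorter
# rows are padded with NaN instead of raising a ParserError.
# dtype=str keeps every field as text, leaving untouched values as read.
# For a real file on disk, the encoding from the last comment would also
# be passed, e.g. encoding="ISO-8859-1".
columns = list("ABCDEFGH")
data = pd.read_csv(io.StringIO(raw), names=columns, dtype=str)

# One .loc call with both row and column writes into `data` directly.
row = 3  # assumed label of the row to edit
new_value = float(data.loc[row, "C"]) - 9000
data.loc[row, "C"] = f"{new_value:.2f}"

print(data.loc[row, "C"])
```

Note that writing this frame back with `to_csv` would pad the short rows with trailing commas, so if the file layout must stay byte-for-byte identical, that step would need separate handling.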
