Change CSV numerical values based on header name in file using Python

Question

I have a .csv file filled with observations from sensors in the field. The sensors write data as millimeters and I need it as meters to import into another application. My idea was to use Python and possibly pandas to: 1. Read in the .csv as dataframe 2. Find the headers of the data I need to modify (divide each actual value by 1000) 3. Divide each value in the chosen column by 1000 to convert it to meters 4. Write the resulting updated file to disk

Objective: I need to modify all the values except those with a header that contains "rad" in it.

This is what the data looks like:

Here is what I have done so far:

Read data into a dataframe:

import pandas as pd
import numpy as np
delta_df = pd.read_csv('SAAF_121581_67_500.dat',index_col=False)

Filter out all the data that I don't want to touch:
```
delta_df.filter(like='rad', axis=1)
```

Here is where I got stuck as I couldn't filter the dataframe to

not like = 'rad'

How can I do this?

Vaishali · Accepted Answer · 2017-03-09T23:55:43.870

1

Its easier if you post the dataframe rather than the image as the image is not reproducible. You can use dataframe.filter to keep all the columns containing 'rad'

delta_df = delta_df.select(lambda x: re.search('rad', x), axis=1)

Incase you are trying to remove all the columns containing 'rad', use

delta_df = delta_df.select(lambda x: not re.search('rad', x), axis=1)

Alternate solution without regex:

df.filter(like='rad',axis=1)

EDIT: Given the dataframes containing rad and not containing rad like this

df_norad = df.select(lambda x: not re.search('rad', x), axis=1)

df_rad = df.select(lambda x: re.search('rad', x), axis=1)

You can convert the values of df_norad df to meters and then merge it with df_rad

merged = pd.concat([df_norad, df_rad], axis = 1)

You can convert the dataframe merged to csv using to_csv

merged.to_csv('yourfilename.csv')

edited Mar 09 '17 at 23:55

answered Mar 09 '17 at 21:52

Vaishali

37,545
5
58
86

`delta_df = delta_df.select(lambda x: not re.search('rad', x), axis=1)` worked like a charm! Thanks for that! Update: I managed to use `delta_df.iloc[1:,1:]/1000` and this divided all the remaining data into the correct format of meters. However, as a final step I need to update the changed columns back to the original dataframe, and output it as CSV - is this possible? – coolleaf Mar 09 '17 at 22:57
@André Lubbe, I am glad it worked. See the edit for answer to the rest of the questions. – Vaishali Mar 09 '17 at 23:56
thanks for the updated solution. You have saved me so much time! Just a last question though, is there anyway to retain the original structure of the file? i.e. when finally exporting everything (the changed non-rad columns along with the divided values) is there a way to export so that the columns are in there original imported position? – coolleaf Mar 10 '17 at 17:02
You can re arrange the columns like this merged.columns = list(df.columns) – Vaishali Mar 10 '17 at 17:14

score 0 · Answer 2 · answered Mar 09 '17 at 21:52

0

Off the top of my head I believe you can do something like this:

delta_df.filter(regex='^rad', axis=1)

Where we use the regex parameter instead of the like parameter (**note regex and like are mutually exclusive).

The actual regex selects everything that does not match what follow the '^' operator.

Again, I don't have an environment set up to test this but I hope this motivates the idea well enough.

answered Mar 09 '17 at 21:52

Sam Gomena

1,450
11
21

Thank you for the suggestion, I used this in my notebook, but the `'^rad'` returns no results at all, instead of returning only the columns that don't have "rad" as part of the name. Is there a way that I could filter the headers according to a partial match to the actual header? Something like `contains` ? – coolleaf Mar 09 '17 at 22:41
This might be because the carrot operator also denotes the beginning of a sequence i.e. ^rad would match radical, radish, etc but not eradicate and so on. To get around this in regular python you put the negative lookahead (in this case 'rad') in a capture group. i.e. `'^(rad)'. However, I'm not sure how pandas implements their regex engine although I would assume it uses python's built-in library. – Sam Gomena Mar 09 '17 at 23:15
I think this works: `df.filter(regex='^((?!rad).)*$' )` via http://stackoverflow.com/questions/406230/regular-expression-to-match-a-line-that-doesnt-contain-a-word – JohnE Mar 14 '17 at 20:48

Change CSV numerical values based on header name in file using Python

2 Answers2