
I want to replace the values in a CSV column with more than 800,000 rows, but I can't do it with the following code:

import pandas as pd
import numpy as np
df = pd.read_csv('nameofdf.csv')
df

The nameofdf.csv looks like this: (screenshot of the dataframe omitted)

I want to replace the numbers with strings so:

new_df = df.replace([0, 1, 2, 3, 4, ...],  # up to 800,000 values
                    ['string1', 'string2', 'string3', 'string4', 'string5', ...])  # up to 'string800000'
new_df

After that, Jupyter shows a syntax error, and I think it is because of the large lists...

Any idea about this, or how to replace the values properly?
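(For illustration: two lists of this size don't need to be typed by hand; they can be generated. This is a hypothetical sketch, assuming the replacement labels follow a `stringN` pattern as in the question.)

```python
# Build the two lists programmatically instead of typing ~800,000 literals.
# The 'stringN' naming is an assumption based on the question's example.
n = 800_000
values = list(range(n))                        # 0, 1, 2, ..., 799999
labels = [f'string{i + 1}' for i in range(n)]  # 'string1', 'string2', ...
```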

Thanks!

Javulja
    As per the linked duplicate, I recommend you convert your 2 lists to a dictionary, e.g. `d = dict(zip(list1, list2))`, and use `pd.Series.map`. If the list does not have full coverage, you can use `map` followed by `fillna`, which is still likely to be much faster than `replace`. – jpp May 19 '18 at 13:54
  • Thanks @jpp, I am creating the lists, but I have a problem with `list2 = ['string1', 'string2', 'string3', 'string4', 'string5', up to string8hundred thousand]`. Jupyter shows a syntax error and I do not know why... (some strings appear in black colour). Thanks! – Javulja May 20 '18 at 10:12
  • Looks like you have a different problem. Unfortunately, there's not enough information yet to resolve the issue, but feel free to ask a separate question with a [mcve] and we can look to help. – jpp May 20 '18 at 10:29
  • I solved the problem, I had errors in the list, thanks! – Javulja May 22 '18 at 13:10
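The dict-plus-`map` approach suggested in the comments can be sketched as follows. This is a minimal example with made-up toy data; the column name `col` is an assumption, not from the original CSV.

```python
import pandas as pd

# Toy data standing in for the real CSV; 'col' is a made-up column name.
df = pd.DataFrame({'col': [0, 1, 2, 1, 4]})

list1 = [0, 1, 2]                          # values to replace
list2 = ['string1', 'string2', 'string3']  # replacement labels

# One dict lookup per row is much faster than pd.DataFrame.replace
# with two huge parallel lists.
d = dict(zip(list1, list2))

# map() returns NaN for values not in the dict (here, 4);
# fillna() with the original column restores those values.
df['col'] = df['col'].map(d).fillna(df['col'])
```

After this, `df['col']` holds the mapped strings, with unmapped values left unchanged.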

0 Answers